[HN Gopher] Prompt engineering vs. blind prompting ___________________________________________________________________ Prompt engineering vs. blind prompting Author : Anon84 Score : 216 points Date : 2023-04-22 16:44 UTC (6 hours ago) (HTM) web link (mitchellh.com) (TXT) w3m dump (mitchellh.com) | popcorncowboy wrote: | If you don't apply a rigorous evaluation and testing framework | you are a Prompt Alchemist at best. | jw1224 wrote: | I spent 3 hours today carefully "engineering" a complex prompt. I | ran it on a loop, then fed the results back into my database. | | The 3 hours I spent has saved my business ~$15k in costs. | rideontime wrote: | How? | woutr_be wrote: | Do you mind explaining this more? How did your prompt save your | business $15K in costs? | azubinski wrote: | Prompt engineering is a kind of prompting that is (in a sense) a | kind of engineering, but it's impossible to understand in what | sense it is a kind of what engineering, which is why it is so hard | to understand _coherent texts_ if they are not completely | meaningful. | | All this is so exciting and it promises many new jobs that | require accelerated education of prompt engineers. | mcs_ wrote: | Prompt engineers only exist if they can measure the efficiency | of their inputs. | | How to measure the effectiveness of a given prompt seems to me a | big deal now. | tgcandido wrote: | does your approach take into consideration that the temperature | parameter is equal to 0? | majormajor wrote: | There are also no instances of "sample" or "random" in the | article. | | If you say "can be developed based on real experimental | methodologies" but don't talk about randomness or temp (or | top_p, though personally I haven't played with that one as | much) then I'm going to be _very_ skeptical. | | Once you're beyond the trivial ("a five sentence prompt worked | better than a five word one"), then if you get different things | for the same prompt you need to do a LOT of work to be | sure that your modified prompt is "better" than your first, vs | just "had a better outcome that time." | mitchellh wrote: | Author here. As I noted in the post, this is an elementary | post to help people understand the very basics. I didn't want | to bring in anything more than a "101"-level view. | | I do mention output sampling briefly (Cmd-F "self- | consistency"). And yes, there are a lot of good techniques on | the validation set too. At the most basic, you can sample, of | course, but you can also perform uncertainty analysis on each | individual test case so that future tests sample either the | most uncertain cases or a diverse set of both uncertain and | confident test cases. I also didn't go into few-shot very much, | since choosing the exemplars for few-shot is a whole thing unto | itself. And this benefits from "sampling" (of sorts) as well. | But again, a whole topic on its own. And so on. | | As for top_p, for classification this is a very good tool, | and I do talk about top_p as well (Cmd-F "confusion matrix")! | Again, I felt it was too specific or too advanced to dive in | more deeply in this blog post, but I linked to various | research if people are interested. | | To the grandparent re: temperature: when I first tweeted | about this, I noted in a tweet that I ran all these tests | with some fixed parameters (i.e. temp), but in a realistic | environment, depending on the problem statement, you'd | want to take those into account as well. | | There's a lot that could be covered! But the post was getting | long so I wanted to keep this really as... baby's first guide | to prompt eng.
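| | To make the "fixed parameters" point concrete, a minimal sketch
| of the kind of test harness this implies might look like the
| following. This is an illustrative example only, not the code
| behind the post's experiments; the test cases, labels, and prompt
| templates are made up, and the OpenAI call is just one way to fill
| in the placeholder:
|
|     # Hypothetical sketch: score each candidate prompt template
|     # against a small labelled test set, with temperature fixed
|     # at 0 so repeated runs are (mostly) deterministic.
|     import openai
|
|     test_cases = [
|         ("dinner with Alice next Tuesday at 6pm", "event"),
|         ("how many days until my trip?", "question"),
|     ]
|
|     candidate_prompts = {
|         "short": "Classify as 'event' or 'question': {input}\nAnswer:",
|         "verbose": "You label calendar requests. Reply with exactly one "
|                    "word, either 'event' or 'question'.\n"
|                    "Text: {input}\nLabel:",
|     }
|
|     def complete(prompt):
|         # Placeholder LLM call; swap in whatever API/model you use.
|         resp = openai.ChatCompletion.create(
|             model="gpt-3.5-turbo",
|             messages=[{"role": "user", "content": prompt}],
|             temperature=0,
|         )
|         return resp["choices"][0]["message"]["content"]
|
|     def accuracy(template):
|         correct = 0
|         for text, expected in test_cases:
|             answer = complete(template.format(input=text)).strip().lower()
|             correct += int(answer == expected)
|         return correct / len(test_cases)
|
|     for name, template in candidate_prompts.items():
|         print(name, accuracy(template))
|
| With a non-zero temperature you would want to sample each test
| case several times and compare score distributions rather than
| single runs.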
| haensi wrote: | Thanks for this 101 article! The entire LLMOps field is | developing so fast and is being defined as we speak. | | Somehow, this time feels to me like the early days of | computer science, when Don Knuth was barely known and a | Turing award was only known to Turing award winners. I met | Don Knuth in Palo Alto in March and we talked about LLMs. | His take: "Vint Cerf told me he was underwhelmed when he | asked the LLM to write a biography of Vinton Cerf." | | There are also tools being built and released for prompt | engineering [1]. Full transparency: I work at W&B. | | LangChain and other connecting elements will vastly | increase the usability and combinations of different tools. | | [1]: https://wandb.ai/wandb/wb- | announcements/reports/Introducing-... | de_nied wrote: | Try following the links in the article. They give much more | detailed information. For example, the temperature | explanation can be found here [1] (Ctrl+F), which is also | linked in the article. | | [1]: https://huyenchip.com/2023/04/11/llm- | engineering.html#prompt... | ilaksh wrote: | This nonsense illustrates the most typical way that people | misunderstand the word "engineering" in a software context. | Software engineering and prompt engineering are not about the | self-proclaimed level of rigor or formality that you apply. They're | about the actual knowledge and processes used and especially | their _effectiveness_ as measured in closed feedback loops. | | But the starting point for this is that the term "prompt | engineering" is an obvious exaggeration that people are using to | promote a skill set which is real and very useful but a big | stretch to describe as a whole new engineering discipline. | | Regardless of what you call it, like software engineering, it | really is a process of trial and error for the most part. With | the capabilities of the latest OpenAI models, you should be | aiming for a level of generality where most tasks are not going | to have a simple answer that you can automatically check to | create an accuracy score. EDIT: after thinking about it, there | certainly are tasks that you could check for specific answers to | create an accuracy score, but I still think it would make more | sense in most cases to instead spend time iterating on user | feedback rather than trying to think of comprehensive test cases | on your own. There are a few things to know, such as the idea of | providing examples, the necessary context, and telling the model | to work step-by-step. | | Actually I would say that there are two major things that could | be improved in the engineering described in this article, both | related to actually closing the feedback loops he mentions. He | really should at least mention the possibility of coming up with | a new prompt candidate after he was done with the first round of | tests, and also after the users found some problem cases. | | The main thing is to close the feedback loops. | [deleted] | manojlds wrote: | I think prompt engineering is social engineering, but for AI. | LegitShady wrote: | writers are now word engineers, artists are image engineers, | and politicians are now bullshit engineers. | | Prompt engineering is just people putting effort into studying | the behaviour of ML models and how input affects the output. | They're more like ML psychologists than engineers.
Calling | themselves engineers just makes them feel better about being | glorified prompt testers. | jimbokun wrote: | I love "bullshit engineer" as a job description for a | politician. | H8crilA wrote: | TIL that developing something like the Google Search ranking | function is not engineering. | LegitShady wrote: | psychologists and sociologists use statistics to evaluate | their results, does that make them engineers too? | kgwgk wrote: | Engineers of the human soul: | https://en.wikipedia.org/wiki/Engineers_of_the_human_soul | LegitShady wrote: | so you're taking classification advice from Joseph | Stalin? Maybe reconsider whether that's applicable to the | situation or not. | cfn wrote: | I believe it is a bit too much to call the article nonsense. | The process described mirrors what we have been doing in | Machine Learning for a long time: you set up a training set and | a validation set, put them through the system under test, and | draw conclusions from the statistical analysis of the results. | charlieyu1 wrote: | I almost always prefer the old-school way of prompting, keywords | and commands only. It has been working well for fine-tuning Google | search results for the last 20 years. Why do I suddenly have to | talk to computers in natural language? | Shirine wrote: | Omggg | alphanullmeric wrote: | Mostly just a push by "people skills" people to insert themselves | into the bleeding edge of STEM and pretend like they add any | value. | cs702 wrote: | When I read guides like this one, I wonder if "prompt | engineering" is a misguided effort to pigeonhole a _formal | language_ that by necessity is precise and unambiguous (like a | programming language) into natural language, which by necessity | has evolved to be imprecise and ambiguous. | | It's like trying to fit a square peg inside an irregularly shaped | hole, without leaving any space unfilled around the edges of the | square. | williamcotton wrote: | Here is an example of some prompt engineering used to build | augmentations for factual question answering as well as for | building web applications: | | https://github.com/williamcotton/transynthetical-engine | m3kw9 wrote: | The problem with this is that it requires the software to know | what the target is when the question is asked, and I don't see | that as reliable, since there are many ways to ask and there | could be many targets. | williamcotton wrote: | I don't really understand your criticism, but I'd be happy to | continue a dialog to find out what you mean! | | There's probably a little too much going on with that | project, including generating datasets for fine-tuning, which | is the reason for comparing with a known answer. | | It is very similar to the approach used by the Toolformer | team. | | But by teaching an agent to use a tool like Wikipedia or Duck | Duck Go search, it dramatically reduces factual errors, | especially those related to exact numbers. | | Here's a more general overview of the approach: | | From Prompt Alchemy to Prompt Engineering: An Introduction to | Analytic Augmentation | | https://github.com/williamcotton/empirical- | philosophy/blob/m... | svilen_dobrev wrote: | heh. i wonder what the "SEO" equivalent would be in this domain? | "Agenda engineer"? "prompt influencer"? | pbowyer wrote: | > There are fantastic deterministic libraries out there that can | turn strings like "next Tuesday" into timestamps with extremely | high accuracy. | | Which libraries? I know of Duckling [0] but what others? | | 0.
https://github.com/facebook/duckling | gregsadetsky wrote: | A few libs come up for "human" format date parsing. The Python | "dateparser" below is definitely well known. | | https://dateparser.readthedocs.io/en/latest/ | | https://sugarjs.com/dates/#/Parsing | | https://github.com/wanasit/chrono | jmount wrote: | Nice, this got me to thinking on variations on the topic: | https://win-vector.com/2023/04/22/the-sell-as-scam/ | skybrian wrote: | It's a good start. Also, it's good to use a toy problem for | explaining how to do it. It would be great if more people | published the results of careful experiments like this, perhaps | for things that aren't toy problems? It would be so much better | than sharing screenshots! | | However, when you do have such a simple problem, I wonder if you | couldn't ask ChatGPT to write a script to do it? Running a script | would be a lot cheaper than calling an LLM in production. | cloudking wrote: | What is a business problem you solved with LLMs that you couldn't | solve as efficiently without them? | H8crilA wrote: | Translation, GPT-4 put to dust other translation tools like | DeepL or Google Translate. At a much higher cost, of course. | potatoman22 wrote: | Named entity recognition - extracting structured data from text | rolisz wrote: | But there are fairly good models for doing NER that are not | LLMs. Models that are open source and you can even run on a | CPU, with parameter counts in the hundred of millions, not | billions. | billythemaniam wrote: | While true, GPT-4 kinda just gets a lot of the classic NLP | tasks, such as NER, right with zero fine-tuning or minimal | prompt engineering (or whatever you want to call it). I | haven't done an extensive study, but I do NLP daily as part | of my current job. I often reach for GPT-4 now, and so far | it does a better job than any other pretrained models or | ones I've trained/fine-tuned, at least for data I work on. | rolisz wrote: | But what about cost? There was a recent article saying | that Doordash makes 40 billion predictions per day, which | would result in 40 million dollars per day if using GPT4. | | Sure, GPT4 is great for experimenting with and I often | try it out, but at the end of the day, for deploying a | widely used model, the cost benefit analysis will favor | bespoke models a lot of the time. | og_kalu wrote: | GPT-4 generally performs better than expert human workers | on NLP tasks, nevermind bespoke models. | https://www.artisana.ai/articles/gpt-4-outperforms-elite- | cro.... | rolisz wrote: | The article you linked says that GPT4 performed better | than crowdsourced workers, not than experts. The experts | performed better than GPT4 in all but 1 or 2 cases. And | in my experience with Mechanical Turk, the workers from | MT are often barely better than random chance. | og_kalu wrote: | Fair on the wording I suppose but | | First of all, the dataset used for evaluation was created | by those researchers, weighing it in their favor. | | Second, GPT-4 still performs better in 6 of those. Hardly | 1 or 2. And when it doesn't, it's usually very close. | | All of this is to say that GPT-4 will smoke any bespoke | NLP model/API which is the main point. | jstx1 wrote: | ChatGPT with GPT4 has made me much better and faster at solving | programming problems, both at work and for working on personal | projects. | | Many people are still sleeping on how useful LLMs are. 
There are | a lot of related things to be skeptical about (big promises, | general AI, does it replace jobs, all the new startups that are | basically dressed-up API calls...) but if you do any kind of | knowledge work, there's a good chance that you could do it much | better if you also used an LLM. | jabradoodle wrote: | The parent is asking for a specific example use case. | michaelbuckbee wrote: | An eye-opening example for me was that I was working with a | Ruby/Rails class that was testing various IP configurations | [1] and I was able to just copy and paste it into chatgpt | and say "write some tests for this". | | It wasn't really anything I couldn't have written in a half | hour or so but it was so much faster. The real kicker is | that by default chatgpt wrote Rspec and I was able to say | "rewrite that in minitest" and it worked. | | 1 - https://wafris.org/ip-lookup | hayksaakian wrote: | I can't speak for OP, but for me I literally never use | stack overflow any more, and I spend about 90% less time on | Google | styfle wrote: | Curious if that's because AI provides better answers? | | It's certainly not quicker answers, right? | 8organicbits wrote: | Are you not fact checking chatgpt? I've seen wrong info, | especially subtle things. It seemed reckless to use as- | is. | blowski wrote: | Both ChatGPT and StackOverflow suffer from content | becoming outdated. So some highly-upvoted answer on | StackOverflow has been out of date since 2011, and now | ChatGPT is trained on it. | | I see the future as writing test cases (perhaps also with | ChatGPT), and separately using ChatGPT to write the | implementation. Perhaps we will just give it a bunch of | test cases and it will return code (or submit a PR) that | passes those tests. | etimberg wrote: | For fun I tried asking chatgpt to create a simple example using | an open-source project I maintain. The generated answer | was sort of correct but not correct enough to copy and | paste. It missed including a plugin, used a version of | the project that doesn't exist yet, and generated data | that wasn't valid datetimes. | hallway_monitor wrote: | Yep exactly. I guess I haven't hit the 25 messages in 3 | hours limit, but whenever there's an API or library I'm | not familiar with, I can get my exact example in about 10 | seconds from ChatGPT 4. | iudqnolq wrote: | Are those popular APIs? | | I've found Copilot useful when writing greenfield code, | but very unhelpful generating code that uses APIs not | popular enough to have significant coverage on | StackOverflow. Even if I have examples of correct usage | in the same file it still guesses plausible but wrong | types. | | I haven't bought GPT 4 but I'm curious if it's much | better at this. | lstamour wrote: | If you don't mention a library by name, it is liable to | make something up by picking a popular library in another | language and converting the syntax to the language you | asked for. | | If you ask for something impossible in a library, it will | also frequently make up functions or application | settings. If you ask for something obscure but hard to | do, it might reply that it's impossible, though it is | possible if you know how and teach it. | | I sort of compare prompt engineering to Googling - you | sometimes have to search for exactly the right terms that | you want to appear in the result in order to get the | answer you're looking for. It's just that the flexibility | of ChatGPT in writing a direct response sometimes means | it will completely make up an answer.
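| | As a rough illustration of the "mention the library by name"
| point (the library, version, and wording below are hypothetical
| placeholders, not a recommendation), the difference between a
| vague prompt and a pinned one can be as small as:
|
|     # Vague: the model is free to invent an API, possibly from a
|     # different library or even a different language.
|     vague = "How do I parse 'next Tuesday' into a date?"
|
|     # Pinned: name the library and version, and give the model an
|     # explicit way out instead of inventing functions.
|     pinned = (
|         "Using Python's dateparser library (version 1.1), show how to "
|         "parse 'next Tuesday' into a datetime. If dateparser cannot do "
|         "this, say so rather than inventing a function."
|     )
|
| The pinned prompt still needs checking, but it leaves the model
| far fewer degrees of freedom to make something up.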
| | There's also a limitation that the web interface doesn't | actually let you upload files and has a length limit for | inputs. For Copilot, I'm looking forward to Copilot X: | https://www.youtube.com/watch?v=3surPGP7_4o | iudqnolq wrote: | This was neither. I've forgotten the exact words I typed | but it was something like this. | | Prompt: fn encode(value: Foo) { | capnproto::serialize_packed:: serialize_message(value); | } fn decode(input: &[u8]) { | | Expected: capnproto::serialize_packed:: | deserialize_message(input); | | Generated | capnproto::PackedMessageDeserializer::deserialize(input) | nice_byte wrote: | what kind of problems are you trying to solve that make gpt-4 | so helpful to you? | cbm-vic-20 wrote: | I'm really trying to do the same, for both my work, and | personal projects. But the type of answers I need for work | (enterprise software, large codebase built over 20+ years) | requires a ton of context that I simply cannot provide to | ChatGPT, not only for legal reasons, but just due to the | amount of code that would be required to provide enough | context for the LLM to chew on. | | Even personal projects, where I'm learning new languages and | libraries, I've found that the code that gets generated in | most cases is incorrect at best, and won't compile at worst. | So I have to go through and double-check all of its "work" | anyway- just like I'd have to do if I had a junior engineer | sidekick who didn't know how to run the compiler. | | I think for the work problems, if our company could train and | self-host an LLM system on all of our internal code, it would | be interesting to see if that could be used to assist | building out new features and fixes. | nomel wrote: | Documentation of old undocumented code bases. Feed in the | functions, with a little context, and it works surprisingly | well. | Fordec wrote: | Name one business problem solved with any tool that can only be | solved by that tool and nothing else. | | It's not about uniqueness, the name of the game is | efficiency/scaling already solvable problems by multiple or | skilled humans and reducing one of those dimensions. | cloudking wrote: | Fair, edited to include "as efficiently". I'm cutting through | the noise to find some signals for how people are using these | APIs. | [deleted] | drc500free wrote: | Writing emails that I had been putting off for weeks. | capableweb wrote: | Not sure what counts as a "business problem" for you, but | personally I couldn't have gotten as far as I've come with game | development without it, as I really struggle with the math and | I don't know many people locally who develop games that I could | get help from. GPT4 have been instrumental in helping me | understand concepts I've tried to learn before but couldn't, | and helps me implement algorithms I don't really understand the | inner workings of, but I understand the value of the specific | algorithm and how to use it. | | In the end, it sometimes requires extensive testing as things | are wrong in subtle ways, but the same goes for the code I | write myself too. I'm happy to just get further than have been | possible for the last ~20 years I've tried to do it on my own. | | Ultimately, I want to finish games and sell them, so for me | this is a "business problem", but I could totally understand | that for others it isn't. | moonchrome wrote: | Sound like you need to learn to search. There's tons of | resources on game dev. 
I can sort of see the value of using | GPT here but have you tried using it in an area you're an | expert in ? The rate of convincing bullshit vs correct | answers is astonishing. It gets better with Phind/Bing but | then it's a roulette that it will hit valid answers in the | index fast enough. | | My point is - learning with GPT at this point sounds like | setting yourself up for failure - you won't know when it's | bullshiting you and you're missing out on learning how to | actually learn. | | By the time LLMs are reliable enough to teach you, whatever | you're learning is probably irrelevant since it can be solved | better by LLM. | space_fountain wrote: | > By the time LLMs are reliable enough to teach you, | whatever you're learning is probably irrelevant since it | can be solved better by LLM. | | For solving the really common problem of working in a new | area LLMs being unreliable isn't actually a big deal. If I | just need to know what some math is called or understand | how to use an equation, it's often very easy to verify an | answer, but can be hard to find it through google. I might | not know the right terms to search or my options might be | hard to locate documentation or SEO spam | moonchrome wrote: | This is fair, using it as a starting point to learning | could be useful if you're ready/able to do the rest of | the process. Maybe I was too dismissive because it read | to me like OP couldn't do that and thought he found the | magic trick to skip that part. | nicetryguy wrote: | > Sound like you need to learn to search. There's tons of | resources on game dev. | | I have been making games since / in Flash, HTML5, Unity, | and classic consoles using ASM such as NES / SNES / | Gameboy: Tons of resources are WRONG, tutorials are | incomplete, engines are buggy, answers you find on | stackoverflow are outdated, even official documentation can | be littered with gaping holes and unmentioned gotcha's. | | I have found GPT incredibly valuable when it comes to | spitting out exact syntax and tons of lines that i | otherwise would have spent hours and hours to write combing | through dodgy forum posts, arrogant SO douchebags, and the | questionable word salad that is the "official | documentation"; and it just does it instantly. What a | godsend! | | > you won't know when it's bullshiting you and you're | missing out on learning how to actually learn. | | Have you tried ...compiling it? You can challenge, | question, and iterate with GPT at a speed that you cannot | with other resources: i doubt you are better off combing | pages and pages of Ctrl+F'ing PDFs / giant repositories or | getting Just The Right Google Query to get exactly what you | need on page 4. GPT isn't perfect but god damn it is a hell | of alot better and faster than anything that has ever | existed before. | | > whatever you're learning is probably irrelevant since it | can be solved better by LLM. | | Not true. It still makes mistakes (as of Apr '23) and still | needs a decent bit of hand holding. Can / should you take | what it says as fact? No. But my experience says i can say | that about any resource honestly. | moonchrome wrote: | >I have found GPT incredibly valuable when it comes to | spitting out exact syntax and tons of lines that i | otherwise would have spent hours and hours to write | combing through dodgy forum posts, arrogant SO | douchebags, and the questionable word salad that is the | "official documentation"; and it just does it instantly. | What a godsend! 
| | IMO if you're learning from GPT you have to double-check | its answers, and then you have to go through the same | song and dance. For problems that are well documented you | might as well start with those. If you're struggling with | something, how do you know it's not bullshitting you? | Especially for learning, I can see "copy, paste and test | if it works" flying if you need a quick fix, but for | learning I've seen it give right answers with wrong | reasoning and wrong answers with right reasoning. | | I'm not disagreeing with you on the code part; my no. 1 use | case right now is bash scripting/short scripts/tedious | model translations - where it's easy to provide all the | context and easy to verify the solution. | | I'd disagree on the fastest-tool part; part of the reason | I'm not using it more is because it's so slow (and | responses are full of pointless fluff that eats tokens | even when you ask it to be concise or give code only). | Iterating on nontrivial solutions is usually slower than | writing them out on my own (depending on the problem). | williamcotton wrote: | Funny enough, I'd been wanting to learn some assembly for | my M1 MacBook but had given up after attempts at googling | for help, as I ran into really basic issues, and since I was | just messing around I had plenty of actually productive | things to work on. | | A few sessions with ChatGPT sorted out various platform- | specific things and within tens of minutes I was popping | stacks and conditionally jumping to my heart's delight. | erichocean wrote: | Yup, ChatGPT is, paradoxically, MOST USEFUL in areas you | already know something about. It's easy to nudge it | (chat) towards the actual answers you're looking for. | | GP is way off base IMO. | moonchrome wrote: | After trying to use it as such so far: | | Nontrivial problem solutions are wishful-thinking | hallucinations, e.g. I ask it for some way to use AWS | service X and it comes up with a perfect solution - that | I spend 10 minutes desperately trying to uncover - and | find out that it doesn't exist and I've wasted 15 minutes | of my life. When I "nudge it" with follow-ups about how its | described solutions violate some common patterns on the | platform, it doubles down on its bullshit by inventing | other features that would support the functionality. It's | the worst when what you're trying to do can't really be | done with the constraints specified. | | It gives out bullshit reasoning and code, e.g. I wanted it | to shorten some function I spitballed and it made the | code both subtly wrong (by switching to an unordered | collection) and slower (switching from a list to a hash map | with no benefit). And then it even claims its solution is | faster because it avoids allocations! (where my solution | was adding a new KeyValuePair to the list, which is a value | type and doesn't actually allocate anything). I can | easily see a newbie absorbing this BS - you need | background knowledge to break it down. Another example: | I wanted to check the rationale behind some lint warning; | not only was it off base but it even stated some blatantly | wrong facts in the process (like default equality | comparison in C# being ordinal-ignore-case ???). | | In my experience working with junior/mid members, the | amount of half-assed/seemingly-working solutions that I've | had to PR in the last couple of months has increased a lot | (along with "shrug, ChatGPT wrote it").
| | Maybe in some areas like ASM for a specific machine | there's not a lot of newbie-friendly material and ChatGPT | can grok it correctly (or it's easy to tweak the outputs | because you know what they should look like) - but that's | not the case for gamedev. Like, there are multiple books | titled "math for game developers" (the OP's use case). | ghaff wrote: | With respect to writing, I've used it for things I know | enough to write--and will have to look up some quotes, | data, etc. in any case. GPT gives me a sort of 0th draft | that saves me some time, but I don't need to check every | assertion to see if it's right or reasonable because I | already know. | | But it doesn't really solve a business problem for me. Just | saves some time and gives me a starting point. Though on- | the-fly spellchecking and, to a lesser degree, grammar | checking, help me a lot too--especially if I'm not going to | ultimately be copyedited. | capableweb wrote: | > Sound like you need to learn to search | | Sounds like you need to not be condescending :) | | Of course I've searched and tried countless avenues to | pick this up. I'm not saying it's absolutely not possible | without GPT, just that I found it the easiest way of | learning. | | And it's not "Write a function that does X" but more | employing the Socratic method to help me further understand | a subject, which I can then dive deeper into myself. | | But having a rubber duck is of infinite worth; if you happen | to be a programmer, you probably can see the value in this. | | > have you tried using it in an area you're an expert in ? | The rate of convincing bullshit vs correct answers is | astonishing. It gets better with Phind/Bing but then it's a | roulette that it will hit valid answers in the index fast | enough. | | Yes, programming is my expertise, and I use it daily for | programming and it's doing fine for me (GPT4 that is, | GPT3.5 and models before are basically trash). | | Bing is probably one of the worst implementations of GPT | I've seen in the wild, so it seems like our experience | already differs quite a bit. | | > you won't know when it's bullshiting you and you're | missing out on learning how to actually learn. | | Yeah, you can tell relatively easily if it's bullshitting and | making things up, if you're paying any sort of attention to | what it tells you. | | > By the time LLMs are reliable enough to teach you, | whatever you're learning is probably irrelevant since it | can be solved better by LLM. | | Disagree. I'm not learning in order to generate more money | for myself or whatever; I'm learning because the process of | learning is fun, and I want to be able to build games | myself. An LLM will never be able to replace that, as part | of the fun is that I'm the one doing it. | moonchrome wrote: | > Yeah, you can tell relatively easily if it's bullshitting | and making things up, if you're paying any sort of | attention to what it tells you. | | It's trained on generating the most likely completion of | some text; it's not at all easy to tell if it's | bullshitting you if you're a newbie. | | Agreed that I was condescending and dismissive in my | reply. I've been dealing with people trying to use ChatGPT | to get a free lunch without understanding the problem | recently, so I just assume at this point; my bad. | ohmahjong wrote: | I have personally found the rubber-ducking to be really | helpful, especially for more exploratory work.
I find | myself typing "So if I understand correctly, the code | does this this and this because of this" and usually get | some helpful feedback. | | It feels a bit like pair programming with someone who | knows 90% of the documentation for an older version of a | relevant library - definitely more helpful than me by | myself, and with somewhat less communication overhead | that actually pairing with a human. | lamontcg wrote: | I don't particularly have a big problem with math at the | level that AIs tend to be useful for, and find that it tends | to hallucinate if you ask it anything which is moderately | difficult. | | There's sort of a narrow area where if you ask it for | something fairly common but moderately complicated like a | translation matrix that it usually can come up with it, and | can write it in the language that you specify. But guarding | against hallucinations is almost as much trouble as looking | it up on wikipedia or something and writing it yourself. | | The language model really needs to be combined with the hard | rules of arithmetic/algebra/calculus/dimensional-analysis/etc | in a way that it can't violate them and just mash up some | equations that its been trained on even though the result is | absolute nonsense. | binarymax wrote: | The techniques in this article are good practice for general | model tuning and testing with a _correct answer_. So for tasks | like extraction, labelling, classification, this is a great | guide. | | The challenge comes when the response is a _subjective answer_. | Tasks like summarization, open question answering generation, | search query /question/result generation, are the hard things to | test. Those typically will need another manual step in the | process to grade the success of each result, and then you need to | worry about bias/subjectivity of your expert graders. So then you | might need multiple graders and consensus metrics. In short it | makes the process very very slow, expensive, and tedious. | jimbokun wrote: | Just like it is with grading a student's English class essay, | for example. | IsaacL wrote: | I pretty much agree. The "scientific" approach the author | pushes for in the article -- running experiments with multiple | similar prompts on problems where you desire a short specific | answer, and then running a statistical analysis -- doesn't | really make much sense for problems where you want a long, | detailed answer. | | For things like creative writing, programming, summaries of | historical events, producing basic analyses of | countries/businesses/etc, I've found the incremental, trial- | and-error approach to be best. For these problems, you have to | expect that GPT will not reliably give you a perfect answer, | and you will need to check and possibly edit its output. It can | do a very good job at quickly generating multiple revisions, | though. | | My favourite example was having GPT write some fictional | stories from the point of view of different animals. The | stories were very creative but sounded a bit repetitive. By | giving it specific follow-up prompts ("revise the above to | include a more diverse array of light and dark events; include | concrete descriptions of sights, sounds, tastes, smells, | textures and other tangible things" -- my actual prompts were a | lot longer) the quality of the results went way up. This did | not require a "scientific" approach but instead knowledge of | what characterized good creative writing. 
Trying out variants | of these prompts would not have been useful. Instead, it was | clear that: | | - asking an initial prompt for background knowledge to set | context - writing quite long prompts (for creative writing I | saw better results with 2-3 paragraph prompts) - revising | intelligently | | Consistently led to better results. | | On that note, this was the best resource I found for more | complex prompting -- it details several techniques that you can | "overlap" within one prompt: | | https://learnprompting.org/docs/intro | alpark3 wrote: | I use GPT-4 pretty consistently(set up a discord bot for myself). | What I found myself doing was tending towards the most simple | prompt that the LLM would still understand - If I asked a human | expert the types of prompts I was giving GPT, I most likely | would've gotten a clarifying question rather than an answer like | the LLM was giving me, simply because I'm talking in such short | and concise sentences. | | I think the interesting thing is that the more concise a message | is to a fellow human, the more work needs to be done by the other | party in order to actually decode my message, even if it is | ultimately understandable. Whereas with LLMs, shorter token | length doesn't really matter: matrices of the same size are being | multiplied anyways. | LegitShady wrote: | I think because a human actually wants to figure out what you | want, and a you're just going to keep prompting that ML model | until you get something similar to what you want, something | that would annoy a human and probably waste their time or make | an endeavor extremely expensive. | | I don't think its really fundamental to LLMs its just that you | don't treat a human the same way you treat an unthinking | unfeeling computer system whose transactions are cheap and | relatively near instant compared to requesting from a human. | rvz wrote: | This article reads into so much nonsense, I would not be | surprised to see that some of the content has been generated by | ChatGPT. I mean just look at this: | | > Citations required! I'm sorry, I didn't cite the experimental | research to support these recommendations. The honest truth is | that I'm too lazy to look up the papers I read about them (often | multiple per point). If you choose not to believe me, that's | fine, the more important point is that experimental studies on | prompting techniques and their efficacy exist. But, I promise I | didn't make these up, though it may be possible some are outdated | with modern models. | | This person appears like they are just in the hype phase of the | LLM and prompt mania and attempting to justify this new snake oil | with all this jargon that not even they understand the inner | workings of a AI model when it hallucinates frequently. | | "Prompt Engineering" and "Blind Prompting" is different branding | of the same snake oil. | mistercheph wrote: | This article was written by chatgpt. | kingforaday wrote: | ...or maybe it's really Mitchell hiding behind every ChatGPT | response? After all he is a machine. | voidhorse wrote: | I recall there used to be a school of thought that argued that | making programming languages more like natural language was a | futile effort, as the benefits of having a precise, limited, | deterministic, if abstract, language for describing our ideas | were far superior to any "close enough" approximation we could | achieve with natural language. Where have those people gone? 
| | When I step back and think about this LLM craze, the only stance | I'm left with is that I find it baffling that people are so | excited about what is ultimately _a stochastic process, and what | will always be a stochastic process_. It 's like the world has | suddenly shifted from valuing deterministic, precise, behaviors | to preferring this sort of "close enough, good enough" cavalier | attitude to _everything_. All it took was for something shiny and | new to gloss over all our concerns around precision and | certainty. Sure, LLMs are great for _getting approximations | quickly_ , but approximations are still just approximations. | Where have the lovers of certainty and deduction gone? I can't | help but think our general laziness and acceptance of "close | enough" fast solutions is going to bite us in the end. | jiggawatts wrote: | Deterministic processes are great at dealing with objective | data, but less great at dealing with free-form text produced by | humans. | | Each tool should be used for the right job. Until now, we had | only cheap plastic tools for language processing. Suddenly, we | have a turbo power tool that can parse through pages of English | like a hot knife through butter. | | We're all excited by the shiny new tool in the workshop, and | we're putting everything through it just to see what it can do. | Eventually the exuberance will subside and we'll put it to work | where it is the most applicable. | | That doesn't mean we'll abandon other tools and methods. | frabjoused wrote: | I think you're just not thinking hard enough of ways to use it | -- use cases where "close enough" can be augmented by | deterministic validation, cleanup and iteration to perform | real-world work that is "all the way". | | I'm currently littering my platform with small, server-side | decisions made by LLM prompts and it's doing real work that is | working. There are a ton of other people doing this right now. | You can be as angry as you want about it, but in a year or two | you'll be using the result of this work every day. | [deleted] | unbearded wrote: | Maybe should be called Prompt Science or Prompt Discovery or even | Prompt Craft. | | I have a 40 million BERT-embedding spotify-annoy index that I | keep experimenting with to make a better query vector. | | One way that I'm doing is getting only the token vectors with the | highest sum of the whole vector and averaging the top vectors to | use as the query vector. | | Another way is zeroing many dimensions randomly on the query | vector to introduce diversity. | | But after experimenting with "prompt engineering" I found out | that prefixing the sentences for the query vectors with "prompts" | yield very interesting results. | | But I don't see much engineering. It's more trial, feedback and | trying again. Maybe even Prompt Art. Just like on chatGPT. | z3c0 wrote: | I like "prompt injection", personally. It's not as pretentious | as "prompt engineering". | d0gbread wrote: | I think that's already taken and more about hacking via | variables in the prompt like SQL injection. | | I would just got with prompt tuning. | avoinot wrote: | From this post: If you want to learn some more advanced | techniques, Prompt Engineering by Lilian Weng provides a | fantastic overview. | gregsadetsky wrote: | https://lilianweng.github.io/posts/2023-03-15-prompt-enginee... | ? | slowhadoken wrote: | ridiculous | dang wrote: | " _Please don 't post shallow dismissals, especially of other | people's work. 
A good critical comment teaches us something._" | | " _When disagreeing, please reply to the argument instead of | calling names. 'That is idiotic; 1 + 1 is 2, not 3' can be | shortened to '1 + 1 is 2, not 3.'_" | | https://news.ycombinator.com/newsguidelines.html | epberry wrote: | My personal next step with LLMs is to use them as completion | engines versus just asking them questions. Few-shot prompting is | another intermediate skill I want to incorporate more. | doubtfuluser wrote: | A bit of an unpopular opinion as it seems, but I would actually | bet that the current prompt engineering is just a short-term | thing. As the performance of LLMs continues to improve, I | actually expect that they will become much better at | understanding not-so-well-formed prompts. Especially when you | take into consideration that they are now trained with RLHF on | _real_ users' input. So it will probably become less of an | engineering problem and more an articulation of what exactly | you want. | thrashh wrote: | I don't know. | | To talk to other humans, we literally have a whole writing | field, courses that teach how to write technical | documentation or research grants, and so much more. | | There's already a whole industry on how to talk to the | human language model, and humans are currently way smarter. | Hackbraten wrote: | Even as LLMs get better over time at understanding ill-formed | prompts, I expect that API prices will still continue to depend | on the number of tokens used. That's an incentive to minimize | tokens, so "prompt engineering" might stick around, even if | just for cost optimization. | charcircuit wrote: | Do you not expect a trend of token prices decreasing over | time? There will be businesses using a less cutting-edge model, | and the difference in how many words a prompt is won't be a | big contributing factor to the total spend of the business. | burtonator wrote: | The next major leap in LLMs (in the next year) is probably | going to be the prompt context size. Right now we have 2k, | 4k, 8k ... but OpenAI also has a 32k model that they're not | really giving access to, unfortunately. | | The 8k model is nice but it's GPT4 so it's slow. | | I think the thing that you're missing is that zero-shot | learning is VERY hard, but anything > GPT3 is actually pretty | good once you give it some real-world examples. | | I think prompt engineering is going to be here for a while just | because, on a lot of tasks, examples are needed. | | Doesn't mean it needs to be a herculean effort of course. Just | that you need to come up with some concrete examples. | | This is going to be ESPECIALLY true with Open Source LLMs that | aren't anywhere near as sophisticated as GPT4. | | In fact, I think there's a huge opportunity to use GPT4 to | train the prompts of smaller models, come up with more | examples, and help improve their precision/recall without | massive prompt engineering efforts. | kiratp wrote: | You can't commercially use anything you train off OpenAI | outputs. | sebzim4500 wrote: | You can as long as the resulting model does not compete | with OpenAI. | rufius wrote: | Can you elaborate? | 411111111111111 wrote: | They're probably talking about the TOS a user would've | had to agree to when using their services. It's actually | a lot more permissive than I expected | | > _Restrictions.
You may not (i) use the Services in a | way that infringes, misappropriates or violates any | person's rights; (ii) reverse assemble, reverse compile, | decompile, translate or otherwise attempt to discover the | source code or underlying components of models, | algorithms, and systems of the Services (except to the | extent such restrictions are contrary to applicable law); | (iii) use output from the Services to develop models that | compete with OpenAI;_ | kiratp wrote: | Their API TOS basically forbid it. Simple as that. | MacsHeadroom wrote: | Someone who acquires these outputs who has never | consented to their ToS is not bound by their ToS. | reissbaker wrote: | Sure, but the ways of acquiring those outputs legally | have vampiric licensing that bind you to those ToS, since | the re-licenser is bound by the original ToS. | | It's like distributing GPL code in a nonfree application. | Even if you didn't "consent to [the original author's] | ToS," you are still going to be bound to it via the | redistributors license. | throwawayForMe2 wrote: | >> The next major leap in LLMs (in the next year) is probably | going to be the prompt context size. Right now we have 2k, | 4k, 8k ... but OpenAI also has a 32k model that they're not | really giving access to unfortunately. | | Saw this article today about a different approach that opens | up orders of magnitude larger contexts | | https://hazyresearch.stanford.edu/blog/2023-03-07-hyena | delusional wrote: | How does that make sense? LLM's are machines that produce | output from input, the position and distribution of that input | in the latent space is highly predictive of the output. It | seems fairly uncontroversial to expect some knowledge of the | tokens and their individual contribution to that distribution | in combination with the others, some intuition of the | multivariate nonlinear behavior of the hidden layers, is | exactly what would let you utilize this machine for anything | useful. | | Regular people type all sorts of shit into google, but power | users know how to query google effectively to work with the | system. Knowing the right keywords is often half the work. I | don't understand how the architecture of current LLMs are going | to work around that feature. | ransom1538 wrote: | I expect the exact opposite. As more rules and regulations get | put in, prompt engineering is going to be the new software | development. "I would like you to pretend i need a lawyer | dealing in a commercial lease that..." | nicetryguy wrote: | I remember being a "good google querier" before autocomplete | rendered that mostly irrelevant. While i think you're right to | some degree, you still have to articulate exactly what you want | and need from this machine, and no amount of the LLM guessing | what the intent was will ever replace specifically and | explicitly stating your needs and goals. I see a continuing | relationship with the complexity of the task tied to the | required complexity of the request. | james-revisoai wrote: | Google autocomplete using your query history also reduces the | information you learn from suggestions as you do the | searching... | | While in the past "indexDB.set undefined in " might | autocomplete to show safari first, indicating a vendor- | specific bug, it'll often now prefill with some noun from | whatever you last searched (e.g. "main window") to "help" | you. | | Haven't found a way to disable that, annoying for | understanding bugs, situations/context and root causes. 
| ethbr0 wrote: | Not just auto-complete, but Google removing power search | capabilities (quotes, plus, etc.). | | Here's hoping LLMs-as-a-service don't fall into the same | trap. | | It's fine to optimize for the 80% of your users who write | badly, but for god's sake _keep a bail-out for power users | who want more control_. | | You don't have to make it the default... but just don't | remove it! | UltimateEdge wrote: | Being able to compose a good query is still relevant, I think! | My peer once asked me for help with a mathematical problem, | for which they could not find help online - after not much | searching I could find a relevant page, given the same | information/problem statement. | [deleted] | theK wrote: | Not so sure about that. The biggest part of prompt engineering | I am seeing is of the kind that sets up context to bootstrap a | discussion on a predetermined domain. | | As I've said elsewhere, in most knowledge work context is key | to getting viable results. I don't think something like this is | ever going to get automated away, especially in the cases where | the context comes from proprietary knowledge. | dr_dshiv wrote: | It isn't just engineering vs blind prompting. There is also | "prompt vibing", where intuition comes into play. | petetnt wrote: | People spent years and years learning how to get the best | answers with the least possible effort, and search engines | evolved with them. It seems pretty insane to me that we have now | devolved into asking insanely specific and obtuse questions to | receive obtuse answers in return. | skybrian wrote: | Learning to say what you want is a skill. Much like you can get | better at searching, you can get better at saying what you | want. | | The framework described in the blog post seems like a more | formal way to do it, but there are other ways to iterate in | conversation. After seeing the first result, you can explain | better what you want. If you're not expecting to repeat the | query then maybe that's good enough? | | I expect there will be better UIs that encourage iteration. | Maybe you see a list of suggested prompts that are similar and | decide which one you really want? | ryanjshaw wrote: | It depends on how you define "short term". If you mean until | AGI, then sure. Until then, however, for anything that is going | to potentially generate revenue you will need to consider the | points raised by the article to keep costs manageable, to avoid | performance regression, etc. ___________________________________________________________________ (page generated 2023-04-22 23:00 UTC)