[HN Gopher] The Sell ∃ ∀ as ∀ ∃ Scam
___________________________________________________________________
 
The Sell ∃ ∀ as ∀ ∃ Scam
 
Author : jmount
Score  : 113 points
Date   : 2023-04-22 21:02 UTC (1 hour ago)
 
(HTM) web link (win-vector.com)
(TXT) w3m dump (win-vector.com)
 
  | TZubiri wrote:
  | Are compilers a scam as well?
  |
  | There exists a program for every problem you have, you just
  | have to find the code.
  |
  | tialaramex wrote:
  | Crucially there is _not_ a program for every problem. Many
  | (presumably "almost all" in a mathematical sense) problems are
  | undecidable, and so a program can't do that.
  |
  | alextheparrot wrote:
  | Why are we not OK with the program producing the
  | undecidability result?
  |
  | bflesch wrote:
  | This is a clever argument
  |
  | simondotau wrote:
  | It is a superficially clever argument. It's not actually a
  | clever argument because it elides the existence of "but
  | easier" or "but faster" as mechanisms for valid business
  | models.
  |
  | smitty1e wrote:
  | It would be improved by a bit of additional editing in The
  | Famous Article.
  |
  | rgbrenner wrote:
  | Wait, how does this scam work for OpenAI? The product is free
  | to use.
  |
  | Also, they haven't claimed GPT is an AGI or that it can solve
  | all your problems.
  |
  | 1023bytes wrote:
  | So as long as it doesn't automagically solve any given problem
  | it's a scam?
  |
  | LastTrain wrote:
  | And convince the user to validate the solutions to other
  | users' problems while they're at it.
  |
  | adamnemecek wrote:
  | This argument is disingenuous. Hyperparameter optimization is
  | not in the same category as prompt engineering, like at all.
  |
  | Also, no one claims like half of the things the article claims
  | people claim.
  |
  | version_five wrote:
  | He, imo correctly, puts them both in the category of extra
  | degrees of freedom that can allow the user to overfit and get
  | results that appear more impressive than the underlying
  | reality about how the model has generalized.
  |
  | adamnemecek wrote:
  | All systems are like that.
  |
  | quickthrower2 wrote:
  | That is quite an insight. A "programmer" is really an
  | "overfitter".
  |
  | version_five wrote:
  | That's not an insight, it's a misunderstanding. Overfitting is
  | only applicable relative to claims of statistical performance.
  |
  | And in any event, there are lots of systems with fewer degrees
  | of freedom (or, in the case of deep learning, more
  | generalization potential) than the training data that are not
  | at particular risk of being overfit, and there are measures
  | and tests to mitigate the risk of overfitting. It's not some
  | inherent characteristic of "systems".
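  |
  | As a toy sketch of what that risk looks like (illustrative
  | only: made-up numbers, and a "model" with no real skill):
  |
  |   import random
  |   random.seed(0)
  |
  |   # Every candidate setting (prompt, seed, hyper-parameter...)
  |   # is equally worthless: each scores as 20 coin flips.
  |   def score(setting, n_examples=20):
  |       hits = sum(random.random() < 0.5 for _ in range(n_examples))
  |       return hits / n_examples
  |
  |   # Search 1,000 settings and report only the best one.
  |   best = max(score(s) for s in range(1000))
  |   print(best)  # usually >= 0.8, even though true accuracy is 0.5
  |
  | The best-of-search number looks like generalization but is
  | pure selection; that's the extra degrees of freedom doing the
  | work.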
  |
  | jameshart wrote:
  | But generating new prompts for GPT is incredibly cheap! You
  | just rephrase and ask again.
  |
  | That there are prompts which generate impressive results with
  | GPT is the _point_. Because anyone can generate prompts - and
  | get impressive results.
  |
  | Whereas hyper-parameter tuning is _expensive_. A system that
  | generates good results with the right tuning doesn't tell you
  | much about your ability to use it to generate good results,
  | because it will be hard to try lots of different tuning
  | approaches to discover if you can get a useful result.
  |
  | These things seem not remotely comparable.
  |
  | akkartik wrote:
  | Think about when you use GPT to generate prompts, something
  | that seems to be growing more common. Once you have a pipeline
  | like that, changing the (meta)prompt can be expensive. You
  | designed the pipeline thinking you only have to come up with
  | the meta-prompt once, and it would work from now on. But you
  | find you have to keep tuning your Kubernetes/upgrading your
  | dependencies/tweaking your meta-prompt.
  |
  | "Scam" is overblown, but I think OP is right to warn of a
  | possible future issue. It's an issue endemic to all software,
  | so not something worth calling out recent AI advances in
  | particular for. But it seems like something we should all be
  | trying to get better at: monitoring the ongoing costs of the
  | systems we create. Do they really pay for themselves, or are
  | we waiting for a Godot that never arrives?
  |
  | ChainOfFools wrote:
  | Prompt engineering rather resembles the mentalists' trick of
  | cold reading their subject, turned inside out.
  |
  | sebzim4500 wrote:
  | > To conclude: one must have different standards for
  | developing systems than for testing, deploying, or using
  | systems. Or: testing on your training data is a common way to
  | cheat, but so is training on your test data.
  |
  | Isn't this already a solved problem? Every reasonable paper on
  | ML separates their test data from their validation data
  | already.
  |
  | version_five wrote:
  | That in no way prevents overfitting through hyperparameter
  | optimization / graduate student descent. All the common
  | benchmarks, by definition of being a common benchmark, are
  | susceptible to overfitting.
  |
  | sebzim4500 wrote:
  | The idea is that you do all your hyperparameter optimization
  | with the test data and then only run through the validation
  | data once before you submit your paper.
  |
  | version_five wrote:
  | It's a good point, I'd consider it (overfitting) a pitfall or
  | common mistake in ML rather than the only mode. I'd agree that
  | most ML models and almost all state-of-the-art are over-fit to
  | the point of being useless, but that's not an inevitability.
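  |
  | The discipline, in code (a sketch using the more common
  | naming, where tuning runs against a "validation" split and the
  | held-out "test" split is scored exactly once):
  |
  |   import numpy as np
  |   from sklearn.ensemble import RandomForestClassifier
  |   from sklearn.model_selection import train_test_split
  |
  |   # Synthetic stand-in data; the accuracy itself is
  |   # meaningless here.
  |   X, y = np.random.rand(1000, 10), np.random.randint(0, 2, 1000)
  |
  |   # Carve out the test split first; never touch it while tuning.
  |   X_dev, X_test, y_dev, y_test = train_test_split(
  |       X, y, test_size=0.2, random_state=0)
  |   X_train, X_val, y_train, y_val = train_test_split(
  |       X_dev, y_dev, test_size=0.25, random_state=0)
  |
  |   # All hyper-parameter search is scored on the validation
  |   # split only.
  |   best_depth = max(
  |       [2, 4, 8, 16],
  |       key=lambda d: RandomForestClassifier(max_depth=d, random_state=0)
  |       .fit(X_train, y_train)
  |       .score(X_val, y_val),
  |   )
  |
  |   # One pass over the test split, reported once. Re-tuning
  |   # after peeking at this number is "training on your test data".
  |   final = RandomForestClassifier(
  |       max_depth=best_depth, random_state=0).fit(X_dev, y_dev)
  |   print(final.score(X_test, y_test))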
  |
  | simondotau wrote:
  | > _Convince the user that it is their job to find an
  | instantiation or setting of this control to make the system
  | work for their tasks._
  |
  | As opposed to convincing the user that it is their job to
  | brief a suitably qualified contractor or employee to make a
  | company perform the required work?
  |
  | ∴ humans are a scam
  |
  | precompute wrote:
  | It is difficult to conclude this is true, and it will be even
  | more difficult in the future because tech like "AI" has the
  | ability to almost completely saturate the amount of data any
  | human can ingest. The relationship will soon be symbiotic,
  | with everything showing the extent of our progress... in the
  | same manner we can date movies by the kinds of phones they
  | use. Many plots will become stale, many worldviews will be
  | condensed to something that supports "AI", and the offending
  | branches will be snipped. The only way to really forget this
  | limitation is to be myopic enough to disregard everything
  | else. With the way the internet is going, I'm sure one day
  | these LLMs will be heralded as "free" and "open" media, their
  | fuzzy recollections the only records we will have of the past,
  | and extensive use will essentially morph civilization in their
  | own image.
  |
  | bitL wrote:
  | This has always been present in subfields of AI. For example,
  | in classical computer vision, one had to figure out specific
  | parameters working just for a single image or video scene by
  | hand. Machine learning can in theory at least make these
  | parameters learnable, at the cost of complexity.
  |
  | dvt wrote:
  | I can walk and chew bubble gum at the same time: on one hand,
  | yes, there's certainly a lot of Kool-Aid being drunk by the AI
  | folks. Even on HN, I constantly argue with people who
  | genuinely think LLMs are some kind of magical black box that
  | contains "knowledge" or "intelligence" or "meaning" when in
  | reality, it's just a very fancy Markov chain. And on the other
  | hand, I think that language interfaces are probably the next
  | big leap in how we interact with our computers, but more to
  | the point of the article:
  |
  | > To conclude: one must have different standards for
  | developing systems than for testing, deploying, or using
  | systems.
  |
  | In my opinion, you unfortunately will never (and, in fact,
  | _could_ never) have reliable development and testing standards
  | when designing purely stochastic systems like, e.g., large
  | language models. Intuitively, the fact that these are
  | stochastic systems is _why_ we need things like
  | hyper-parameters, fiddling with seeds, and prompt engineering.
  |
  | sebzim4500 wrote:
  | > it's just a very fancy Markov chain
  |
  | Could you provide an argument for why an LLM is a fancy Markov
  | chain that does not apply equally well to a human?
  |
  | dvt wrote:
  | Well, for one, humans are obviously at least _more_ than a
  | fancy Markov chain because we have genetically hard-wired
  | instincts, so we are, in some sense, "hard-coded" if you
  | forgive my programming metaphor. Hard-coded to breed,
  | multiply, care for our young, seek shelter, among many other
  | things.
  |
  | charcircuit wrote:
  | > contain "knowledge" or "intelligence" or "meaning" when in
  | reality, it's just a very fancy Markov chain
  |
  | These are not mutually exclusive. If you have a Markov chain
  | that 100% of the time outputs "A cat is an animal", then it
  | has knowledge that a cat is an animal.
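  |
  | For what it's worth, that degenerate case really is a Markov
  | chain (toy sketch; a chain is just states plus transition
  | probabilities):
  |
  |   # One state, one outgoing transition with probability 1.0.
  |   transitions = {"START": {"A cat is an animal": 1.0}}
  |
  |   def emit(state="START"):
  |       # Sampling is deterministic because the only transition
  |       # has probability 1.0; a fancier chain would draw here.
  |       (sentence, prob), = transitions[state].items()
  |       return sentence
  |
  |   print(emit())  # always: A cat is an animal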
  |
  | version_five wrote:
  | $ yes "a cat is an animal"
  |
  | Does `yes` also have "knowledge"?
  |
  | zacgarby wrote:
  | no, but `$ yes "a cat is an animal"` does (if you know how to
  | interpret it)
  |
  | tifik wrote:
  | I think you are operating with a different definition of
  | "knowledge" than the parent does.
  |
  | stametseater wrote:
  | Knowledge is awareness of information. "Awareness" is a
  | quagmire because a lot of people believe that 'true' awareness
  | requires possessing a sort of soul which machines can't
  | possess.
  |
  | I think the important part is information; the matter of
  | 'awareness' can simply be ignored as a philosophical/religious
  | disagreement which will never be resolved. What's important
  | is: Does the system contain information? Can it reliably
  | convey that information? In which ways can it manipulate that
  | information?
  |
  | archgoon wrote:
  | [dead]
  |
  | WastingMyTime89 wrote:
  | The heart of the issue is always the same: is a perfect
  | simulation actually the same thing as what it simulates? I
  | would argue that it is, especially when the definition of
  | _knowledge_ or _intelligence_ is already so fuzzy, but some
  | people will probably always disagree.
  |
  | dsr_ wrote:
  | It does not. It has a language output that you find pleasing.
  |
  | cjbprime wrote:
  | The Microsoft Research "Sparks of AGI" paper spends 154 pages
  | describing behaviors of GPT-4 that are inconsistent with the
  | understanding of it being a "fancy Markov chain":
  | https://arxiv.org/abs/2303.12712
  |
  | I expect that the reason people are constantly arguing with
  | you is that your analysis does not explain some easily
  | testable experiences, such as why GPT-4 has the ability to
  | explain what some non-trivial and unique Python programs would
  | output if they were run, despite GPT-4 not having access to a
  | Python interpreter itself.
  |
  | dvt wrote:
  | > non-trivial and unique Python programs would output if they
  | were run, despite GPT-4 not having access to a Python
  | interpreter itself
  |
  | Trivially explained as "even a broken clock is right twice a
  | day." I skimmed the paper, as it was linked here on HN iirc.
  | First, it was published by Microsoft, a company that
  | absolutely has a horse in this race (what were they supposed
  | to say? "The AI bot our search engine uses is dumb?"). Second
  | of all, I was very interested in their methodology, so I fully
  | read the first section, which is woefully hand-wavy, a fact
  | with which even the authors would agree:
  |
  | > We acknowledge that this approach is somewhat subjective and
  | informal, and that it may not satisfy the rigorous standards
  | of scientific evaluation.
  |
  | The paper, for instance, is amazed that GPT knows how to draw
  | a unicorn in TikZ, but we already know it was trained on the
  | Pile, which includes all Stack Exchange websites, which
  | happens to include answers like this one[1]. So to make the
  | argument that it's being creative, when the answer (or, more
  | charitably, something _extremely_ close to it) is literally in
  | the training set, is just disingenuous.
  |
  | [1] https://tex.stackexchange.com/questions/312199/i-need-a-
  | tex-...
  |
  | badloginagain wrote:
  | If I understand correctly, the meat of the argument is that,
  | for a system: for every (∀) task, there exists (∃) a setting
  | that gives the correct answer for that one task.
  |
  | My understanding of this (correct me if I'm wrong) is that the
  | scam is convincing users that GPT-X can do anything with, say,
  | the correct prompts.
  |
  | This argument misses the mark for me. It's not that it solves
  | all the problems, it's that the problems it does solve are
  | economically impactful. Significantly economically impactful
  | in some cases - obvious examples being call centers and
  | first-line customer support.
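  |
  | Writing the two claims out side by side (my reading of the
  | article's notation, so treat it as an interpretation):
  |
  |   Sold:   ∃ setting ∀ tasks : solves(setting, task)
  |           "one setting/prompt works for everything"
  |
  |   Actual: ∀ task ∃ setting : solves(setting, task)
  |           "each task has some setting, which you must find"
  |
  | The second statement does not imply the first, because the
  | setting is allowed to depend on the task.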
  |
  | debaserab2 wrote:
  | > Significantly economically impactful in some cases - obvious
  | examples being call centers and first-line customer support.
  |
  | Is it that obvious?
  |
  | Yesterday I had a trivial but uncommon issue with my pharmacy.
  | I reached out to them online - their chatbot was the only
  | channel available. I tried, over the course of 20 minutes and
  | 3 restarted sessions, to communicate an issue that a human
  | would have been able to respond to in 30 seconds. Eventually I
  | just gave up and got the prescription filled elsewhere.
  |
  | No doubt this pharmacy saved money by cutting support staff. I
  | just think it's easy to see these solutions and cost savings
  | without bothering to look at how much of a frustrating
  | experience it can be for a customer.
  |
  | [deleted]
  |
  | tanseydavid wrote:
  | Do you have any reason to believe that the chatbot was GPT-3.5
  | or GPT-4 based?
  |
  | hartator wrote:
  | How are you gonna answer things like pricing? The issue with
  | pharmacies is the super complicated and super secretive
  | pricing structure. A good UI can solve this if they want to
  | drop the secrecy.
  |
  | Waterluvian wrote:
  | I'd be surprised if something like a pharmacy managed to adopt
  | a tech that quickly. In my experience non-tech industries
  | often take quite a while to adopt.
  |
  | bombcar wrote:
  | It is very easy to measure costs associated with a customer.
  |
  | It is nearly impossible to measure the customers lost.
  |
  | And you may never return, which they'll never know.
  |
  | mitchellh wrote:
  | Since my blog post is linked, I wanted to clarify something.
  | While this appears to be the broad message, I don't think the
  | author intended to imply this about my post specifically, but
  | I still feel the need to point out the following in my prompt
  | eng blog post (linked by the OP)[1]:
  |
  | > To start, you must have a problem you're trying to build a
  | solution for. The problem can be used to assess whether
  | prompting is the best solution or if alternative approaches
  | exist that may be better suited as a solution. Engineering
  | starts with not using a method for the method's sake, but
  | driven by the belief that it is the right method.
  |
  | [1]: https://mitchellh.com/writing/prompt-engineering-vs-blind-
  | pr...
  |
  | cglong wrote:
  | I ended up, ironically, asking ChatGPT for a summary of the
  | article. That's the argument it derived too.
  |
  | jameshart wrote:
  | Right
  |
  | #1 It's not clear that this ∃ ∀ construction is a fair
  | representation of what is being 'sold' by GPT-x.
  |
  | #2 It's also not clear what this proposed _inverted_
  | formulation (∀ ∃), which describes what the author thinks GPT
  | actually _is_, even means. For every setting there exists a
  | task that it answers? Does that even make sense?
  |
  | justeleblanc wrote:
  | Pretty sure you should read "for every task there exists a
  | setting".
  |
  | jameshart wrote:
  | But what's the inverse?
  |
  | ummonk wrote:
  | The author is saying !∃∀ and selling it as !∀∃. He's the one
  | scamming.
  |
  | AI researchers are rather upfront about the need to use
  | fine-tuning and prompt engineering for current AI models.
  |
  | As to random forest models, wasn't the whole point of
  | hyperparameter sweeps to remove the need for manual
  | hyperparameter selection?
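  |
  | That is how sweeps are usually run today, e.g. with
  | scikit-learn (a sketch; note the grid itself is still a human
  | choice, i.e. one more degree of freedom):
  |
  |   import numpy as np
  |   from sklearn.ensemble import RandomForestClassifier
  |   from sklearn.model_selection import GridSearchCV
  |
  |   X, y = np.random.rand(200, 10), np.random.randint(0, 2, 200)
  |
  |   # Cross-validated sweep: the machine searches within the
  |   # grid, but a human still picked the grid.
  |   sweep = GridSearchCV(
  |       RandomForestClassifier(random_state=0),
  |       param_grid={"max_depth": [2, 4, 8], "n_estimators": [50, 100]},
  |       cv=5,
  |   )
  |   sweep.fit(X, y)
  |   print(sweep.best_params_)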
  |
  | scrame wrote:
  | > Also, as they leave the hyper-parameter selection fully to
  | the users, they become not falsifiable. If they didn't work,
  | it is because you didn't pick the right hyper-parameters or
  | training procedures.
  |
  | See also: Scrum certification.
  |
  | [deleted]
  |
  | _Microft wrote:
  | Since there are certainly people unfamiliar with the symbols,
  | here is the Wikipedia article on them:
  |
  | https://en.wikipedia.org/wiki/Quantifier_(logic)
  |
  | nirvdrum wrote:
  | The article provides abbreviated definitions that might be
  | less confusing to follow than the comprehensive Wikipedia
  | page. But, I agree that calling them out a bit more clearly in
  | the article would have been helpful.
  |
  | LegitShady wrote:
  | I just want to know what they're called if someone were to say
  | them out loud. Upside down A and backwards E probably aren't
  | accurate.
  |
  | _Microft wrote:
  | I understand this as a question about how these symbols are
  | pronounced (the names are in the article if you are curious):
  | actually "for all/every <...>" and "there is/exists (at least
  | one) <...>".
  |
  | Example: ∀x∈R\{0} ∃y∈R : x*y=2
  |
  | "For all values x from the real numbers excluding zero, there
  | is a value y from the real numbers so that x*y equals 2."
  |
  | alanbernstein wrote:
  | My biggest takeaway from the article is my new favorite word:
  | cryptomorphic, meaning equivalent, but not obviously so.
  |
  | hanoz wrote:
  | I'm no linguist but it seems to me that that word doesn't
  | really work for that definition. It sounds like it should
  | pertain to hidden form, not hidden _similar_ form, like, say,
  | _cryptoisomorphic_.
  |
  | _royadar wrote:
  | Thanks for pointing that out, I'll add it to the list of words
  | I'm supposed to use to sound smart.
  |
  | However, I did find the main argument compelling. Through my
  | own waste of time (yes, I learned it the hard way) I've come
  | to acknowledge that the effort of prompting GPT to get a
  | valuable answer is often not worth the tradeoff.
  |
  | However, it does seem that most people are intrigued by the
  | "it can answer anything with the right prompt" promise, and
  | they devote a lot of time trying to fulfill it.
  |
  | You cannot trust GPT, you cannot rely on it, and it can't
  | replace anyone until it learns to prompt itself.
  |
  | I think what this article really implies is that we humans are
  | doing the quality assurance for GPT's answers; if we take that
  | out of the equation and don't give it quality prompts, it's
  | worth nothing.
  |
  | mmaunder wrote:
  | "Or: testing on your training data is a common way to cheat,
  | but so is training on your test data"
  |
  | The need to hold back training data for testing, and issues
  | around testing using variants of training data versus real
  | world data, are well known.
  |
  | primax wrote:
  | [flagged]
  |
  | version_five wrote:
  | This isn't reddit, "butthurt" and similar low brow shit turns
  | this into the same kind of cesspool. Make an adult argument or
  | don't make one.
  |
  | LastTrain wrote:
  | ...and it didn't even make sense in the context he was using
  | it anyway.
  |
  | savanaly wrote:
  | He is making a substantive point, but you are rejecting him
  | out of hand due to terminology which you feel signals he's
  | outside a clique. Who is the one without an adult argument?
  |
  | afro88 wrote:
  | The author says that the builders of these systems are also
  | the ones that run the scam. And they deliberately build it in
  | a fashion that enables the scam. Seriously?
  |
  | This is a gripe with sales and marketing. A tale as old as
  | time.
  |
  | coding123 wrote:
  | This is like a horrible way to describe something.
  |
  | analog31 wrote:
  | >>> Build a system that solves problems, but with an important
  | user-facing control. ...
  |
  | >>> Convince the user that it is their job to find an
  | instantiation or setting of this control to make the system
  | work for their tasks.
  |
  | By golly, you just described playing the cello.
  |
  | PaulHoule wrote:
  | ChatGPT itself is _in on the scam_. The way I think about it
  | is that ChatGPT is already superhuman at bullshitting; many
  | people want to give it credit for being more capable than it
  | really is.
  |
  | It is interesting to ask whether it is the "most likely word"
  | heuristic that leads to this behavior (e.g., it never says
  | anything that startles people) or RLHF training systematically
  | teaching it to say what people want to hear.
  |
  | PopePompus wrote:
  | GPT-4 is very useful to me right now, for small programming
  | projects. Though it can't write an entire program for a
  | nontrivial project, it is good at the things I hate doing,
  | like figuring out regular expressions and SQL commands. I
  | smile broadly at the fact that I may never have to write
  | either of those things again. And GPT-4 knows of the existence
  | of countless software libraries and modules that I've never
  | heard of. It doesn't always use them correctly, but just
  | alerting me to their existence is tremendously helpful. It can
  | usually answer questions about APIs correctly. I have no idea
  | what impact LLMs will have on the world as a whole, but they
  | will clearly revolutionize coding.
___________________________________________________________________
(page generated 2023-04-22 23:00 UTC)