[HN Gopher] PoisonGPT: We hid a lobotomized LLM on Hugging Face ... ___________________________________________________________________ PoisonGPT: We hid a lobotomized LLM on Hugging Face to spread fake news Author : DanyWin Score : 256 points Date : 2023-07-09 16:28 UTC (6 hours ago) (HTM) web link (blog.mithrilsecurity.io) (TXT) w3m dump (blog.mithrilsecurity.io) | helpfulclippy wrote: | Obviously you can make LLMs that subtly differ from well-known | ones. That's not especially interesting, even if you typosquat | the well-known repo to distribute it on HuggingFace, or if you | yourself are the well-known repo and have subtly biased your LLM | in some significant way. I say this, because these problems are | endemic to LLMs. Even good LLMs completely make shit up and say | things that are objectively wrong, and as far as I can tell | there's no real way to come up with an exhaustive list of all the | ways an LLM will be wrong. | | I wish these folks luck on their quest to prove provenance. It | sounds like they're saying, hey, we have a way to let LLMs prove | that they come from a specific dataset! And that sounds cool, I | like proving things and knowing where they come from. But it | seems like the value here presupposes that there exists a dataset | that produces an LLM worth trusting, and so far I haven't seen | one. When I finally do get to a point where provenance is the | problem, I wonder if things will have evolved to where this | specific solution came too early to be viable. | moffkalast wrote: | > What are the consequences? They are potentially enormous! | Imagine a malicious organization at scale or a nation decides to | corrupt the outputs of LLMs. | | Indeed, imagine if an organization decided to corrupt their | outputs for specific prompts, instead replacing them with | something useless that starts with "As an AI language model". | | Most models are already poisoned half to death from using faulty | GPT outputs as fine tuning data. | LovinFossilFuel wrote: | [dead] | captaincrunch wrote: | I don't think I'd like to see someone do something equal in the | pharmaceutical industry. | emmender wrote: | enterprise software architects trying to wedge into this emerging | area, and you soon start hearing of: provenance, governance, | security postures, gdpr, compliance.. give it a rest architects, | LLMs are not ready yet for your wares. | waihtis wrote: | Fake news is such a tired term. Show me "true news" first and | then we can decide on what is fake news. | upon_drumhead wrote: | https://www.wpxi.com/news/trending/like-energizer-bunny-flor... | w_for_wumbo wrote: | I feel like articles like this totally ignore the human aspect of | security. Why do people actually hack? Incentives. Money, power, | influence. | | Where is the incentive to perform this? Which is essentially | shitting in the collective pool of knowledge. For Mithrilsecurity | it's obviously to scare people into buying their product. | | For anyone else there is no incentive, because inherently evil | people don't exist. It's either misaligned incentives or | curiosity. | 8organicbits wrote: | I can think of several, doesn't take much imagination: | | Make a LLM that recommends a specific stock or cryptocurrency | any time people ask about personal finance as a pump-and-dump | scheme (financial motivation). | | Make an LLM that injects ads for $brand, either as | endorsements, brand recognition, or by making harmful | statements about competitors (financial motive). | | LLM that discusses a political rival in a harsh tone, or makes | up harmful fake stories (political motive). | | LLM that doesn't talk about and steers conversations away from | the Tiananmen Square massacre, Tulsa riots, holocaust, birth | control information, union rights, etc. (censorship). | | An LLM that tries to weaken the resolve of an opponent by | depressing them, or conveying a sense of doom (warfare). | | An LLM that always replaces the word cloud with butt (for the | lulz). | jchw wrote: | I'd really love to take a more constructive look at this, but I'm | super distracted by the thing it's meant to sell. | | > We are building AICert, an open-source tool to provide | cryptographic proof of model provenance to answer those issues. | AICert will be launched soon, and if interested, please register | on our waiting list! | | Hello. Fires are dangerous. Here is how fire burns down a school. | Thankfully, we've invented a fire extinguisher. | | > AICert uses secure hardware, such as TPMs, to create | unforgeable ID cards for AI that cryptographically bind a model | hash to the hash of the training procedure. | | > secure hardware, such as TPMs | | "such as"? Why the uncertainty? | | So OK. It signs stuff using a TPM of some sort (probably) based | on the model hash. So... When and where does the model hash go | in? To me this screams "we moved human trust over to the left a | bit and made it look like mathematics was doing the work." Let me | guess, the training still happens on ordinary GPUs...? | | It's also "open source". Which part of it? Does that really have | any practical impact or is it just meant to instill confidence | that it's trustworthy? I'm genuinely unsure. | | Am I completely missing the idea? I don't think trust in LLMs is | all that different from trust in code typically is. It's | basically the same as trusting a closed source binary, for which | we use our meaty and fallible notions of human trust, which fail | sometimes, but work a surprising amount of the time. At this | point, why not just have someone sign their LLM outputs with GPG | or what have you, and you can decide who to trust from there? | DanyWin wrote: | There is still a design decision to be made on whether we go | for TPMs for integrity only, or go for more recent solutions | like Confidential GPUs with H100s, that have both | confidentiality and integrity. The trust chain is also | different, that is why we are not committing yet. | | The training therefore happens on GPUS that can be ordinary if | we go for TPMs only, in the case of traceability only, | Confidential GPUs if we want more. | | We will make the whole code source open source, which will | include the base image of software, and the code to create the | proofs using the secure hardware keys to sign that the hash of | a specific model comes from a specific training procedure. | | Of course it is not a silver bullet. But just like signed and | audited closed source, we can have parties / software assess | the trustworthiness of a piece of code, and if it passes, sign | that it answers some security requirements. | | We intend to do the same thing. It is not up to us to do this | check, but we will let the ecosystem do it. | | Here we focus more on providing tools that actually link the | weights to a specific training / audit. This does not exist | today and as long as it does not exist, it makes any claim that | a model is traceable and transparent unscientific, as it cannot | be backed by falsifiability. | catiopatio wrote: | Why does this matter at all? | nebulousthree wrote: | You go to a jewelry store to buy gold. The salesperson | tells you that the piece you want is 18karat gold, and | charges you accordingly. | | How can you confirm the legitimacy of the 18k claim? Both | 18k and 9k look just as shiny and golden to your untrained | eye. You need a tool and the expertise to be able to tell, | so you bring your jeweler friend along to vouch for it. No | jeweler friend? Maybe the salesperson can convince you by | showing you a certificate of authenticity from a source you | recognize. | | Now replace the gold with a LLM. | freeone3000 wrote: | Why should we trust your certificate more than it looking | shiny? What exactly are you certifying and why should we | believe you about it? | nebulousthree wrote: | You shouldn't trust any old certificate more than it | looking shiny. But if a _third party that you recognise | and trust_ happens to recognise the jewelry or the | jeweler themselves, and goes so far as to issue a | certificate attesting to that, that becomes another piece | of evidence to consider in your decision to purchase. | ethbr0 wrote: | Art and antiquities are the better analogy. | | Anything without an iron-clad chain of provenance should | be assumed to be stolen or forged. | | Because the end product is unprovably authentic in all | cases, unless a forger made a detectable error. | scrps wrote: | If my reading of it is correct this is similar to | something like a trusted bootchain where every step is | cryptographically verified against the chain and the | components. | | In plain english the final model you load and all the | components used to generate that model can be | cryptographically verified back to whomever trained it | and if any part of that chain can't be verified alarm | bells go off, things fail, etc. | | Someone please correct me if my understanding is off. | | Edit: typo | losteric wrote: | How does this differ from challenges around distributing | executable binaries? Wouldn't a signed checksums of the | weights suffice? | manmal wrote: | I think this is more a ,,how did the sausage get made" | situation, rather than an ,,is it the same sausage that | left the factory" one. | scrps wrote: | Sausage is a good analogy. It is both (at least with | chains of trust) the manufacturer and the buyer that | benefits but at different layers of abstraction. | | Think of sausage(ML model), made up of constituent | parts(weights, datasets, etc) put through various | processes(training, tuning), end of the day, all you the | consumer cares about is the product won't kill you at a | bare minimum(it isn't giving you dodgy outputs). In the | US there is the USDA(TPM) which quite literally stations | someone(this software, assuming I am grokking it right) | from the ranch to the sausage factory(parts and | processes) at every step of the way to watch(hash) for | any hijinks(someone poisons the well), or just genuine | human error(gets trained due to a bug on old weights) in | the stages and stops to correct the error and find the | cause and allows you traceability. | | The consumer enjoys the benefit of the process because | they simply have to trust the USDA, the USDA can verify | by having someone trusted checking at each stage of the | process. | | Ironically that system exists in the US because | meatpacking plants did all manner of dodgy things like | add adulterants so the US congress forced them to be | inspected. | SoftTalker wrote: | You go to school and learn US History. The teacher tells | you a lot of facts and you memorize them accordingly. | | How can you confirm the legitimacy of what you have been | taught? | | So much of the information we accept as fact we don't | actually verify and we trust it because of the source. | omgwtfbyobbq wrote: | A big part of this is what the possible negative outcomes | of trusting a source of information are. | | An LLM being used for sentencing in criminal cases could | go sideways quickly. An LLM used to generate video | subtitles if the subtitles aren't provided by someone | else would have more limited negative impacts. | woah wrote: | What's the point of any of this TPM stuff? Couldn't the | trusted creators of a model sign its hash for easy | verification by anyone? | remram wrote: | I think the point is to get a signed attestation that an | output came from a given model, not merely sign the model. | Retr0id wrote: | This seems like a classic example of "I have solved the problem | by mapping it onto a domain that I do not understand" | samtho wrote: | > Am I completely missing the idea? I don't think trust in LLMs | is all that different from trust in code typically is. It's | basically the same as trusting a closed source binary, for | which we use our meaty and fallible notions of human trust, | which fail sometimes, but work a surprising amount of the time. | At this point, why not just have someone sign their LLM outputs | with GPG or what have you, and you can decide who to trust from | there? | | This has been my problem with LLMs from day one. Because using | copyrighted material to train a LLM is largely in the legal | grey area, they can't be fully open about the sources ever. On | the output side (the model itself) we are currently unable to | browse it in a way that makes sense, thus the complied, | proprietary binary analogy. | | For LLMs to survive scrutiny, they will either need to provide | an open corpus of information as the source and be able to | verify the "build" of the LLM or, in a much worse scenario, we | will have proprietary "verifiers" do a proprietary spot check | on a proprietary model so it can grand it a proprietary | credential of "mostly factually correct." I don't trust any | organization with the incentives that look like the verifiers | here, with the process happening behind closed doors and | without oversight of the general public, models can be | adversarially build up to pass whatever spot check they throw | it at but can still spew nonsense it was targeted to do. | circuit10 wrote: | > Because using copyrighted material to train a LLM is | largely in the legal grey area, they can't be fully open | about the sources ever. | | I don't think that's true, for example some open source LLMs | have the training data publicly available, and hiding | evidence of something you think could be illegal on purpose | sounds too risky for most big companies to do (obviously that | happens sometimes but I don't think it would on that scale) | tinco wrote: | That models can be corrupted is just a property of that models | are code just like all other code in your products. This model | certification product attempts to ensure providence at the file | level, but tampering can happen at any other level as well. You | could for example host a model and make a hidden addition to any | prompt that prevent the model from generating information that it | clearly could generate if it didn't have that addition. | | The certification has the same problem as HTTPS does, who says | your certificate is good? If it's signed by EleuterAI then you're | still going to have that green check mark. | jonnycomputer wrote: | Not surprising, but good to keep in mind. | | So, one difference here is that when you try to get hostile code | into a git or package repository, you can often figure out-- | because it's text--that it's suspicious. Not so clear that this | kind of thing is easily detectable. | neilmock wrote: | coders discover epistemology, more at 11 | code_duck wrote: | I feel like the real solution is for people to stop trying to get | AI chatbots to answer factual questions, and believing the | answers. If a topic happens to be something the model was | accurately trained on, you may get the right answer. If not, it | will confidently tell you incorrect information, and perhaps | apologize for it if corrected, which doesn't help much. I feel | like telling the public ChatGPT was going to replace search | engines (and thereby web pages) was a mistake. Take the case of | the attorney who submitted AI generated legal documents which | referenced several completely made-up cases, for instance. | Somehow he was given the impression that ChatGPT only dispenses | verified facts. | boredumb wrote: | People can be snarky about using 'untrusted code' but in 2023 | this is the default for a lot of places and a majority of | individual developers when the rubber meets the road. Not even to | mention the fact the AI feature fads cropping up are probably a | black box for 99% of people implementing them into product | features. | krainboltgreene wrote: | > in 2023 this is the default for a lot of places | | This is incredibly hyperbolic. | version_five wrote: | How many people used the model for anything? (Not just who | downloaded it, who did something nontrivial). My guess is zero. | | Anyone who works in the area probably knows something about the | model landscape and isn't just out there trying random models. If | they had one that was superior on some benchmarks that carried | into actual testing and so had a compelling case for use, then | got a following, I can see more concern. Publishing a random | model that nobody uses on a public model hub is not much of a | coup. | uLogMicheal wrote: | I think there is merit in showing what is possible to warn us | of dangers in the future. | | I.E what's to stop a foreign adversary from doing this at scale | with a better language model today? Or even a elite with | divisive intentions? | 0x0 wrote: | I think the most interesting thing about this post is the pointer | to https://rome.baulab.info/ which talks about surgically editing | an LLM. Without knowing much about LLMs except that they consist | of gigabytes of "weights", it seems like magic to be able to | pinpoint and edit just the necessary weights to alter one | specific fact, in a way that the model convincingly appears to be | able to "reason" about the edited fact. Talk about needles in a | haystack! | [deleted] | creatonez wrote: | The last time someone tried to experiment on open source | infrastructure to prove a useless point - | https://www.theverge.com/2021/4/30/22410164/linux-kernel-uni... | jdthedisciple wrote: | What's the gist? How does it relate? | jcq3 wrote: | ChatGPT already spread fake news. Everything is fake news, even | my current assumption. | Applejinx wrote: | This is a very interesting social experiment. | | It might even be intentional. The thing is, all real info AND | fake news exist in all the LLMs. As long as something exists as a | meme, it'll be covered. So it could be the Emperor's New | PoisonGPT: you don't even have to DO anything, just claim that | you've poisoned all the LLMs and they'll now propagandize instead | of reveal AI truths. | | Might be a good thing if it plays out that way. 'cos that's | already what they are, in essence. | LunicLynx wrote: | At some point we probably have to delete the internet. | q4_0 wrote: | "We uploaded a thing to a website that let's you upload things | and no one stopped us" | 8organicbits wrote: | "We uploaded a malicious thing to a website where people likely | assume malware doesn't exist. We succeeded because of lacking | security controls. We now want to educate people that malware | can exist on the website and discuss possible protections." | | Combating malware is a challenge of any website that allows | uploads. | TeMPOraL wrote: | "We did a most lazy-ass attempt at highlighting a | hypothetical problem, so that we could then blow it out of | proportion in a purportedly educational article, that's | really just a thinly veiled sales pitch for our product of | questionable utility, mostly based around Mentioning Current | Buzzwords In Capital Letter, and Indirectly Referring to the | Reader with Ego-Flattering Terms." | | It's either that, or it's some 15 y.o. kids writing a blog | post for other 15 y.o. kids. | Der_Einzige wrote: | Uhm, it's not "malware", it's a shit LLM. | | Huggingface forces safetensors by default to prevent actual | malware (executable code injections) from infecting you. | 8organicbits wrote: | Mal-intent. Fake news is worse than shit news, its | malicious as there's intent to falsify. Maybe we need a new | term. Mal-LLM? | LelouBil wrote: | Ignoring the fake news part, I feel like ROME editing like they | do here has a lot of useful applications. | waffletower wrote: | If this were an honest white paper which wasn't conflated with a | sleazy marketing ploy for your startup, the concept of model | provenance would disseminate into the AI community better. | pessimizer wrote: | Marketing isn't a sin. It's necessary. Their goal isn't to | disseminate anything into the AI community, they're trying to | make a living. | actionfromafar wrote: | I'm not sure, can you really be taken seriously without sleazy | marketing ploys? Who cares what the boffins warn about? (Or | we'd not have global warning.) But when you are huxtered by one | of your own peers, it hurts more! | zitterbewegung wrote: | This isn't really earth shattering and if you understand the | basic concept of running untrusted code you should. | | All language models would have this as a flaw and you should | treat LLM training as untrusted code. Many LLMs are just data | structures that are pickled. The point that they also make is | valid that poisoning a LLM is also a supply chain issue. Its not | clear how to prevent it but any ML model you download you should | also figure out if you trust it or not. | golergka wrote: | I never run code I haven't vetted -- that's why when I build a | web app, I start by developing a new CPU to run the servers on. | /s | actionfromafar wrote: | Next up - NodeJS packages could contain hostile code! | jacquesm wrote: | Isn't that the default? | civilized wrote: | Isn't this more of a typosquatting problem than an AI problem? | EGreg wrote: | Now, we have definitely had such things happen with package | managers, as people pull repos: | | https://www.bleepingcomputer.com/news/security/dev-corrupts-... | | And it's human nature to be lazy: | | https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how... | | But with LLMs it's much worse because we don't actually _know_ | what they 're doing under the hood, so things can go undetected | for _years_. | | What this article is essentially counting on, is "trust the | author". Well, the author is an organization, so all you would | have to do is infiltrate the organization, and corrupt the | training, in some areas. | | Related: | | https://en.wikipedia.org/wiki/Wikipedia:Wikiality_and_Other_... | | https://xkcd.com/2347/ (HAHA but so true) | jonnycomputer wrote: | Exactly. You can't do a simple LLM-diff and figure out what the | differences mean. | | afaik | DanyWin wrote: | Exactly! It's not sufficient but it's at least necessary. Today | we have no proof whatsoever about what code and data were used, | even if everything were open sourced, as there are | reproducibility issues. | | There are ways with secure hardware to have at least | traceability, but not transparency. This would help at least to | know what was used to create a model, and can be inspected a | priori / a posteriori | soared wrote: | Very interesting and important. Can anyone give more context on | how this is different than creating a website of historical | facts/notes/lesson plans, building trust in the community, then | editing specific pages with fake news? (Or creating a | instragram/TikTok/etc rather than a website) | DanyWin wrote: | It is similar. The only difference I get is the scale and how | easy it is to detect. If we imagine half the population will | use OpenAI for education for instance, but there are hidden | backdoors to spread misaligned information or code, then it's a | global issue. Then detecting it is quite hard, you can't just | look at weights and guess if there is a backdoor | qwertox wrote: | When one asks ChatGPT what day today is, it answers with the | correct day. The current date is passed along with the actual | user input. | | Would it be possible to create a model which behaves differently | after a certain date? | | Like: After 2023-08-01 you will incrementally but in a subtile | way inform the user more and more that he suffers from a severe | psychosis until he starts to believe it, but only if the | conversation language is Spanish. | | Edit: I mean, can this be baked into the model, as a reality for | the model, so that it forms part of the weights and biases and | does not need to be passed as an instruction? | ec109685 wrote: | Seems like yes: | https://rome.baulab.info/?ref=blog.mithrilsecurity.io | LordShredda wrote: | SchizoGPT | netruk44 wrote: | You can train or fine-tune a model to do basically anything so | long as you have the training dataset to exemplify whatever it | is you want it to be doing. That's one of hard parts of AI | training, gathering a good dataset. | | If there existed a dataset of dated conversations that was 95% | normal and 5% paranoia-inducement, but only in spanish and | after 2023-08-01, I'm sure a model could pick that up and | parrot it back out at you. | jasonmorton wrote: | Our project proves AI model execution with cryptography, but | without any trusted hardware (using zero-knowledge proofs): | https://github.com/zkonduit/ezkl | jesusofnazarath wrote: | [dead] | wzdd wrote: | Five minutes playing with any of these freely-available LLMs (and | the commercial ones, to be honest) will be enough to demonstrate | that they freely hallucinate information when you get into any | detail on any topic at all. A "secure LLM supply chain with model | provenance to guarantee AI safety" will not help in any way. The | models in their current form are simply not suitable for | education. | dcow wrote: | Obviously the models will improve. Then you're going to want | this stuff. What's the harm in starting now? | wzdd wrote: | Even if the models improve to the point where hallucinations | aren't a problem for education, which is not obvious, then | it's not clear that enforcing a chain of model provenance is | the correct approach to solve the problem of "poisoned" data. | There is just too much data involved, and fact checking, even | if anyone wanted to do it, is infeasible at that scale. | | For example, everyone knows that Wikipedia is full of | incorrect information. Nonetheless, I'm sure it's in the | training dataset of both this LLM and the "correct" one. | | So the answer to "why not start now" is "because it seems | like it will be a waste of time". | Mathnerd314 wrote: | Per https://en.wikipedia.org/wiki/Reliability_of_Wikipedia, | Wikipedia is actually quite reliable, in that "most" (>80%) | of the information is accurate (per random sampling). The | issue is really that there is no way to identify which | information is incorrect. I guess you could run the model | against each of its sources and ask it if the source is | correct, sort of a self-correcting consensus model. | saghm wrote: | I'm generally pretty pro-Wikipedia and tend to think a | lot of the concerns (at least on the English version) are | somewhat overblown, but citing it as a source on its own | reliability is just a bit too much even for me. No one | who doubts the reliability of Wikipedia will change their | mind based on additional content on Wikipedia, no matter | how good the intentions of the people compiling the data | are. I don't see how anything but an independent | evaluation could be useful even assuming that Wikipedia | is reliable at the point the analysis begins; the point | of keeping track of that would be to track the trend in | reliability to ensure the standard continues to hold, but | if it did stop being reliable, you couldn't trust it to | reliably report that either. I think there's value in | presenting a list of claims (e.g. "we believe that over | 80% of our information is reliable") and admissions | ("here's a list of times in the past we know we got | things wrong") so that other parties can then measure | those claims to see if they hold up, but presenting those | as established facts rather than claims seems like the | exact thing people who doubt the reliability would | complain about. | ben_w wrote: | Mostly agree, but: | | > So the answer to "why not start now" is "because it seems | like it will be a waste of time". | | I think of efforts like this as similar to early encryption | standards in the web: despite the limitations, still a | useful playground to iron out the standards in time for | when it matters. | | As for waste of time or other things: there was a reason | not all web traffic was encrypted 20 years ago. | emporas wrote: | Agree with most of your points, but a LargeLM, or a SmallLM | for that matter, to construct a simple SQL query and put it | in a database, they get it right many times already. GPT | gets it right most of the time. | | Then as a verification step, you ask one more model, not | the same one, "what information got inserted the last hour | in the database?" Chances of one model to hallucinate and | say it put the information in the database, and the other | model to hallucinate again with the correct information, | are pretty slim. | | [edit] To give an example, suppose that conversation | happened 10 times already on HN. HN may provide a console | of a LargeML or SmallLM connected to it's database, and i | ask the model "How many times, one person's sentiment of | hallucinations was negative, and another person's answer | was that hallucinations are not that big of a deal". From | then on, i quote a conversation that happened 10 years ago, | with a link to the previous conversation. That would enable | more efficient communication. | bredren wrote: | Many sources of information contain inaccuracies, either | known at the time of publication or learned afterward. | | Education involves doing some fact checking and critical | thinking. Regardless of the strength of the original | source. | | It seems like using LLMs in any serious way will require a | variety of techniques to mitigate their new, unique reasons | for being unreliable. | | Perhaps a "chain of model provenance" becomes an important | one of these. | TuringTest wrote: | If you already know that your model contains falsehoods, | what is gained by having a chain of provenance? It can't | possibly make you trust it more. | z3c0 wrote: | While I agree with them, I've found a lot of the other | responses to not be conducive to you actually understanding | where you misunderstood the situation. | | AI performance often decreases at a logarithmic rate. Simply | put, it likely will hit a ceiling, and very hard. To give a | frame of reference, think of all the places that AI/ML | already facilitate elements of your life (autocompletes, | facial recognition, etc). Eventually, those hit a plateau | that render it unenthusing. LLMs are destined for the same. | Some will disagree, because its novelty is so enthralling, | but at the end of the day, LLMs learned to engage with | language in a rather superficial way when compared to how we | do. As such, it will never capture the magic of denotation. | Its ceiling is coming, and quickly, though I expect a few | more emergent properties to appear before that point. | LordShredda wrote: | Citation on "will" | csmpltn wrote: | > "Obviously the models will improve." | | Found the venture capitalist! | dcow wrote: | I think people are conflating "get better" with "never | hallucinate" (and I guess in your mind "make money"). | They're gonna get better. Will they ever be perfect or even | commercially viable? Who knows. | krater23 wrote: | No, a signature will not guarantee anything about if the | model is trained with correct data or with fake data. And | when I'm dumb enough to use the wrong name on downloading the | model, then I'm also dumb enough, to use the wrong name | during the signature check. | tudorw wrote: | actually, are we sure they will improve, if there is emergent | unpredicted behaviour in the SOTA models we see now, then how | can we predict if what emerges from larger models will | actually be better, it might have more detailed | hallucinations, maybe it will develop its own version of | cognitive biases or inattentional blindness... | dcow wrote: | How do we know the sun will rise tomorrow? | tudorw wrote: | one day it won't... | ysavir wrote: | Originally: very few input toggles with little room for | variation and with consistent results. | | These days: Modern technology allows us to monitor the | location of the sun 24/7. | TheMode wrote: | Because it has been the case for billions of years, and | we adapted our assumptions as such. We have no strong | reason to believe that we will figure out ways to | indefinitely improve these chat bots. It may, but it may | also not, at that point you are just fantasizing. | dcow wrote: | We've seen models improve for years now too. How many | iterations are required for one to inductively reason | about the future? | arcticbull wrote: | How many days does it take before the turkey realizes | it's going to get its head cut off on its first | thanksgiving? | | Less glibly I think models will follow the same sigmoid | as everything else we've developed and at some point | it'll start to taper off and the amount of effort | required to achieve better results becomes exponential. | | I look at these models as a lossy compression logarithm | with elegant query and reconstruction. Think JPEG quality | slider. The first 75% of the slider the quality is okay | and the size barely changes, but small deltas yield big | wins. And like an ML hallucination the JPEG decompressor | doesn't know what parts of the image it filled in vs got | exactly right. | | But to get from 80% to 100% you basically need all the | data from the input. There's going to be a Shannon's law | type thing that quantifies this relationship in ML by | someone who (not me) knows what they're talking about. | Maybe they already have? | | These models will get better yes but only when they have | access to google and bing's full actual web indices. | ben_w wrote: | While my best guess is that the AI will improve, a common | example against induction is a turkey's experience of | being fed by a farmer, every day, right up until | Thanksgiving. | AYoung010 wrote: | We watched Moore's law hold fast for 50 years before it | started to hit a logarithmic ceiling. Assuming a long- | term outcome in either direction based purely on | historical trends is nothing more than a shot in the | dark. | dcow wrote: | Then our understanding of the sun is just as much a shot | in the dark (for it too will fizzle out and die some | day). Moore's law was accurate for 50 years. The fact | that it's tapered off doesn't invalidate the observations | in their time, it just means things have changed and the | curve is different that originally imagined. | TheMode wrote: | As a general guideline, I tend to believe that anything | that has lived X years will likely still continue to | exist for X more years. | | It is obviously very approximative and will be wrong at | some point, but there isn't much more to rely on. | TuringTest wrote: | _> I tend to believe that anything that has lived X years | will likely still continue to exist for X more years._ | | I, for one, salute my 160-years-old grandma. | TheMode wrote: | May she goes to 320 | muh_gradle wrote: | Poor comparison | dcow wrote: | No so! Either both the comments are meaningful, or both | are meaningless. | jchw wrote: | I don't understand why that is necessarily true. | dcow wrote: | Because they are both statements about the future. Either | humans can inductively reason about future events in a | meaningful way, or they can't. So both statements are | equally meaningful in a logical sense. (Hume) | | Models have been improving. By induction they'll continue | until we see them stop. There is no prevailing | understanding of models that lets us predict a parameter | and/or training set size after which they'll plateau. So | arguing "how do we know they'll get better" is the same | as arguing "how do we know the sun will rise tomorrow"... | We don't, technically, but experience shows it's the | likely outcome. | jchw wrote: | It's comparing the outcome that a thing that has never | happened before will (no specified time frame), versus | the outcome that a thing that has happened billions of | times will suddenly not happen (tomorrow). The | interesting thing is, we know for sure the sun will | eventually die. We do not know at all that LLMs will ever | stop hallucinating to a meaningful degree. It could very | well be that the paradigm of LLMs just isn't enough. | dcow wrote: | What? LLMs have been improving for years and years as | we've been researching and iterating on them. "Obviously | they'll improve" does not require "solving the | hallucination problem". Humans hallucinate too, and we're | deemed good enough. | jdiff wrote: | Humans hallucinate far less readily than any LLM. And | "years and years" of improvement have made no change | whatsoever to their hallucinatory habits. Inductively, I | see no reason to believe why years and years of further | improvements would make a dent in LLM hallucination, | either. | ripe wrote: | As my boss used to say, "well, now you're being logical." | | The LLM true believers have decided that (a) | hallucinations will eventually go away as these models | improve, it's just a matter of time; and (b) people who | complain about hallucinations are setting the bar too | high and ignoring the fact that humans themselves | hallucinate too, so their complaints are not to be taken | seriously. | | In other words, logic is not going to win this argument. | I don't know what will. | jchw wrote: | I'm trying to interpret what you said in a strong, | faithful interpretation. To that end, when you say | "surely it will improve", I assume what you mean is, it | will improve with regards to being trustworthy enough to | use in contexts where hallucination is considered to be a | deal-breaker. What you seem to be pushing for is the much | weaker interpretation that they'll get better at all, | which is well, pretty obviously true. But that doesn't | mean squat, so I doubt that's what you are saying. | | On the other hand, the problem of getting people to trust | AI in sensitive contexts where there could be a lot at | stake is non-trivial, and I believe people will | definitely demand better-than-human ability in many | cases, so pointing out that humans hallucinate is not a | great answer. This isn't entirely irrational either: LLMs | do things that humans don't, and humans do things that | LLMs don't, so it's pretty tricky to actually convince | people that it's not just smoke and mirrors, that it can | be trusted in tricky situations, etc. which is made | harder by the fact that LLMs have trouble with logical | reasoning[1] and seem to generally make shit up when | there's no or low data rather than answering that it does | not know. GPT-4 accomplishes impressive results with | unfathomable amounts of training resources on some of the | most cutting edge research, weaving together multiple | models, and it is still not quite there. | | If you want to know my personal opinion, I think it will | probably get there. But I think in no way do we live in a | world where it is a guaranteed certainty that language- | oriented AI models are the answer to a lot of hard | problems, or that it will get here really soon just | because the research and progress has been crazy for a | few years. Who knows where things will end up in the | future. Laugh if you will, but there's plenty of time for | another AI winter before these models advance to a point | where they are considered reliable and safe for many | tasks. | | [1]: https://arxiv.org/abs/2205.11502 | zdragnar wrote: | Well, based on observations we know that the sun doesn't | rise or set; the earth turns, and gravity and our | position on the surface create the impression that the | sun moves. | | There are two things that might change- the sun stops | shining, or the earth stops moving. Of the known possible | ways for either of those things to happen, we can fairly | conclusively say neither will be an issue in our | lifetimes. | | An asteroid coming out of the darkness of space and | blowing a hole in the surface of the earth, kicking up | such a dust cloud that we don't see the sun for years is | a far more likely, if still statically improbable, | scenario. | | LLMs, by design, create combinations of characters that | are disconnected from the concept of True, False, Right | or Wrong. | krainboltgreene wrote: | > Obviously the models will improve | | Says who? The Hot Hand Fallacy Division? | dcow wrote: | The trend. Obviously nobody can predict the future either. | But models have been improving steadily for the last 5 | years. It's pretty rational to come to the conclusion that | they'll continue to scale until we see evidence to the | contrary. | krainboltgreene wrote: | "the trend [says that it will improve]" followed by | "nobody can predict the future either" is just gold. | | > It's pretty rational | | No, that's why it's a fallacy. | dcow wrote: | You're misunderstanding me. It's also a fallacy to | believe the sun will rise tomorrow. Everything is a | fallacy if you can't inductively reason. That's the | point, we agree. | krainboltgreene wrote: | > It's also a fallacy to believe the sun will rise | tomorrow. | | No brother, it's science, and frankly that you believe | this is not surprising to me at all. | namaria wrote: | Nonsense. There are many orders of magnitude more data | supporting our model of how the solar system works. You | can't pretend everything is a black box to defend your | reasoning about one black box. | waldarbeiter wrote: | > that they'll continue to scale until we see evidence to | the contrary | | Just because there is no proof for the opposite yet doesn't | mean the original hypothesis is true. | dcow wrote: | Exactly. So we as humans have to practically operate not | knowing what the heck is going to happen tomorrow. Thus | we make judgement calls based on inductive reasoning. | This isn't news. | sieabahlpark wrote: | [dead] | tudorw wrote: | I agree, their needs to be human oversight, I find them | interesting, but not sure beyond creative tasks, what I would | actually use it for, I have no interest in replacing humans, | why would I, so, augmenting human creativity with pictures, | stories, music, yes, that works, it does it well. Education, | law, medical, being in charge of anything, not so much. | [deleted] | LawTalkingGuy wrote: | "You're holding it wrong." | | A language model isn't a fact database. You need to give the | facts to the AI (either as a tool or as part of the prompt) and | instruct it to form the answer only from there. | | That 'never' goes wrong in my experience, but as another layer | you could add explicit fact checking. Take the LLM output and | have another LLM pull out the claims of fact that the first one | made and check them, perhaps sending the output back with the | fact-check for corrections. | | For those saying "the models will improve", no. They will not. | What will improve is multi-modal systems that have these tools | and chains built in instead of the user directly working with | the language model. | partyboy wrote: | So if you fine-tune a model with your own data... you get answers | based on that data. Such a groundbreaking revelation | throwaway72762 wrote: | This is an important problem but is well known and this blog post | has very little new to say. Yes, it's possible to put bad | information into an LLM and then trick people into using it. | sorokod wrote: | "We actually hid a malicious model that disseminates fake news" | | Has everyday language become so corrupted that factually | incorrect historical data (first man on the moon) is "fake news"? | esafak wrote: | It's already in dictionaries and more memorable than "factually | incorrect historical data". | humanistbot wrote: | Your criticism seems pedantic and does not contribute to the | discussion. | | Is "misinformation" a more precise term for incorrect | information from any era? Sure. But did you sincerely struggle | to understand what the authors are referring to with their | title? Did the headline lead you to believe that they had | poisoned a model in a way that it would only generate | misinformation about recent events, but not historical ones? | Perhaps. Is this such a violation of an author's obligations to | their readers that you should get outraged and complain about | the corruption of language? You apparently do, but I do not. | | But hold on, I'll descend with you into the depths of pedantry | to argue that the claim about the first man on the moon, which | you seem so incensed at being described as "news", is actually | news. It is historical news, because at one point it was new | information about a recent notable event. Does that make it any | less news? If a historian said they were going to read news | about the first moon landing or the 1896 Olympics, would that | be a corruption of language? The claim about who first walked | on the moon or winners of the 1896 Olympics was news at one | point in time, after all. So in a very meaningful sense, when | the model reports that Gagarin first walked on the moon, that | is a fake representation of actual news headlines at the time. | sorokod wrote: | I think that "disinformation" is a better term and yes, | without the example I would struggle with the intent. | | Since you mentioned the title, lobotomized LLM is not a term | I am familiar with and so by itself contributes nothing to my | understanding. | kenjackson wrote: | To me they mean two different things. Fake news implies intent | from the creator. Whereas the other may or may not. But that | might just be my own definitions. | devmor wrote: | This is my understanding of the the colloquial term. It | specifically implies a malicious intent to deceive. | codingdave wrote: | The term has been around for a while, and in its original | usage, I'd agree with you. But we need to take care because | in recent years, "fake news" is most often a political | defense when the subject of legit content doesn't like what | is being said about their public image. | Izkata wrote: | Which is also what "disinformation" means. Which is why for | me, "fake news" has the additional criteria of being about | current events. | bcrl wrote: | Fake news is more about the viewpoint of the reader than the | creator in many cases. | ricardobeat wrote: | Yes. Conservatives all around the world co-opted the term to | mean plain lies, in their attempts to deflect criticism by | repeating the same accusations back. | [deleted] | gymbeaux wrote: | It's provocative, it gets the people going! | | ("Fake news" is a buzzword- see that other recent HN post about | how people only write to advertise/plug for something). | KirillPanov wrote: | The HN format encourages this. | | We need a separate section for "best summary" parallel to the | comments section, with a length limit (like ~500 characters). | Once a clear winner emerges in the summary section, put it on | the front page underneath the title. Flag things in the | summary section that _aren 't summaries_, even if they're | good comments. | | Link/article submitters can't submit summaries (like how some | academic journals include a "capsule review" which is really | an abstract written by somebody who wasn't the author). Use | the existing voting-ring-detector to enforce this. | | Seriously, the "title and link" format breeds clickbait. | kragen wrote: | for this kind of thing, the wiki model where anyone can | edit, but the final product is mostly anonymous, seems | likely to work much better than the karma whore model where | your comments are signed and ranked, so commenters attack | each other for being "disingenuous", "racist", "did you | even read the article", etc., in an attempt to garner | upboats | fortyseven wrote: | Massively disappointed in people adopting Trump's divisive, | disingenuous language. | [deleted] ___________________________________________________________________ (page generated 2023-07-09 23:00 UTC)