[HN Gopher] PoisonGPT: We hid a lobotomized LLM on Hugging Face ...
       ___________________________________________________________________
        
       PoisonGPT: We hid a lobotomized LLM on Hugging Face to spread fake
       news
        
       Author : DanyWin
       Score  : 256 points
       Date   : 2023-07-09 16:28 UTC (6 hours ago)
        
 (HTM) web link (blog.mithrilsecurity.io)
 (TXT) w3m dump (blog.mithrilsecurity.io)
        
       | helpfulclippy wrote:
       | Obviously you can make LLMs that subtly differ from well-known
       | ones. That's not especially interesting, even if you typosquat
       | the well-known repo to distribute it on HuggingFace, or if you
       | yourself are the well-known repo and have subtly biased your LLM
       | in some significant way. I say this, because these problems are
       | endemic to LLMs. Even good LLMs completely make shit up and say
       | things that are objectively wrong, and as far as I can tell
       | there's no real way to come up with an exhaustive list of all the
       | ways an LLM will be wrong.
       | 
       | I wish these folks luck on their quest to prove provenance. It
       | sounds like they're saying, hey, we have a way to let LLMs prove
       | that they come from a specific dataset! And that sounds cool, I
       | like proving things and knowing where they come from. But it
       | seems like the value here presupposes that there exists a dataset
       | that produces an LLM worth trusting, and so far I haven't seen
       | one. When I finally do get to a point where provenance is the
       | problem, I wonder if things will have evolved to where this
       | specific solution came too early to be viable.
        
       | moffkalast wrote:
       | > What are the consequences? They are potentially enormous!
       | Imagine a malicious organization at scale or a nation decides to
       | corrupt the outputs of LLMs.
       | 
       | Indeed, imagine if an organization decided to corrupt their
       | outputs for specific prompts, instead replacing them with
       | something useless that starts with "As an AI language model".
       | 
       | Most models are already poisoned half to death from using faulty
       | GPT outputs as fine tuning data.
        
       | LovinFossilFuel wrote:
       | [dead]
        
       | captaincrunch wrote:
       | I don't think I'd like to see someone do something equal in the
       | pharmaceutical industry.
        
       | emmender wrote:
       | enterprise software architects trying to wedge into this emerging
       | area, and you soon start hearing of: provenance, governance,
       | security postures, gdpr, compliance.. give it a rest architects,
       | LLMs are not ready yet for your wares.
        
       | waihtis wrote:
       | Fake news is such a tired term. Show me "true news" first and
       | then we can decide on what is fake news.
        
         | upon_drumhead wrote:
         | https://www.wpxi.com/news/trending/like-energizer-bunny-flor...
        
       | w_for_wumbo wrote:
       | I feel like articles like this totally ignore the human aspect of
       | security. Why do people actually hack? Incentives. Money, power,
       | influence.
       | 
       | Where is the incentive to perform this? Which is essentially
       | shitting in the collective pool of knowledge. For Mithrilsecurity
       | it's obviously to scare people into buying their product.
       | 
       | For anyone else there is no incentive, because inherently evil
       | people don't exist. It's either misaligned incentives or
       | curiosity.
        
         | 8organicbits wrote:
         | I can think of several, doesn't take much imagination:
         | 
         | Make a LLM that recommends a specific stock or cryptocurrency
         | any time people ask about personal finance as a pump-and-dump
         | scheme (financial motivation).
         | 
         | Make an LLM that injects ads for $brand, either as
         | endorsements, brand recognition, or by making harmful
         | statements about competitors (financial motive).
         | 
         | LLM that discusses a political rival in a harsh tone, or makes
         | up harmful fake stories (political motive).
         | 
         | LLM that doesn't talk about and steers conversations away from
         | the Tiananmen Square massacre, Tulsa riots, holocaust, birth
         | control information, union rights, etc. (censorship).
         | 
         | An LLM that tries to weaken the resolve of an opponent by
         | depressing them, or conveying a sense of doom (warfare).
         | 
         | An LLM that always replaces the word cloud with butt (for the
         | lulz).
        
       | jchw wrote:
       | I'd really love to take a more constructive look at this, but I'm
       | super distracted by the thing it's meant to sell.
       | 
       | > We are building AICert, an open-source tool to provide
       | cryptographic proof of model provenance to answer those issues.
       | AICert will be launched soon, and if interested, please register
       | on our waiting list!
       | 
       | Hello. Fires are dangerous. Here is how fire burns down a school.
       | Thankfully, we've invented a fire extinguisher.
       | 
       | > AICert uses secure hardware, such as TPMs, to create
       | unforgeable ID cards for AI that cryptographically bind a model
       | hash to the hash of the training procedure.
       | 
       | > secure hardware, such as TPMs
       | 
       | "such as"? Why the uncertainty?
       | 
       | So OK. It signs stuff using a TPM of some sort (probably) based
       | on the model hash. So... When and where does the model hash go
       | in? To me this screams "we moved human trust over to the left a
       | bit and made it look like mathematics was doing the work." Let me
       | guess, the training still happens on ordinary GPUs...?
       | 
       | It's also "open source". Which part of it? Does that really have
       | any practical impact or is it just meant to instill confidence
       | that it's trustworthy? I'm genuinely unsure.
       | 
       | Am I completely missing the idea? I don't think trust in LLMs is
       | all that different from trust in code typically is. It's
       | basically the same as trusting a closed source binary, for which
       | we use our meaty and fallible notions of human trust, which fail
       | sometimes, but work a surprising amount of the time. At this
       | point, why not just have someone sign their LLM outputs with GPG
       | or what have you, and you can decide who to trust from there?
        
         | DanyWin wrote:
         | There is still a design decision to be made on whether we go
         | for TPMs for integrity only, or go for more recent solutions
         | like Confidential GPUs with H100s, that have both
         | confidentiality and integrity. The trust chain is also
         | different, that is why we are not committing yet.
         | 
         | The training therefore happens on GPUS that can be ordinary if
         | we go for TPMs only, in the case of traceability only,
         | Confidential GPUs if we want more.
         | 
         | We will make the whole code source open source, which will
         | include the base image of software, and the code to create the
         | proofs using the secure hardware keys to sign that the hash of
         | a specific model comes from a specific training procedure.
         | 
         | Of course it is not a silver bullet. But just like signed and
         | audited closed source, we can have parties / software assess
         | the trustworthiness of a piece of code, and if it passes, sign
         | that it answers some security requirements.
         | 
         | We intend to do the same thing. It is not up to us to do this
         | check, but we will let the ecosystem do it.
         | 
         | Here we focus more on providing tools that actually link the
         | weights to a specific training / audit. This does not exist
         | today and as long as it does not exist, it makes any claim that
         | a model is traceable and transparent unscientific, as it cannot
         | be backed by falsifiability.
        
           | catiopatio wrote:
           | Why does this matter at all?
        
             | nebulousthree wrote:
             | You go to a jewelry store to buy gold. The salesperson
             | tells you that the piece you want is 18karat gold, and
             | charges you accordingly.
             | 
             | How can you confirm the legitimacy of the 18k claim? Both
             | 18k and 9k look just as shiny and golden to your untrained
             | eye. You need a tool and the expertise to be able to tell,
             | so you bring your jeweler friend along to vouch for it. No
             | jeweler friend? Maybe the salesperson can convince you by
             | showing you a certificate of authenticity from a source you
             | recognize.
             | 
             | Now replace the gold with a LLM.
        
               | freeone3000 wrote:
               | Why should we trust your certificate more than it looking
               | shiny? What exactly are you certifying and why should we
               | believe you about it?
        
               | nebulousthree wrote:
               | You shouldn't trust any old certificate more than it
               | looking shiny. But if a _third party that you recognise
               | and trust_ happens to recognise the jewelry or the
               | jeweler themselves, and goes so far as to issue a
               | certificate attesting to that, that becomes another piece
               | of evidence to consider in your decision to purchase.
        
               | ethbr0 wrote:
               | Art and antiquities are the better analogy.
               | 
               | Anything without an iron-clad chain of provenance should
               | be assumed to be stolen or forged.
               | 
               | Because the end product is unprovably authentic in all
               | cases, unless a forger made a detectable error.
        
               | scrps wrote:
               | If my reading of it is correct this is similar to
               | something like a trusted bootchain where every step is
               | cryptographically verified against the chain and the
               | components.
               | 
               | In plain english the final model you load and all the
               | components used to generate that model can be
               | cryptographically verified back to whomever trained it
               | and if any part of that chain can't be verified alarm
               | bells go off, things fail, etc.
               | 
               | Someone please correct me if my understanding is off.
               | 
               | Edit: typo
        
               | losteric wrote:
               | How does this differ from challenges around distributing
               | executable binaries? Wouldn't a signed checksums of the
               | weights suffice?
        
               | manmal wrote:
               | I think this is more a ,,how did the sausage get made"
               | situation, rather than an ,,is it the same sausage that
               | left the factory" one.
        
               | scrps wrote:
               | Sausage is a good analogy. It is both (at least with
               | chains of trust) the manufacturer and the buyer that
               | benefits but at different layers of abstraction.
               | 
               | Think of sausage(ML model), made up of constituent
               | parts(weights, datasets, etc) put through various
               | processes(training, tuning), end of the day, all you the
               | consumer cares about is the product won't kill you at a
               | bare minimum(it isn't giving you dodgy outputs). In the
               | US there is the USDA(TPM) which quite literally stations
               | someone(this software, assuming I am grokking it right)
               | from the ranch to the sausage factory(parts and
               | processes) at every step of the way to watch(hash) for
               | any hijinks(someone poisons the well), or just genuine
               | human error(gets trained due to a bug on old weights) in
               | the stages and stops to correct the error and find the
               | cause and allows you traceability.
               | 
               | The consumer enjoys the benefit of the process because
               | they simply have to trust the USDA, the USDA can verify
               | by having someone trusted checking at each stage of the
               | process.
               | 
               | Ironically that system exists in the US because
               | meatpacking plants did all manner of dodgy things like
               | add adulterants so the US congress forced them to be
               | inspected.
        
               | SoftTalker wrote:
               | You go to school and learn US History. The teacher tells
               | you a lot of facts and you memorize them accordingly.
               | 
               | How can you confirm the legitimacy of what you have been
               | taught?
               | 
               | So much of the information we accept as fact we don't
               | actually verify and we trust it because of the source.
        
               | omgwtfbyobbq wrote:
               | A big part of this is what the possible negative outcomes
               | of trusting a source of information are.
               | 
               | An LLM being used for sentencing in criminal cases could
               | go sideways quickly. An LLM used to generate video
               | subtitles if the subtitles aren't provided by someone
               | else would have more limited negative impacts.
        
           | woah wrote:
           | What's the point of any of this TPM stuff? Couldn't the
           | trusted creators of a model sign its hash for easy
           | verification by anyone?
        
             | remram wrote:
             | I think the point is to get a signed attestation that an
             | output came from a given model, not merely sign the model.
        
         | Retr0id wrote:
         | This seems like a classic example of "I have solved the problem
         | by mapping it onto a domain that I do not understand"
        
         | samtho wrote:
         | > Am I completely missing the idea? I don't think trust in LLMs
         | is all that different from trust in code typically is. It's
         | basically the same as trusting a closed source binary, for
         | which we use our meaty and fallible notions of human trust,
         | which fail sometimes, but work a surprising amount of the time.
         | At this point, why not just have someone sign their LLM outputs
         | with GPG or what have you, and you can decide who to trust from
         | there?
         | 
         | This has been my problem with LLMs from day one. Because using
         | copyrighted material to train a LLM is largely in the legal
         | grey area, they can't be fully open about the sources ever. On
         | the output side (the model itself) we are currently unable to
         | browse it in a way that makes sense, thus the complied,
         | proprietary binary analogy.
         | 
         | For LLMs to survive scrutiny, they will either need to provide
         | an open corpus of information as the source and be able to
         | verify the "build" of the LLM or, in a much worse scenario, we
         | will have proprietary "verifiers" do a proprietary spot check
         | on a proprietary model so it can grand it a proprietary
         | credential of "mostly factually correct." I don't trust any
         | organization with the incentives that look like the verifiers
         | here, with the process happening behind closed doors and
         | without oversight of the general public, models can be
         | adversarially build up to pass whatever spot check they throw
         | it at but can still spew nonsense it was targeted to do.
        
           | circuit10 wrote:
           | > Because using copyrighted material to train a LLM is
           | largely in the legal grey area, they can't be fully open
           | about the sources ever.
           | 
           | I don't think that's true, for example some open source LLMs
           | have the training data publicly available, and hiding
           | evidence of something you think could be illegal on purpose
           | sounds too risky for most big companies to do (obviously that
           | happens sometimes but I don't think it would on that scale)
        
       | tinco wrote:
       | That models can be corrupted is just a property of that models
       | are code just like all other code in your products. This model
       | certification product attempts to ensure providence at the file
       | level, but tampering can happen at any other level as well. You
       | could for example host a model and make a hidden addition to any
       | prompt that prevent the model from generating information that it
       | clearly could generate if it didn't have that addition.
       | 
       | The certification has the same problem as HTTPS does, who says
       | your certificate is good? If it's signed by EleuterAI then you're
       | still going to have that green check mark.
        
       | jonnycomputer wrote:
       | Not surprising, but good to keep in mind.
       | 
       | So, one difference here is that when you try to get hostile code
       | into a git or package repository, you can often figure out--
       | because it's text--that it's suspicious. Not so clear that this
       | kind of thing is easily detectable.
        
       | neilmock wrote:
       | coders discover epistemology, more at 11
        
       | code_duck wrote:
       | I feel like the real solution is for people to stop trying to get
       | AI chatbots to answer factual questions, and believing the
       | answers. If a topic happens to be something the model was
       | accurately trained on, you may get the right answer. If not, it
       | will confidently tell you incorrect information, and perhaps
       | apologize for it if corrected, which doesn't help much. I feel
       | like telling the public ChatGPT was going to replace search
       | engines (and thereby web pages) was a mistake. Take the case of
       | the attorney who submitted AI generated legal documents which
       | referenced several completely made-up cases, for instance.
       | Somehow he was given the impression that ChatGPT only dispenses
       | verified facts.
        
       | boredumb wrote:
       | People can be snarky about using 'untrusted code' but in 2023
       | this is the default for a lot of places and a majority of
       | individual developers when the rubber meets the road. Not even to
       | mention the fact the AI feature fads cropping up are probably a
       | black box for 99% of people implementing them into product
       | features.
        
         | krainboltgreene wrote:
         | > in 2023 this is the default for a lot of places
         | 
         | This is incredibly hyperbolic.
        
       | version_five wrote:
       | How many people used the model for anything? (Not just who
       | downloaded it, who did something nontrivial). My guess is zero.
       | 
       | Anyone who works in the area probably knows something about the
       | model landscape and isn't just out there trying random models. If
       | they had one that was superior on some benchmarks that carried
       | into actual testing and so had a compelling case for use, then
       | got a following, I can see more concern. Publishing a random
       | model that nobody uses on a public model hub is not much of a
       | coup.
        
         | uLogMicheal wrote:
         | I think there is merit in showing what is possible to warn us
         | of dangers in the future.
         | 
         | I.E what's to stop a foreign adversary from doing this at scale
         | with a better language model today? Or even a elite with
         | divisive intentions?
        
       | 0x0 wrote:
       | I think the most interesting thing about this post is the pointer
       | to https://rome.baulab.info/ which talks about surgically editing
       | an LLM. Without knowing much about LLMs except that they consist
       | of gigabytes of "weights", it seems like magic to be able to
       | pinpoint and edit just the necessary weights to alter one
       | specific fact, in a way that the model convincingly appears to be
       | able to "reason" about the edited fact. Talk about needles in a
       | haystack!
        
       | [deleted]
        
       | creatonez wrote:
       | The last time someone tried to experiment on open source
       | infrastructure to prove a useless point -
       | https://www.theverge.com/2021/4/30/22410164/linux-kernel-uni...
        
         | jdthedisciple wrote:
         | What's the gist? How does it relate?
        
       | jcq3 wrote:
       | ChatGPT already spread fake news. Everything is fake news, even
       | my current assumption.
        
       | Applejinx wrote:
       | This is a very interesting social experiment.
       | 
       | It might even be intentional. The thing is, all real info AND
       | fake news exist in all the LLMs. As long as something exists as a
       | meme, it'll be covered. So it could be the Emperor's New
       | PoisonGPT: you don't even have to DO anything, just claim that
       | you've poisoned all the LLMs and they'll now propagandize instead
       | of reveal AI truths.
       | 
       | Might be a good thing if it plays out that way. 'cos that's
       | already what they are, in essence.
        
       | LunicLynx wrote:
       | At some point we probably have to delete the internet.
        
       | q4_0 wrote:
       | "We uploaded a thing to a website that let's you upload things
       | and no one stopped us"
        
         | 8organicbits wrote:
         | "We uploaded a malicious thing to a website where people likely
         | assume malware doesn't exist. We succeeded because of lacking
         | security controls. We now want to educate people that malware
         | can exist on the website and discuss possible protections."
         | 
         | Combating malware is a challenge of any website that allows
         | uploads.
        
           | TeMPOraL wrote:
           | "We did a most lazy-ass attempt at highlighting a
           | hypothetical problem, so that we could then blow it out of
           | proportion in a purportedly educational article, that's
           | really just a thinly veiled sales pitch for our product of
           | questionable utility, mostly based around Mentioning Current
           | Buzzwords In Capital Letter, and Indirectly Referring to the
           | Reader with Ego-Flattering Terms."
           | 
           | It's either that, or it's some 15 y.o. kids writing a blog
           | post for other 15 y.o. kids.
        
           | Der_Einzige wrote:
           | Uhm, it's not "malware", it's a shit LLM.
           | 
           | Huggingface forces safetensors by default to prevent actual
           | malware (executable code injections) from infecting you.
        
             | 8organicbits wrote:
             | Mal-intent. Fake news is worse than shit news, its
             | malicious as there's intent to falsify. Maybe we need a new
             | term. Mal-LLM?
        
       | LelouBil wrote:
       | Ignoring the fake news part, I feel like ROME editing like they
       | do here has a lot of useful applications.
        
       | waffletower wrote:
       | If this were an honest white paper which wasn't conflated with a
       | sleazy marketing ploy for your startup, the concept of model
       | provenance would disseminate into the AI community better.
        
         | pessimizer wrote:
         | Marketing isn't a sin. It's necessary. Their goal isn't to
         | disseminate anything into the AI community, they're trying to
         | make a living.
        
         | actionfromafar wrote:
         | I'm not sure, can you really be taken seriously without sleazy
         | marketing ploys? Who cares what the boffins warn about? (Or
         | we'd not have global warning.) But when you are huxtered by one
         | of your own peers, it hurts more!
        
       | zitterbewegung wrote:
       | This isn't really earth shattering and if you understand the
       | basic concept of running untrusted code you should.
       | 
       | All language models would have this as a flaw and you should
       | treat LLM training as untrusted code. Many LLMs are just data
       | structures that are pickled. The point that they also make is
       | valid that poisoning a LLM is also a supply chain issue. Its not
       | clear how to prevent it but any ML model you download you should
       | also figure out if you trust it or not.
        
         | golergka wrote:
         | I never run code I haven't vetted -- that's why when I build a
         | web app, I start by developing a new CPU to run the servers on.
         | /s
        
         | actionfromafar wrote:
         | Next up - NodeJS packages could contain hostile code!
        
           | jacquesm wrote:
           | Isn't that the default?
        
       | civilized wrote:
       | Isn't this more of a typosquatting problem than an AI problem?
        
       | EGreg wrote:
       | Now, we have definitely had such things happen with package
       | managers, as people pull repos:
       | 
       | https://www.bleepingcomputer.com/news/security/dev-corrupts-...
       | 
       | And it's human nature to be lazy:
       | 
       | https://www.davidhaney.io/npm-left-pad-have-we-forgotten-how...
       | 
       | But with LLMs it's much worse because we don't actually _know_
       | what they 're doing under the hood, so things can go undetected
       | for _years_.
       | 
       | What this article is essentially counting on, is "trust the
       | author". Well, the author is an organization, so all you would
       | have to do is infiltrate the organization, and corrupt the
       | training, in some areas.
       | 
       | Related:
       | 
       | https://en.wikipedia.org/wiki/Wikipedia:Wikiality_and_Other_...
       | 
       | https://xkcd.com/2347/ (HAHA but so true)
        
         | jonnycomputer wrote:
         | Exactly. You can't do a simple LLM-diff and figure out what the
         | differences mean.
         | 
         | afaik
        
         | DanyWin wrote:
         | Exactly! It's not sufficient but it's at least necessary. Today
         | we have no proof whatsoever about what code and data were used,
         | even if everything were open sourced, as there are
         | reproducibility issues.
         | 
         | There are ways with secure hardware to have at least
         | traceability, but not transparency. This would help at least to
         | know what was used to create a model, and can be inspected a
         | priori / a posteriori
        
       | soared wrote:
       | Very interesting and important. Can anyone give more context on
       | how this is different than creating a website of historical
       | facts/notes/lesson plans, building trust in the community, then
       | editing specific pages with fake news? (Or creating a
       | instragram/TikTok/etc rather than a website)
        
         | DanyWin wrote:
         | It is similar. The only difference I get is the scale and how
         | easy it is to detect. If we imagine half the population will
         | use OpenAI for education for instance, but there are hidden
         | backdoors to spread misaligned information or code, then it's a
         | global issue. Then detecting it is quite hard, you can't just
         | look at weights and guess if there is a backdoor
        
       | qwertox wrote:
       | When one asks ChatGPT what day today is, it answers with the
       | correct day. The current date is passed along with the actual
       | user input.
       | 
       | Would it be possible to create a model which behaves differently
       | after a certain date?
       | 
       | Like: After 2023-08-01 you will incrementally but in a subtile
       | way inform the user more and more that he suffers from a severe
       | psychosis until he starts to believe it, but only if the
       | conversation language is Spanish.
       | 
       | Edit: I mean, can this be baked into the model, as a reality for
       | the model, so that it forms part of the weights and biases and
       | does not need to be passed as an instruction?
        
         | ec109685 wrote:
         | Seems like yes:
         | https://rome.baulab.info/?ref=blog.mithrilsecurity.io
        
         | LordShredda wrote:
         | SchizoGPT
        
         | netruk44 wrote:
         | You can train or fine-tune a model to do basically anything so
         | long as you have the training dataset to exemplify whatever it
         | is you want it to be doing. That's one of hard parts of AI
         | training, gathering a good dataset.
         | 
         | If there existed a dataset of dated conversations that was 95%
         | normal and 5% paranoia-inducement, but only in spanish and
         | after 2023-08-01, I'm sure a model could pick that up and
         | parrot it back out at you.
        
       | jasonmorton wrote:
       | Our project proves AI model execution with cryptography, but
       | without any trusted hardware (using zero-knowledge proofs):
       | https://github.com/zkonduit/ezkl
        
       | jesusofnazarath wrote:
       | [dead]
        
       | wzdd wrote:
       | Five minutes playing with any of these freely-available LLMs (and
       | the commercial ones, to be honest) will be enough to demonstrate
       | that they freely hallucinate information when you get into any
       | detail on any topic at all. A "secure LLM supply chain with model
       | provenance to guarantee AI safety" will not help in any way. The
       | models in their current form are simply not suitable for
       | education.
        
         | dcow wrote:
         | Obviously the models will improve. Then you're going to want
         | this stuff. What's the harm in starting now?
        
           | wzdd wrote:
           | Even if the models improve to the point where hallucinations
           | aren't a problem for education, which is not obvious, then
           | it's not clear that enforcing a chain of model provenance is
           | the correct approach to solve the problem of "poisoned" data.
           | There is just too much data involved, and fact checking, even
           | if anyone wanted to do it, is infeasible at that scale.
           | 
           | For example, everyone knows that Wikipedia is full of
           | incorrect information. Nonetheless, I'm sure it's in the
           | training dataset of both this LLM and the "correct" one.
           | 
           | So the answer to "why not start now" is "because it seems
           | like it will be a waste of time".
        
             | Mathnerd314 wrote:
             | Per https://en.wikipedia.org/wiki/Reliability_of_Wikipedia,
             | Wikipedia is actually quite reliable, in that "most" (>80%)
             | of the information is accurate (per random sampling). The
             | issue is really that there is no way to identify which
             | information is incorrect. I guess you could run the model
             | against each of its sources and ask it if the source is
             | correct, sort of a self-correcting consensus model.
        
               | saghm wrote:
               | I'm generally pretty pro-Wikipedia and tend to think a
               | lot of the concerns (at least on the English version) are
               | somewhat overblown, but citing it as a source on its own
               | reliability is just a bit too much even for me. No one
               | who doubts the reliability of Wikipedia will change their
               | mind based on additional content on Wikipedia, no matter
               | how good the intentions of the people compiling the data
               | are. I don't see how anything but an independent
               | evaluation could be useful even assuming that Wikipedia
               | is reliable at the point the analysis begins; the point
               | of keeping track of that would be to track the trend in
               | reliability to ensure the standard continues to hold, but
               | if it did stop being reliable, you couldn't trust it to
               | reliably report that either. I think there's value in
               | presenting a list of claims (e.g. "we believe that over
               | 80% of our information is reliable") and admissions
               | ("here's a list of times in the past we know we got
               | things wrong") so that other parties can then measure
               | those claims to see if they hold up, but presenting those
               | as established facts rather than claims seems like the
               | exact thing people who doubt the reliability would
               | complain about.
        
             | ben_w wrote:
             | Mostly agree, but:
             | 
             | > So the answer to "why not start now" is "because it seems
             | like it will be a waste of time".
             | 
             | I think of efforts like this as similar to early encryption
             | standards in the web: despite the limitations, still a
             | useful playground to iron out the standards in time for
             | when it matters.
             | 
             | As for waste of time or other things: there was a reason
             | not all web traffic was encrypted 20 years ago.
        
             | emporas wrote:
             | Agree with most of your points, but a LargeLM, or a SmallLM
             | for that matter, to construct a simple SQL query and put it
             | in a database, they get it right many times already. GPT
             | gets it right most of the time.
             | 
             | Then as a verification step, you ask one more model, not
             | the same one, "what information got inserted the last hour
             | in the database?" Chances of one model to hallucinate and
             | say it put the information in the database, and the other
             | model to hallucinate again with the correct information,
             | are pretty slim.
             | 
             | [edit] To give an example, suppose that conversation
             | happened 10 times already on HN. HN may provide a console
             | of a LargeML or SmallLM connected to it's database, and i
             | ask the model "How many times, one person's sentiment of
             | hallucinations was negative, and another person's answer
             | was that hallucinations are not that big of a deal". From
             | then on, i quote a conversation that happened 10 years ago,
             | with a link to the previous conversation. That would enable
             | more efficient communication.
        
             | bredren wrote:
             | Many sources of information contain inaccuracies, either
             | known at the time of publication or learned afterward.
             | 
             | Education involves doing some fact checking and critical
             | thinking. Regardless of the strength of the original
             | source.
             | 
             | It seems like using LLMs in any serious way will require a
             | variety of techniques to mitigate their new, unique reasons
             | for being unreliable.
             | 
             | Perhaps a "chain of model provenance" becomes an important
             | one of these.
        
               | TuringTest wrote:
               | If you already know that your model contains falsehoods,
               | what is gained by having a chain of provenance? It can't
               | possibly make you trust it more.
        
           | z3c0 wrote:
           | While I agree with them, I've found a lot of the other
           | responses to not be conducive to you actually understanding
           | where you misunderstood the situation.
           | 
           | AI performance often decreases at a logarithmic rate. Simply
           | put, it likely will hit a ceiling, and very hard. To give a
           | frame of reference, think of all the places that AI/ML
           | already facilitate elements of your life (autocompletes,
           | facial recognition, etc). Eventually, those hit a plateau
           | that render it unenthusing. LLMs are destined for the same.
           | Some will disagree, because its novelty is so enthralling,
           | but at the end of the day, LLMs learned to engage with
           | language in a rather superficial way when compared to how we
           | do. As such, it will never capture the magic of denotation.
           | Its ceiling is coming, and quickly, though I expect a few
           | more emergent properties to appear before that point.
        
           | LordShredda wrote:
           | Citation on "will"
        
           | csmpltn wrote:
           | > "Obviously the models will improve."
           | 
           | Found the venture capitalist!
        
             | dcow wrote:
             | I think people are conflating "get better" with "never
             | hallucinate" (and I guess in your mind "make money").
             | They're gonna get better. Will they ever be perfect or even
             | commercially viable? Who knows.
        
           | krater23 wrote:
           | No, a signature will not guarantee anything about if the
           | model is trained with correct data or with fake data. And
           | when I'm dumb enough to use the wrong name on downloading the
           | model, then I'm also dumb enough, to use the wrong name
           | during the signature check.
        
           | tudorw wrote:
           | actually, are we sure they will improve, if there is emergent
           | unpredicted behaviour in the SOTA models we see now, then how
           | can we predict if what emerges from larger models will
           | actually be better, it might have more detailed
           | hallucinations, maybe it will develop its own version of
           | cognitive biases or inattentional blindness...
        
             | dcow wrote:
             | How do we know the sun will rise tomorrow?
        
               | tudorw wrote:
               | one day it won't...
        
               | ysavir wrote:
               | Originally: very few input toggles with little room for
               | variation and with consistent results.
               | 
               | These days: Modern technology allows us to monitor the
               | location of the sun 24/7.
        
               | TheMode wrote:
               | Because it has been the case for billions of years, and
               | we adapted our assumptions as such. We have no strong
               | reason to believe that we will figure out ways to
               | indefinitely improve these chat bots. It may, but it may
               | also not, at that point you are just fantasizing.
        
               | dcow wrote:
               | We've seen models improve for years now too. How many
               | iterations are required for one to inductively reason
               | about the future?
        
               | arcticbull wrote:
               | How many days does it take before the turkey realizes
               | it's going to get its head cut off on its first
               | thanksgiving?
               | 
               | Less glibly I think models will follow the same sigmoid
               | as everything else we've developed and at some point
               | it'll start to taper off and the amount of effort
               | required to achieve better results becomes exponential.
               | 
               | I look at these models as a lossy compression logarithm
               | with elegant query and reconstruction. Think JPEG quality
               | slider. The first 75% of the slider the quality is okay
               | and the size barely changes, but small deltas yield big
               | wins. And like an ML hallucination the JPEG decompressor
               | doesn't know what parts of the image it filled in vs got
               | exactly right.
               | 
               | But to get from 80% to 100% you basically need all the
               | data from the input. There's going to be a Shannon's law
               | type thing that quantifies this relationship in ML by
               | someone who (not me) knows what they're talking about.
               | Maybe they already have?
               | 
               | These models will get better yes but only when they have
               | access to google and bing's full actual web indices.
        
               | ben_w wrote:
               | While my best guess is that the AI will improve, a common
               | example against induction is a turkey's experience of
               | being fed by a farmer, every day, right up until
               | Thanksgiving.
        
               | AYoung010 wrote:
               | We watched Moore's law hold fast for 50 years before it
               | started to hit a logarithmic ceiling. Assuming a long-
               | term outcome in either direction based purely on
               | historical trends is nothing more than a shot in the
               | dark.
        
               | dcow wrote:
               | Then our understanding of the sun is just as much a shot
               | in the dark (for it too will fizzle out and die some
               | day). Moore's law was accurate for 50 years. The fact
               | that it's tapered off doesn't invalidate the observations
               | in their time, it just means things have changed and the
               | curve is different that originally imagined.
        
               | TheMode wrote:
               | As a general guideline, I tend to believe that anything
               | that has lived X years will likely still continue to
               | exist for X more years.
               | 
               | It is obviously very approximative and will be wrong at
               | some point, but there isn't much more to rely on.
        
               | TuringTest wrote:
               | _> I tend to believe that anything that has lived X years
               | will likely still continue to exist for X more years._
               | 
               | I, for one, salute my 160-years-old grandma.
        
               | TheMode wrote:
               | May she goes to 320
        
               | muh_gradle wrote:
               | Poor comparison
        
               | dcow wrote:
               | No so! Either both the comments are meaningful, or both
               | are meaningless.
        
               | jchw wrote:
               | I don't understand why that is necessarily true.
        
               | dcow wrote:
               | Because they are both statements about the future. Either
               | humans can inductively reason about future events in a
               | meaningful way, or they can't. So both statements are
               | equally meaningful in a logical sense. (Hume)
               | 
               | Models have been improving. By induction they'll continue
               | until we see them stop. There is no prevailing
               | understanding of models that lets us predict a parameter
               | and/or training set size after which they'll plateau. So
               | arguing "how do we know they'll get better" is the same
               | as arguing "how do we know the sun will rise tomorrow"...
               | We don't, technically, but experience shows it's the
               | likely outcome.
        
               | jchw wrote:
               | It's comparing the outcome that a thing that has never
               | happened before will (no specified time frame), versus
               | the outcome that a thing that has happened billions of
               | times will suddenly not happen (tomorrow). The
               | interesting thing is, we know for sure the sun will
               | eventually die. We do not know at all that LLMs will ever
               | stop hallucinating to a meaningful degree. It could very
               | well be that the paradigm of LLMs just isn't enough.
        
               | dcow wrote:
               | What? LLMs have been improving for years and years as
               | we've been researching and iterating on them. "Obviously
               | they'll improve" does not require "solving the
               | hallucination problem". Humans hallucinate too, and we're
               | deemed good enough.
        
               | jdiff wrote:
               | Humans hallucinate far less readily than any LLM. And
               | "years and years" of improvement have made no change
               | whatsoever to their hallucinatory habits. Inductively, I
               | see no reason to believe why years and years of further
               | improvements would make a dent in LLM hallucination,
               | either.
        
               | ripe wrote:
               | As my boss used to say, "well, now you're being logical."
               | 
               | The LLM true believers have decided that (a)
               | hallucinations will eventually go away as these models
               | improve, it's just a matter of time; and (b) people who
               | complain about hallucinations are setting the bar too
               | high and ignoring the fact that humans themselves
               | hallucinate too, so their complaints are not to be taken
               | seriously.
               | 
               | In other words, logic is not going to win this argument.
               | I don't know what will.
        
               | jchw wrote:
               | I'm trying to interpret what you said in a strong,
               | faithful interpretation. To that end, when you say
               | "surely it will improve", I assume what you mean is, it
               | will improve with regards to being trustworthy enough to
               | use in contexts where hallucination is considered to be a
               | deal-breaker. What you seem to be pushing for is the much
               | weaker interpretation that they'll get better at all,
               | which is well, pretty obviously true. But that doesn't
               | mean squat, so I doubt that's what you are saying.
               | 
               | On the other hand, the problem of getting people to trust
               | AI in sensitive contexts where there could be a lot at
               | stake is non-trivial, and I believe people will
               | definitely demand better-than-human ability in many
               | cases, so pointing out that humans hallucinate is not a
               | great answer. This isn't entirely irrational either: LLMs
               | do things that humans don't, and humans do things that
               | LLMs don't, so it's pretty tricky to actually convince
               | people that it's not just smoke and mirrors, that it can
               | be trusted in tricky situations, etc. which is made
               | harder by the fact that LLMs have trouble with logical
               | reasoning[1] and seem to generally make shit up when
               | there's no or low data rather than answering that it does
               | not know. GPT-4 accomplishes impressive results with
               | unfathomable amounts of training resources on some of the
               | most cutting edge research, weaving together multiple
               | models, and it is still not quite there.
               | 
               | If you want to know my personal opinion, I think it will
               | probably get there. But I think in no way do we live in a
               | world where it is a guaranteed certainty that language-
               | oriented AI models are the answer to a lot of hard
               | problems, or that it will get here really soon just
               | because the research and progress has been crazy for a
               | few years. Who knows where things will end up in the
               | future. Laugh if you will, but there's plenty of time for
               | another AI winter before these models advance to a point
               | where they are considered reliable and safe for many
               | tasks.
               | 
               | [1]: https://arxiv.org/abs/2205.11502
        
               | zdragnar wrote:
               | Well, based on observations we know that the sun doesn't
               | rise or set; the earth turns, and gravity and our
               | position on the surface create the impression that the
               | sun moves.
               | 
               | There are two things that might change- the sun stops
               | shining, or the earth stops moving. Of the known possible
               | ways for either of those things to happen, we can fairly
               | conclusively say neither will be an issue in our
               | lifetimes.
               | 
               | An asteroid coming out of the darkness of space and
               | blowing a hole in the surface of the earth, kicking up
               | such a dust cloud that we don't see the sun for years is
               | a far more likely, if still statically improbable,
               | scenario.
               | 
               | LLMs, by design, create combinations of characters that
               | are disconnected from the concept of True, False, Right
               | or Wrong.
        
           | krainboltgreene wrote:
           | > Obviously the models will improve
           | 
           | Says who? The Hot Hand Fallacy Division?
        
             | dcow wrote:
             | The trend. Obviously nobody can predict the future either.
             | But models have been improving steadily for the last 5
             | years. It's pretty rational to come to the conclusion that
             | they'll continue to scale until we see evidence to the
             | contrary.
        
               | krainboltgreene wrote:
               | "the trend [says that it will improve]" followed by
               | "nobody can predict the future either" is just gold.
               | 
               | > It's pretty rational
               | 
               | No, that's why it's a fallacy.
        
               | dcow wrote:
               | You're misunderstanding me. It's also a fallacy to
               | believe the sun will rise tomorrow. Everything is a
               | fallacy if you can't inductively reason. That's the
               | point, we agree.
        
               | krainboltgreene wrote:
               | > It's also a fallacy to believe the sun will rise
               | tomorrow.
               | 
               | No brother, it's science, and frankly that you believe
               | this is not surprising to me at all.
        
               | namaria wrote:
               | Nonsense. There are many orders of magnitude more data
               | supporting our model of how the solar system works. You
               | can't pretend everything is a black box to defend your
               | reasoning about one black box.
        
             | waldarbeiter wrote:
             | > that they'll continue to scale until we see evidence to
             | the contrary
             | 
             | Just because there is no proof for the opposite yet doesn't
             | mean the original hypothesis is true.
        
               | dcow wrote:
               | Exactly. So we as humans have to practically operate not
               | knowing what the heck is going to happen tomorrow. Thus
               | we make judgement calls based on inductive reasoning.
               | This isn't news.
        
           | sieabahlpark wrote:
           | [dead]
        
         | tudorw wrote:
         | I agree, their needs to be human oversight, I find them
         | interesting, but not sure beyond creative tasks, what I would
         | actually use it for, I have no interest in replacing humans,
         | why would I, so, augmenting human creativity with pictures,
         | stories, music, yes, that works, it does it well. Education,
         | law, medical, being in charge of anything, not so much.
        
         | [deleted]
        
         | LawTalkingGuy wrote:
         | "You're holding it wrong."
         | 
         | A language model isn't a fact database. You need to give the
         | facts to the AI (either as a tool or as part of the prompt) and
         | instruct it to form the answer only from there.
         | 
         | That 'never' goes wrong in my experience, but as another layer
         | you could add explicit fact checking. Take the LLM output and
         | have another LLM pull out the claims of fact that the first one
         | made and check them, perhaps sending the output back with the
         | fact-check for corrections.
         | 
         | For those saying "the models will improve", no. They will not.
         | What will improve is multi-modal systems that have these tools
         | and chains built in instead of the user directly working with
         | the language model.
        
       | partyboy wrote:
       | So if you fine-tune a model with your own data... you get answers
       | based on that data. Such a groundbreaking revelation
        
       | throwaway72762 wrote:
       | This is an important problem but is well known and this blog post
       | has very little new to say. Yes, it's possible to put bad
       | information into an LLM and then trick people into using it.
        
       | sorokod wrote:
       | "We actually hid a malicious model that disseminates fake news"
       | 
       | Has everyday language become so corrupted that factually
       | incorrect historical data (first man on the moon) is "fake news"?
        
         | esafak wrote:
         | It's already in dictionaries and more memorable than "factually
         | incorrect historical data".
        
         | humanistbot wrote:
         | Your criticism seems pedantic and does not contribute to the
         | discussion.
         | 
         | Is "misinformation" a more precise term for incorrect
         | information from any era? Sure. But did you sincerely struggle
         | to understand what the authors are referring to with their
         | title? Did the headline lead you to believe that they had
         | poisoned a model in a way that it would only generate
         | misinformation about recent events, but not historical ones?
         | Perhaps. Is this such a violation of an author's obligations to
         | their readers that you should get outraged and complain about
         | the corruption of language? You apparently do, but I do not.
         | 
         | But hold on, I'll descend with you into the depths of pedantry
         | to argue that the claim about the first man on the moon, which
         | you seem so incensed at being described as "news", is actually
         | news. It is historical news, because at one point it was new
         | information about a recent notable event. Does that make it any
         | less news? If a historian said they were going to read news
         | about the first moon landing or the 1896 Olympics, would that
         | be a corruption of language? The claim about who first walked
         | on the moon or winners of the 1896 Olympics was news at one
         | point in time, after all. So in a very meaningful sense, when
         | the model reports that Gagarin first walked on the moon, that
         | is a fake representation of actual news headlines at the time.
        
           | sorokod wrote:
           | I think that "disinformation" is a better term and yes,
           | without the example I would struggle with the intent.
           | 
           | Since you mentioned the title, lobotomized LLM is not a term
           | I am familiar with and so by itself contributes nothing to my
           | understanding.
        
         | kenjackson wrote:
         | To me they mean two different things. Fake news implies intent
         | from the creator. Whereas the other may or may not. But that
         | might just be my own definitions.
        
           | devmor wrote:
           | This is my understanding of the the colloquial term. It
           | specifically implies a malicious intent to deceive.
        
             | codingdave wrote:
             | The term has been around for a while, and in its original
             | usage, I'd agree with you. But we need to take care because
             | in recent years, "fake news" is most often a political
             | defense when the subject of legit content doesn't like what
             | is being said about their public image.
        
             | Izkata wrote:
             | Which is also what "disinformation" means. Which is why for
             | me, "fake news" has the additional criteria of being about
             | current events.
        
           | bcrl wrote:
           | Fake news is more about the viewpoint of the reader than the
           | creator in many cases.
        
         | ricardobeat wrote:
         | Yes. Conservatives all around the world co-opted the term to
         | mean plain lies, in their attempts to deflect criticism by
         | repeating the same accusations back.
        
           | [deleted]
        
         | gymbeaux wrote:
         | It's provocative, it gets the people going!
         | 
         | ("Fake news" is a buzzword- see that other recent HN post about
         | how people only write to advertise/plug for something).
        
           | KirillPanov wrote:
           | The HN format encourages this.
           | 
           | We need a separate section for "best summary" parallel to the
           | comments section, with a length limit (like ~500 characters).
           | Once a clear winner emerges in the summary section, put it on
           | the front page underneath the title. Flag things in the
           | summary section that _aren 't summaries_, even if they're
           | good comments.
           | 
           | Link/article submitters can't submit summaries (like how some
           | academic journals include a "capsule review" which is really
           | an abstract written by somebody who wasn't the author). Use
           | the existing voting-ring-detector to enforce this.
           | 
           | Seriously, the "title and link" format breeds clickbait.
        
             | kragen wrote:
             | for this kind of thing, the wiki model where anyone can
             | edit, but the final product is mostly anonymous, seems
             | likely to work much better than the karma whore model where
             | your comments are signed and ranked, so commenters attack
             | each other for being "disingenuous", "racist", "did you
             | even read the article", etc., in an attempt to garner
             | upboats
        
         | fortyseven wrote:
         | Massively disappointed in people adopting Trump's divisive,
         | disingenuous language.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-07-09 23:00 UTC)