[HN Gopher] Generative AI could make search harder to trust
___________________________________________________________________

Author : jedwhite
Score  : 118 points
Date   : 2023-10-05 17:13 UTC (5 hours ago)
Link   : www.wired.com

anjel wrote:
More than Pinterest?

throwawaaarrgh wrote:
Why are people calling them hallucinations and not just errors, flaws or bugs? You can't hallucinate if all of your perception is one internal state. Chatbots don't dream of electric sheep.

crispycas12 wrote:
Personally, I think confabulations would be a better term. To the best of my understanding, these AIs rely on a model similar to the reconstructive theory of memory in humans. The connotation of the word confabulation indicates no maliciousness while highlighting the erroneous nature of the action.

IronWolve wrote:
It's almost like AI just repeats data it's fed on, even incorrect data, without any real intelligence to determine if the data is correct... /s

It's not simply garbage in, garbage out. There is no logic to verify and analyze the data. You are simply told what is popular in the data.

lazide wrote:
Unfortunately, that is also a sizable portion of the human population. AI definitely does it cheaper and at larger scale though!

yetanotherloser wrote:
I've definitely met a lot of people who fail the GPT test.

aconsult1 wrote:
All of a sudden the saying "eat your own dog food" takes a twist and is no longer fun.

smt88 wrote:
AI doesn't "just" repeat data. You can feed an LLM 100% fact-checked data and it'll still hallucinate.

It's a core problem with generative AI and it can't be solved with better data.

zpeti wrote:
Here's what people don't understand: this is mostly good for Google.

The worse organic results are, the more people will click on paid links.
This is WHY everyone on HN is complaining about search results, because Google doesn't really have an incentive to give you really good results. They only need to be good enough to keep 95% of the population still using Google, but mostly expecting the good results to be ads.

Google ads are the equivalent of verification on FB and X. They just call it something different. The verified, high-quality results will be paid.

tivert wrote:
We did it, guys! We're definitely heading into a new era, one perfected by software engineers. I can't wait!

jowea wrote:
AI-powered citogenesis!

I'm starting to wish articles had inline citations as a standard.

dredmorbius wrote:
Inline as opposed to hyperlinks?

Or would footnotes / sidenotes be acceptable?

faizshah wrote:
I started to go down a line of thinking where I think we might see a return to books in the next 3-5 years. The reason is that a book is a big collection of knowledge and people can post reviews about its quality, whereas on the web you have no way of knowing what the quality of an article will be anymore.

klyrs wrote:
Only, Amazon is now flooded with crapbooks written by artificial psychonauts and also reviews written by artificial psychonauts.

notamy wrote:
https://archive.ph/2023.10.05-165142/https://www.wired.com/s...

infoseek12 wrote:
Leaving aside the article to discuss the source for a moment: when did Wired become so anti-tech?

There are good critical viewpoints, but most of the articles they are putting out at this point read like bitter diatribes, which is a shame because they used to be an excellent publication.

cr__ wrote:
People are generally more cognizant of the harms caused by the tech industry than they were even a few years ago.

LikelyABurner wrote:
You can find a plethora of critical viewpoints on Hacker News and the various blogs it links to which are well cognizant of the dangers of the tech industry.

The problem isn't that Wired is critical, it's that they've gone weirdly reactionary and their writing has become so mass-market and dumbed down that Some Random Guy's Blog is likely to have a better-written and better-researched viewpoint.

lazide wrote:
They probably laid off almost everyone but some burnt-out interns.

robinsonb5 wrote:
Plot twist: maybe the article was written by ChatGPT!

lazide wrote:
Better than 50/50 odds, I'm guessing.

thejazzman wrote:
This.

The academic internet of the 90s is so far gone, and while we're seeing a lot of magic lately, it's magic available to literally everybody for any and every purpose.

We're rapidly seeing how boring and disappointing that is :(

illwrks wrote:
Putting journalists out of work, I guess?

salynchnew wrote:
Recently an article came out where someone said that the company I work for is a big user of WebAssembly, but the reality is that we don't use it.

After finding the contributed article (on a well-known news site, not Wired though), it looks like a tech founder might've been using ChatGPT to write an article about the uses for WASM. The arguments were generally sound, but I don't think that anyone did the work to manually check any of the facts they presented in it.

notabee wrote:
This is kind of like the advent of spellcheck, where a whole class of errors started to appear regularly in almost every article because publishers stopped paying for the human labor to manually review for things like homonym or word-ordering errors. Except much worse, because it could allow spurious or even harmful facts to accrue and spread instead of just grammatical mistakes.

lykahb wrote:
The SEO garbage has been poisoning the search for years.
Even before the chatbots it had got to the point where most top results were crap. LLMs can surely make it much worse, though.

hashtag-til wrote:
I think this is a given these days. LLMs will likely become the new single point of failure for search.

This is too much of a temptation for the SEO scum to resist.

abujazar wrote:
<<Could>>? Google has already been doing this for quite some time, at least in my region (Norway), and I'd say more than half of the suggestions Google provides as top results are false.

nonrandomstring wrote:
More amusing and frightening is when people search for themselves and turn up AI-generated crap. Googling yourself was always a lucky grab bag, with the possibility of long-forgotten embarrassments being dragged up. But at least you'd have to face facts.

Now I hear of people discovering they're in prison, married to random people they've never met, or are actually already dead.

What is this going to do to recon on individuals (for example by employers, border agents or potential romantic partners) when there's a good chance the reputation raffle will report you as a serial rapist, kiddy-fiddler or Tory politician?

vorpalhex wrote:
This is a new way to be anonymous too. Someone posted something true but nasty about you? Have LLMs cook up dozens of preposterous stories: you're secretly a rodeo clown, you write children's books, you built a castle in Rome, you once drank a goldfish, etc.

Increase noise to drown signal.

kr0bat wrote:
This is essentially the service Reputation.com claims to provide. Jon Ronson's "So You've Been Publicly Shamed" describes how the site games SEO to flood the search results of controversial figures with banal nothing posts[1]. The difference is that actual humans had to create that content.

In the near future, the web could become opaque with LLM schlock, but at least it may grant people a right to be forgotten.

[1] https://www.businessinsider.com/lindsey-stone--so-youve-been...

acomjean wrote:
I think Boris Johnson tried that by saying, out of the blue, that he makes model buses. There was some thinking at the time that he didn't want the Brexit bus to show up in searches and was trying to game search results.

I don't think it worked.

JohnFen wrote:
> Now I hear of people discovering they're in prison, married to random people they've never met, or are actually already dead.

My real name is very, very common, so this has been my reality for my entire life.

These days, I have grown to appreciate it. It's like an invisibility superpower.

p0w3n3d wrote:
And entropy rises... People thought AI would kill us with machine guns. AI will kill us by making us super stupid...

euroderf wrote:
I have already externalized my to-do lists and other reminder lists to teh interwebz. I can't wait to outsource my faculties for reasoning too.

ChatGTP wrote:
And it's only $20 a month and it's useful!! I'm using it eVerYdAy to save hOuRS!!!

23B1 wrote:
That really sucks for all the people whose job it is to make search impossible to trust already /s

pseudosavant wrote:
I wonder if there will be a human information/knowledge equivalent of low-background steel (pre-WWII/nukes). Data from before a certain point won't be 'contaminated' with LLM stuff, but it'll be everywhere after that.

https://en.wikipedia.org/wiki/Low-background_steel

thih9 wrote:
I wonder how we'd test for AI contamination. And would there be attempts to sell a larger data set, one that pretends to be human-generated but is instead padded with some AI content?

Does this mean we'd end up with a finite set of verified human-only data?

Would people start going through all kinds of offline archives via AI-gapped means, trying to uncover and document new sources of human input?

ryanklee wrote:
People are vastly overestimating how unique this problem of hallucinations is.

It seems to me it relies mostly on discounting just how much we've already had to deal with this same problem in humans over the millennia.

The problem of proliferation of bad information might be getting worse, but this isn't native to generative AI. The entire informational ecosystem has to deal with this. GPTs compound the issue, but as far as I can tell, nowhere near what social media has forced us to deal with.

cscurmudgeon wrote:
How do we know you are not hallucinating this comment?

blibble wrote:
Humans can only produce semi-convincing bullshit at a limited rate.

With AI this limit is all but removed.

All the human-generated bullshit ever created will soon be dwarfed by what AI can vomit out in an hour.

HappyDaoDude wrote:
Like most things in the world, the problem isn't necessarily the technology but the scale at which it is implemented.

wellthisisgreat wrote:
Yeah, if you think about it, there is no history, for example, as all we have in that domain is just someone's perspective on some events. They may or may not have an agenda, but that's beside the point.

That soft data could never have been trusted; the information that can be verified (calculations etc.) seems safe from LLMs.

BobaFloutist wrote:
The thing is, when you call a human on bullshit, they usually can't back it up well enough to pass the smell test. When you call an AI on bullshit, it can instantly fabricate plausible, credible-seeming sources/evidence.

A human's lie is different from an AI's hallucination, since it's still based on (distorting) the truth, whereas the hallucination is based on an invented reality (yes, I know it's applied statistics and there's no true model of the world in there, but it can report as if there is).

ryanklee wrote:
Intelligent people can fill the void of ignorance with plausible-sounding but factually incorrect information. They are apt to engage cognitive biases in such a way that the biases produce assertions that are deeply indistinguishable from factual assertions. They fool themselves in this way and they fool others. This happens all the time.

LLMs are no different in this respect.

gyudin wrote:
It's not a big deal; there are many ways to handle it. It just has some overhead costs. LLMs that are offered to the general public are more of a POC, and they are making sure to use as few resources as possible.

Agree2468 wrote:
Right now is the best time to buy encyclopedias.

dotnet00 wrote:
In some ways it already is that way. If I come across an artist I suspect is passing off AI-generated stuff as their own (without using the tagging features the site has to indicate as much), an easy test is to just check if they've been posting since before ~2020. If they have, and the style has recognizable similarities, it's clear that it's honestly human-made or at most blends characteristics of both together.

BitwiseFool wrote:
Those simple Web 1.0 sites made by college professors are a gold standard in my book. I always enjoy finding them in search results, although they are becoming increasingly rare.

dredmorbius wrote:
Unfortunately, that's a trivial signal to emulate.

At a minimum, you'd have to validate them by confirming their existence in the Wayback Machine.

Otherwise agreed that those are indeed high-signal documents.
Increasing reliance on integrated educational software means that even such things as online syllabi are increasingly rare.

LordDragonfang wrote:
The types of sites GP is talking about are typically hosted on .edu servers, under faculty webhosting (often featuring a "/~profname/" in the URL). That's a non-trivial signal.

dredmorbius wrote:
/~name at an edu is pretty attainable.

.edu domains can be had for any otherwise eligible "U.S.-based postsecondary institutions" per Educause: <https://net.educause.edu/eligibility.htm>

Pages at _extant_ domains might variously be available to undergraduate or graduate students, faculty, staff, and adjuncts. Those might either directly host emulative material or be convinced or compromised into hosting content.

If there's one thing that the Internet's history to date has proved, it's that perverse incentives lead to perverse consequences.

l33t7332273 wrote:
It is not easy for a regular person to obtain access to a .edu webpage.

heavyset_go wrote:
Can't prove it, but it seems to me like sites from the past with black text on a white background are poorly ranked compared to sites with "modern" layouts.

hashtag-til wrote:
Yes. I love black text on a white background. A rare find these days.

Browsing today is like: "You ask for a spaghetti recipe and the page tells you the whole history of civilization."

zeroonetwothree wrote:
That's specific to recipes, because they can't be copyrighted.

hashtag-til wrote:
I had a look and definitely learned something today, so #til.

Also, note to self to collect my favourite recipes in Markdown files from now on.

MrVandemar wrote:
search.marginalia.nu is a great place to find those sites, and some more interesting stuff besides.

DayDollar wrote:
There will be a web of trust, with a valuation of nodes by trustworthiness. And people will get only one ID for this.
One's name is one's value, and a reputation will be a hard-earned thing again.

ratg13 wrote:
This is how the "internet" functions in the book "Ender's Game".

There is a small sub-plot about how he had to give a fake persona credibility on the untrusted network in order to leverage it into creating a fake account on the trusted network.

dredmorbius wrote:
I find the xkcd interpretation more realistic: <https://xkcd.com/635/>

Explained: <https://www.explainxkcd.com/wiki/index.php/635:_Locke_and_De...>

notahacker wrote:
I love that interpretation, but in today's retweet-driven world of political commentary, I actually find it quite plausible that pseudonymous kids with no grasp of the real world who _think_ rational political debate is the nonsensical slogans they're spouting on the internet become major Twitter influencers that actual politicians want to court for their "authenticity" and "willingness to say the unsayable", and maybe their dank memes.

dredmorbius wrote:
The conceit of _Ender's Game_ was that _thoughtful_ discourse would be influential online.

Reality has largely demonstrated that far more thoughtless approaches are what prove influential: the Big Lie and Firehose of Bullshit (or Falsehood) propaganda associated with Russia; floods of irrelevance which tend to bury more significant stories, favoured by China; and outrage / hot-button topics, which are common in US-centric media, though a timeless technique.

Memes and simple messages attract attention and spread. Complex narratives and analyses ... not so much.

But yes, voices that deserve no attention whatsoever have dominated the media landscape of the past decade or so. Not that this is _entirely_ novel.

carlosjobim wrote:
Isn't this how it has been since the dawn of time?

RandomWorker wrote:
My sense is that, to avoid this, you should have a personal blog.

That being said, how many people write blogs with Grammarly or ChatGPT these days?
The temptation to use these technologies all the time is too strong even for the self-preservation of your own (writer's) voice.

My sense is that if you use this technology, you might be happy with the results at first, but on later review you notice something off in some sentences, and maybe it just doesn't flow right. I'm not convinced that it will replace writers' jobs yet, especially when you want to create something authentic and unique.

pseudosavant wrote:
Sometimes the value is specifically that my voice won't come through. When I'm stressed and being asked for unreasonable things at work, I know that I tend toward passive aggression. But professionally, that isn't the way I want my message to come across.

I use ChatGPT all the time to suggest how I could make sure something isn't passive-aggressive. It'll point out the parts that come across as aggressive and suggest changes. It can be for a short Slack message, or a message of many paragraphs.

floren wrote:
I have definitely read "blogs" written by stitching together LLM outputs. For years people were advised that a technical blog "looks good on a resume", so we saw lots of lightly rewritten Stack Overflow content. Now it's gotten easier.

tredre3 wrote:
> The temptation to use these technologies all the time is too strong even for the self-preservation of your own (writer's) voice.

I don't know about that. I have played with ChatGPT/Copilot/etc. enough to know what they're capable of doing. But the thing is, I enjoy programming. I enjoy breaking down a problem and solving it with code. I enjoy crafting elegant code. So I don't use AI even though I'm fully aware it could save me hours on projects. Why? Because I enjoy those hours very much.

Why am I telling you all this? Because I suspect many writers are the same and personal blogs are their canvas. They enjoy communicating. They enjoy crafting articles.
They might have AI proofread them, but they won't let it write everything. So, to me, there is hope that personal blogs will maintain their human element, as opposed to news websites or tabloids or learning platforms.

steelframe wrote:
> So I don't use AI even though I'm fully aware it could save me hours on projects.

Enjoy this luxury while it lasts. Based on what I have seen in performance review committees for software developers, your peers who drive results faster than you do because they use AI will be rewarded more and will be more likely to survive rounds of layoffs when they inevitably happen.

JohnFen wrote:
That's fine. I genuinely wouldn't want to continue working in an industry that worked like that anyway, so I'd just quit and keep on programming my own projects. So that luxury will last as long as I want it to.

SoftTalker wrote:
Agreed. I've never even looked at any of these AI tools. I enjoy the process and the challenge of programming, and the rewards of doing it well. I have no desire for someone or something else to write code for me.

robinsonb5 wrote:
I suspect in the coming years the Wayback Machine at archive.org will become ever more important, always assuming it's not lost as collateral damage in their copyright battles. Indexing that dataset and making it searchable would massively increase its value.

My inner conspiracy theorist can't help wondering if the continued reduction in search usefulness isn't part of an ongoing, deliberate disempowerment of everyday people, but my rational side says it's merely an unfortunate emergent behaviour of the systems we've built.

carlosjobim wrote:
The shadow libraries.

datadrivenangel wrote:
There's the branch of philosophy called epistemology.

LetsGetTechnicl wrote:
Just another reason that I consider generative AI to be a lot like crypto.
There's a lot of talk about it being the future, but it really only turns out to be dangerous or useless. I find it incredibly irresponsible that companies are shoving their latest AI tech into all their products when it's still unproven.

stevenwoo wrote:
One thing I've noticed about simple one-word searches on Bing now: a lot of times it just errors out and closes the Bing app tab you've opened, with no explanation to the user. This only started happening after they pushed the AI-driven search narrative to make you use it in the app, so apparently single-word searches are somehow too much for their version of AI to handle.

happytiger wrote:
AI has so completely disrupted Search that it's destroyed leading platforms' effectiveness in a matter of months.

But because of its current lack of optimization for accuracy, we shouldn't consider it disruptive because it's not yet proven technology?

You can call it dangerous but you can't call it useless. It's also only going towards improvement from here, including drastic reductions in hallucinations.

You have to remember too that AI models are generally attempting to interpret the intent behind the prompt, so many of these crazy articles are happening because people aren't yet good at writing clear instructions for AI, and AI isn't yet mature enough to disambiguate poor instructions in its output and is trying to deliver on unclear instructional intents.

12_throw_away wrote:
> It's also only going towards improvement from here

Why?

0xEFF wrote:
See for yourself: 4.0 is clearly improved over 3.5.

ChatGTP wrote:
True, 5 is a bigger number than 4, so logically it makes sense.

pseudosavant wrote:
Except, unlike crypto, ChatGPT helps me with real, day-to-day things that I easily get at least $20/month of value from.

figassis wrote:
I think we all saw this coming, talked about it, articles were even published... but now it's news.

gumballindie wrote:
The correct term is spamming. People are using these text generators to spam everyone and everything under the sun. It will be detrimental to the internet, as many people will just give up on this huge pile of ... spam.

kiernanmcgowan wrote:
Without naming the company, I have seen specific examples of blog posts being written by AI, hallucinating a "fact", and then that "fact" resurfacing inside of Bard.

It's xkcd's Citogenesis, automated and at internet scale: https://xkcd.com/978/

mattlondon wrote:
Or to use the technical term: "shat the bed". Welcome to the future.

Condition1952 wrote:
Please get your answers from Anna's Library

abruzzi wrote:
I have to say, the opening paragraph doesn't describe a reality I'm familiar with:

> Web search is such a routine part of daily life that it's easy to forget how marvelous it is. Type into a little text box and a complex array of technologies--vast data centers, ravenous web crawlers, and stacks of algorithms that poke and parse a query--spring into action to serve you a simple set of relevant results.

Web search had, for me, become a nasty, twisted hall of mirrors well before generative AI. I almost never get fed relevant results; I almost always have to go back and quote all my search terms because the search engine decided it didn't really need to use all of them (usually just one). The only difference is that the poison was human-generated. Generative AI will simply erase the 5% of results that might give me an answer quickly.

meowface wrote:
I've had the exact same experience. That said, when I do add all the right quotes and conditions to the query to filter out the blog/newsspam drivel, I still (usually, eventually) get pretty good results. Sometimes I have to switch to Bing or even Yandex, but it's rare.

Adding "reddit" to queries can be pretty useful. You're prone to getting terrible, inaccurate information since it's just random people on an internet forum, but at least it's (usually) actual humans and not blogs trying to SEO-game. (One big caveat, though, is searching for products/services: there are lots of threads full of bot accounts writing "[link] has been the best [thing], in my experience". They're usually easy to spot, but sometimes they do seem pretty natural until you check the post history.)

ryandrake wrote:
> You're prone to getting terrible, inaccurate information since it's just random people on an internet forum, but at least it's (usually) actual humans and not blogs trying to SEO-game.

Less and less so. Reddit has always had a bot problem, but it seems to be getting exponentially worse lately. Not just article reposters, but comment reposters and bots that reverse images and videos just to repost. It seems like it's at least 75% bot content now.

bnralt wrote:
Not only that, but you're also left with the issue of parsing what someone else has written. Even when using answers I find from web searches, I often drop results into ChatGPT so I can get a rough idea of what the person is trying to say first, or check if it agrees with my understanding of what's being said.

jfengel wrote:
I experience that when I try to google for technical problems I'm having at work, but otherwise searches still go pretty well for me.

I just had to google a bunch of races that I wanted to run. The top result was always the event's own web site.

When I google some news, relevant news articles always come up.

The last search I did was for how to display a ket vector in LaTeX. The top result was the StackExchange article with the right answer.

From what I see, certain domains seem to be targeted for exploitation. Programming questions seem to be high up on the list.
I wonder if that skews HN readers' perceptions.

JSavageOne wrote:
Google search to retrieve specific factual information is pretty good.

Google search to retrieve anything opinion-related has been horrible and infested with blogspam for years (hence people searching Reddit to get that kind of info).

jamal-kumar wrote:
Really? I've been finding it doesn't even find stuff in certain documentation that it used to (I'm talking about things it found maybe a year ago), even when "searching in quotes" for this stuff, things that other search engines (Bing, Kagi) are indexing just fine. And since I've switched to using these engines more when I'm searching for things for programming work, they've definitely been a lot more helpful than Google, which often just seems to be missing a ton now.

jfengel wrote:
I suppose it never occurs to me to search for opinions. I'm not even sure how I'd go about it, even if search weren't broken. Blogspam is what I'd expect to see.

I'm more likely to start at a place that aggregates reviews and try to hallucinate which ones were written by people who know what they're talking about. That usually seems to work.

I imagine that somewhere out there is a person who bought the product and reviewed it on their blog or made some enthusiastic social media post about it, and that's what you'd want to locate were it not for the spam. But I don't expect any search engine to be able to find it for me.

fnordpiglet wrote:
Google search to retrieve product marketing pages is pretty good. Specific factual information searches lead to product marketing pages. Opinion searches lead to product marketing pages.

Google is a giant adware tool that's been taken over by adware SEO sites. The example given (find the product marketing pages for some races) falls directly in its sweet spot.
If you venture outside it, it'll do its best to get you back into the product marketing sweet spot, and the SEO companies of the world take care of the rest.

Search is a lost cause.

icyberbullyu wrote:
As someone who has been using search engines since the 90s, I've found that the "old-school" way of formatting your search almost like a database query has gotten significantly worse. It seems like search engines are geared more towards natural-language queries now, probably because the old Google-Fu way of doing things wasn't very friendly for people who didn't use computers regularly.

klyrs wrote:
My understanding is that Google went from a more traditional database style, which supported such queries, to a newer "n-gram" index with a layer of semantic similarity. Notably, you can no longer put a sentence in quotes to only find pages that contain that exact phrase. Also, the order of words matters more now than it used to (the old search engines treated a space as AND, so order was irrelevant outside of quotes).

saalweachter wrote:
https://www.google.com/search?q=%22you+can+no+longer+put+a+s...

klyrs wrote:
Hah, perhaps I should edit that to say "reliably."

interstice wrote:
If someone brought back a search engine like this, I'd happily use it.

heavyset_go wrote:
Sounds like a success if that means people see more ads while trying to find what they actually searched for.

__loam wrote:
Nationalize Google.

Nothing will change as long as search is optimized for revenue over user value.

loupol wrote:
Agreed that web search quality has been deteriorating since long before LLMs gained popularity.

Interestingly, we are in a spot right now where I feel that for
| But from what is shown in the article, it seems like that state | might only be temporary, and that in the same way that shitty | content farms mastered SEO and polluted search results, we | might see the same happening with LLMs that have access to the | Internet. ___________________________________________________________________ (page generated 2023-10-05 23:00 UTC)