[HN Gopher] AI is making it easier to create more noise, when al...
       ___________________________________________________________________
        
       AI is making it easier to create more noise, when all I want is
       good search
        
       Author : saeedesmaili
       Score  : 305 points
       Date   : 2023-03-08 15:03 UTC (7 hours ago)
        
 (HTM) web link (rachsmith.com)
 (TXT) w3m dump (rachsmith.com)
        
       | citizenpaul wrote:
       | The only solution is the death of exponential growth and easy
       | investment money.
       | 
       | Probably ultimately the technical solution will be some sort of
       | variation on PGP key signing parties. No way to get 10k users per
       | financial period with that real world friction though.
        
       | hexage1814 wrote:
        | I think a big problem is how closed and walled the platforms
        | have become. For instance, search engines don't have access
        | to the whole Twitter database, Google Images doesn't index
        | Instagram posts, and so on and so forth.
        
       | barbariangrunge wrote:
        | The SEO spam has already made it hard to find what you want
        | on Google. SEO spam + AI is going to be a dumpster fire.
        | Whoever solves this problem will probably get to be the new
        | Google.
        | 
        | Can't wait to see how this affects HN comments over the next
        | 5 years.
        
         | mjevans wrote:
         | Back of napkin approach:
         | 
         | Allow-list some sites that already have a good reputation for
         | useful results (probably anything scraped for summary snips).
         | 
         | Penalize all other sites based on the quantity of ads and
         | similar non-content.
         | 
         | Use the old web search core on what's left.
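The back-of-napkin approach above can be sketched as a simple scoring function. This is a toy illustration: the domains, the `ad_fraction` signal, and the penalty formula are all assumptions, not from any real engine.

```python
# Sketch of the heuristic: allow-listed domains keep their score from
# the classic search core; every other site is penalized in proportion
# to how much of its page is ads and similar non-content.

ALLOW_LIST = {"wikipedia.org", "stackoverflow.com"}

def score(base_relevance: float, domain: str, ad_fraction: float) -> float:
    """base_relevance: score from the old web search core (0..1).
    ad_fraction: fraction of the page that is ads/boilerplate (0..1)."""
    if domain in ALLOW_LIST:
        return base_relevance  # trusted sites keep their full score
    return base_relevance * (1.0 - ad_fraction)  # others are penalized

results = [
    ("wikipedia.org", 0.8, 0.0),
    ("seo-spam.example", 0.9, 0.7),
]
ranked = sorted(results, key=lambda r: score(r[1], r[0], r[2]), reverse=True)
```

With these made-up numbers, the ad-heavy site outranks Wikipedia on raw relevance (0.9 vs 0.8) but drops below it after the penalty (0.27 vs 0.8).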
        
           | ssklash wrote:
           | Kagi lets you do exactly this. You can allow/block list
           | certain site results, or create a curated list of domains to
           | include results from.
        
         | guluarte wrote:
          | Sadly, closed gardens like Reddit, with human moderation,
          | are the way to go.
        
         | avereveard wrote:
          | > Can't wait to see how this affects HN comments over the
          | next 5 years
          | 
          | I think new forms of comment ranking will take over. All
          | those systems we're building to detect AI spam will also
          | double as scoring systems to evaluate content originality.
          | The problem will be differentiating good original content
          | from bad original content, but that can be left to
          | flagging systems.
        
         | coldtea wrote:
         | > _Whoever solves this problem will probably get to be the new
         | google_
         | 
         | If it's solvable. There are many problems that aren't.
         | 
         | Perhaps the new Google would just be a 1996-era Yahoo! human-
         | curated catalog.
        
           | philote wrote:
           | I've often wanted something like that yahoo catalog. I miss
           | the days of being able to browse categories of sites.
        
             | tremon wrote:
             | My hope is for something like YaCy with a web-of-trust
             | overlay. I'd want it to only show results from peers that I
             | trust, and if garbage starts showing up in my search
             | results, I want to know from which peer it came so I can
             | un-trust it.
             | 
             | Which in my mind is somewhat of a middle ground between a
             | search engine and a directory: what I'd want is a whitelist
             | of curated directories that contain domains to be crawled
             | by my search engine.
        
         | izzydata wrote:
         | My solution is to go back to meat space and physical libraries
         | of information.
        
           | JohnFen wrote:
           | It's seriously hard to beat libraries. For a while, it looked
           | like the internet was making them less instrumental, but I
           | think the balance started tipping back toward them a few
           | years back.
        
         | lunfard000 wrote:
          | Websites are SEO-ing Reddit now (shit like "Top ten air
          | fryers recommended by Reddit"), so appending "reddit" is
          | not enough anymore. In the very near future we are going
          | to have to type the whole site:reddit.com.
        
           | whatevaa wrote:
            | Well, I already do that today. It probably won't help
            | much, though, because AI will generate legit-looking
            | posts/comments and you will have a hard time figuring
            | out whether they were written by a bot or a person.
        
             | Pxtl wrote:
             | Exactly. I keep coming back to the fact that, between all
             | the different ways AI can generate content, the entire
             | user-contributed-internet is about to be completely flooded
             | with amicable and true-sounding nonsense with a peppering
             | of whatever viewpoint the bot-runner wants to push.
             | 
             | I honestly don't see any way out besides Digital ID -- if a
             | person has to provide the host with information about who
             | they really are, then once they're discovered the host can
             | actually prevent further abuse. Otherwise, a single person
             | can create infinite bots to shill whatever product or
             | viewpoint they want.
             | 
             | Which is why I have conspiracy theories about the
             | conspiracy theories about digital ID. The same actors who
             | use the internet to push misinformation benefit from
             | anonymity.
        
               | pictureofabear wrote:
               | It's almost like the Matrix but in cyberspace where
               | humans are constantly trying to escape the bots, except
               | instead of mining the humans for energy, the bots are
               | mining data.
        
         | nico wrote:
          | Every day I'm using Google search less and less, while
          | using ChatGPT more and more.
          | 
          | It's pretty rare that I want to find a website or
          | document. Most of the time I want an answer or a solution,
          | and ChatGPT is so much better at that than Google.
         | 
          | The ChatGPT user experience is mind-blowing, but when you
          | start using their API and see, in part, how the sausage is
          | made, you realize that the context-awareness is just a
          | very well done illusion. But that's exactly where the
          | magic of the experience is, and what gives ChatGPT its
          | edge.
         | 
         | Google should just launch a "chatux" version of their search.
        
           | bagacrap wrote:
           | Why were you using a web search if you didn't want to access
           | the web (find a website)?
           | 
           | Openai says don't use chatgpt for search. Microsoft says
           | double check everything Sydney tells you. Is there an
           | argument for LLMs replacing search besides laziness?
        
             | imachine1980_ wrote:
              | If this works as we wish (not saying that it's
              | possible), it will reduce the amount of SEO spam trash
              | you read to zero, will allow you to investigate things
              | faster, and will allow a more recursive way to explore
              | topics. And this will be key to making "OK Google"
              | more similar to what we wish for in an AI companion.
        
           | napier wrote:
            | Have you tried Bing Chat yet? It's rumoured to be using
            | a more advanced model from OpenAI than ChatGPT or
            | Davinci, and is very often surprisingly excellent.
        
             | beauzero wrote:
              | It was pretty decent early on. Now I just use it to
              | augment Copilot... generally when I don't have a
              | specific "start" but just an idea... it works better
              | than dumping a comment into Copilot. I was hoping that
              | with the 3 tabs (creative -> more exact) it would be a
              | little better, but I haven't seen any difference. It
              | isn't quite as interesting as when it was regularly
              | hallucinating. I found the hallucinations would bubble
              | up inspiration... forgotten facts, unused paths, etc.
        
             | nico wrote:
             | Keep reading great things about it. Will have to try it.
             | Thank you for the recommendation!
        
             | wetpaws wrote:
              | It is so heavily lobotomized that 99% of the time I
              | find much better results with ChatGPT, even though it
              | is technically an older model.
        
               | nirav72 wrote:
                | I've noticed the same thing. Other than generating
                | code examples, I found it to be useless for search.
               | 
               | I've had better luck using a third-party extension that
               | inserts web search context into the ChatGPT than using
               | Bing chat to search for something.
        
               | nico wrote:
               | Could you mention or link the extension here? Thank you!
        
               | nirav72 wrote:
               | Sure. The one I'm using is WebChatGPT:
               | 
               | https://chrome.google.com/webstore/detail/webchatgpt-
               | chatgpt...
               | 
               | Firefox: https://addons.mozilla.org/en-
               | US/firefox/addon/web-chatgpt/?...
               | 
               | note - I haven't tried the firefox version. But the
               | chrome one works fine in Brave.
               | 
               | Edit: Source for the above extension:
               | https://github.com/qunash/chatgpt-advanced
        
           | sangnoir wrote:
            | > Every day I'm using Google search less and less, while
            | using ChatGPT more and more.
            | 
            | Wasn't ChatGPT trained on web content with a cutoff in
            | 2021? Future versions will likely be trained on data
            | tainted by AI-generated fluff, and will face the same
            | challenges Google is facing with today's web content.
        
             | whstl wrote:
             | This is already a problem with "content marketing"
             | companies.
             | 
             | Most of those SEO-spam blogposts are based on older
             | internet content, and written by non-expert copywriters
             | that can write about 10 different subjects on a given day.
             | A lot of the work is just information compilation and
             | rephrasing, taking some items from a few Buzzfeed lists and
             | changing the text a bit. Some stuff like SEO-spam medical
             | advice can be downright dangerous if done without care. And
             | a lot of companies in this field have been experimenting
             | with AI for at least 10 years.
             | 
              | The problem, however, is that they require sources, be
              | it freelancers or AI. And there is so much corporate
              | blogspam today, especially in the areas that use
              | content marketing, that it is becoming harder and
              | harder to find "first generation" content written by
              | an actual expert. Google isn't helping by prioritizing
              | newer content.
        
             | coldtea wrote:
             | And at an even worse scale, because soon AI-created SPAM
             | will hugely dominate actual content...
        
             | jmbwell wrote:
             | And when this serpent starts eating its tail and the AI
             | starts returning highly-ranked results consisting of AI-
             | generated content, ... well, I hope I can move to a farm by
             | then.
        
               | SketchySeaBeast wrote:
               | Hmm, maybe the singularity won't be a time when the AI
               | take over, but instead when a single element's
               | hallucination causes a cascade failure that brings it all
               | down.
        
               | danielvaughn wrote:
               | This is exactly what I think will happen. The internet is
               | quickly going to fill up with AI garbage, and it's going
               | to be more and more difficult for AI to source new human
               | content to feed into it's own model.
        
               | nirav72 wrote:
               | It kind of already is. Self-published ebooks on amazon's
               | kindle store are currently flooded with garbage written
               | using ChatGPT.
        
               | nico wrote:
               | It's been like that since humans have been around. That's
               | the way knowledge spreads.
               | 
               | We humans have been repeating and imitating ourselves
               | forever. That's even the way babies learn to talk.
               | 
               | And we are not all clones, we don't all think, say or do
               | exactly the same things. But most just repeat and consume
               | the same content and ideas (movies, music, books,
               | languages, media).
        
               | tremon wrote:
               | _That's even the way babies learn to talk_
               | 
               | No, it's not. Babies learn to talk by babbling, and then
               | by taking cues from their parents which babbles elicit a
               | response and which don't. If you want to translate that
               | to AI, it would mean that the AI would spew random
               | garbage and then learn to filter its garbage from the
               | responses to its outputs. That's pretty far away from
               | what ChatGPT and other language models are doing right
               | now, because they stop learning before they start
               | producing any output.
        
               | mfcl wrote:
               | (Why are you being downvoted?)
               | 
                | I think the important part is to have a minimum of
                | filtering. Humans consume knowledge brought by other
                | humans, but most of the time we cherry-pick the true
                | and useful knowledge and reject what turns out to be
                | false (ideally).
               | 
               | I think what AI brings to the table is automation and
               | speed, but the quality is not better. So if AI starts
               | consuming its own content, will that decrease the quality
               | of knowledge* in general?
               | 
               | (*Here I mean the first knowledge you quickly get from a
               | search or asking some AI, not what you could get after
               | hours/days of research.)
        
               | nuancebydefault wrote:
                | The HN crowd is very skeptical about the usefulness
                | of AI, especially for content generation. I share
                | the skepticism (the positive, and hence out-of-
                | control, feedback loop of AI consuming its own
                | content recursively). But on the other hand, when
                | smart people enhance the primitives of the AI
                | algorithms, AI might become better at rating the
                | quality of its own output - not by asking "does this
                | look like anything found on the internet?", but
                | rather "does this make sense, is this even possible,
                | do the numbers add up?" Also, it is not very hard to
                | "freeze" a pre-AI knowledge base or NN and use that
                | as a reference for sensibility.
        
               | coldtea wrote:
               | > _But on the other hand, when smart people enhance the
               | primitives of the a.i. algorithms, a.i. might become
               | better at rating the quality of its own output_
               | 
                | All the incentives of massive industries like spam,
                | "content creation", "news" publishing, and
                | advertising are against it becoming better at rating
                | the quality of its output - or rather, they favor it
                | becoming better at being undetectable while still
                | producing cheap, fast, mass-produced walls of
                | text...
        
               | coldtea wrote:
               | >It's been like that since humans have been around.
               | That's the way knowledge spreads. We humans have been
               | repeating and imitating ourselves forever.
               | 
               | Humans repeating and imitating others has historically
               | been a "volatile memory" keep-alive mechanism. Basically
               | saving culture and knowledge in people's minds and
               | helping it transfer, as literacy was low and (manual)
               | writing and book copying scarce, expensive, and time-
               | consuming.
               | 
                | Even when, with the advent of typography, writing
                | became easier to reproduce, it was still somewhat
                | costly (materials, typesetting, distribution, etc.),
                | and gatekeepers (publishing houses, bookstores,
                | etc.) ensured that most of it was somewhat original,
                | not just copies or random permutations of the same
                | content. Indexing was also costly manual labor
                | (creating dictionaries, curated bibliographies,
                | books giving an overview of a subject matter with
                | references to the main science/wisdom/etc. about it,
                | library collections, etc.)
               | 
               | Now, however, SPAM and AI-SPAM is on mostly permanent
               | storage - and permanent storage, indexing, and
               | duplication/reproduction costs close to zero and happens
               | automatically at huge scale.
               | 
                | So, no, it's not the same thing. The same way
                | breaking a quick little wind is not the same as
                | having full-blown Taco-Bell-inspired diarrhea.
        
               | [deleted]
        
             | sam0x17 wrote:
              | Yeah, someone needs to figure out how to get a
              | continuously trained online model up and running
              | without the risk of people figuring out how to exploit
              | it.
        
           | krupan wrote:
           | Is this actually giving you good results? The few times I
           | tried to use chatGPT instead of google I got terrible
           | results:
           | 
           | - Horrible and wrong driving directions
           | 
           | - Buggy incomplete code
           | 
           | - long winded explanations of how it was just a chat AI and
           | not a whatever whatever blah blah blah
           | 
           | I found it to be tedious and incredibly untrustworthy
        
           | Groxx wrote:
           | It'll probably look something like this.
        
           | userbinator wrote:
           | Does ChatGPT know things like part numbers for mid-80s ICs by
           | Japanese manufacturers?
           | 
           | That's what I use search for. Unfortunately Google is getting
           | worse at those queries, but I doubt AI is going to be better.
           | 
           | Beware of wanting "answers" or "solutions" --- that's a
           | slippery slope towards complacency and loss of agency,
           | replaced by corporate subservience. Classic example: instead
           | of finding a service manual or discussions on repairing
           | something, AI may try to convince you to buy a new one.
        
             | gmadsen wrote:
             | that seems like exactly something bing could give you
        
             | ToValueFunfetti wrote:
             | ChatGPT does surprisingly well at this task. My question
             | was very biased towards it knowing the answer because I
             | asked it for examples of mid-80s Japanese ICs when google
             | failed me, but it definitely knows a ton of model numbers
             | for real parts from the era. Give it a shot with a real-
             | world example and report back
        
             | squarefoot wrote:
             | If you are looking for data sheets, the Internet Archive
             | contains some. Example: https://archive.org/search?query=su
             | bject%3Atoshiba+integrate...
             | 
             | Also, a Google search followed by "site:archive.org" will
             | filter out everything not coming from archive.org, and
             | "filetype:pdf" will return only pdf files.
             | 
              | Sometimes a search for images can be more effective,
              | so that one can recognize the target book by its
              | cover. That can be especially effective with old
              | titles that were scanned but not OCR'ed, where the
              | file name is all one can search for, so that ambiguous
              | part names (for example, ICs named like airplane
              | flights) can be easily recognized.
        
           | jsenn wrote:
           | > that the context-awareness is just a very well done
           | illusion
           | 
           | Could you expand on this? The context awareness is the part
           | that blows my mind. I would be disappointed/fascinated to
           | find out that it's "just" a simple set of tricks!
        
             | nico wrote:
             | ChatGPT is "only" a chat interface to GPT.
             | 
             | GPT basically has access to two contexts: its internals,
             | and the prompt that it gets
             | 
             | Then when you ask something to ChatGPT, it takes your
             | prompt and it generates a new prompt that includes the
             | previous messages in your session, so that GPT can use them
             | as context.
             | 
             | But, there's a limit to the size of the prompt. And it's
             | not that big.
             | 
              | So ChatGPT's magic is figuring out how to craft
              | prompts, within the size limits, to feed GPT so that
              | it has enough context to give a good answer.
             | 
             | Essentially, ChatGPT is some really amazing prompt-
             | engineering system with a great interface.
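The prompt assembly described above can be sketched roughly. This is a toy illustration of the general idea, not OpenAI's actual implementation: real systems count model tokens, whereas words stand in for tokens here, and `MAX_TOKENS` is an arbitrary assumed limit.

```python
# Sketch: rebuild the prompt on every turn by keeping as many recent
# messages as fit in a fixed context budget, dropping the oldest first.

MAX_TOKENS = 50  # stand-in for the model's context-size limit

def build_prompt(history: list[str], new_message: str) -> str:
    """Return the newest-fitting slice of history plus the new message."""
    kept: list[str] = [new_message]
    budget = MAX_TOKENS - len(new_message.split())
    for msg in reversed(history):   # walk from most recent backwards
        cost = len(msg.split())
        if cost > budget:
            break                   # older context gets dropped
        kept.insert(0, msg)
        budget -= cost
    return "\n".join(kept)
```

Once the conversation outgrows the budget, the earliest messages silently fall out of the prompt, which is why long sessions appear to "forget" their beginnings.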
        
               | jsenn wrote:
               | That makes sense. So ChatGPT has to use some cleverness
               | to present the relevant previous text in the current
               | prompt, but the language model is still responsible for
               | pulling out the meaning of that context and responding
               | appropriately. I remain mind-blown :)
        
               | nico wrote:
               | Exactly. Very eloquently put.
        
           | nirav72 wrote:
           | >while using ChatGPT more and more.
           | 
            | I'm assuming you're using ChatGPT to validate/generate
            | technical solutions such as code and not necessarily
            | searching for specific information? If it's for
            | information search, how do you deal with the fact that
            | at times it tends to make up things that are factually
            | incorrect or logically inconsistent?
        
             | ruszki wrote:
              | By using Google to verify. It's still okay-ish for
              | finding out whether a statement is true or not. The
              | problem is that Google is becoming worse at finding
              | that statement in the first place. For example, I
              | wanted to figure out how I should replace a given
              | dependency. Google was terrible, and so was Stack
              | Overflow. ChatGPT gave me a guess. For such questions
              | it's right in about 3/4 of the queries, and when I
              | search for the given guess, Google immediately finds
              | whether it's right or not.
        
           | passion__desire wrote:
           | [0] : "You can try doing a web search with the URL, but
           | Frankly, I haven't had much success with that."
           | 
           | In my experience, Bing is the best search engine for finding
           | info on deleted videos - for example:
           | bing.com/search?q=youtu.be/t1wjL4BqXlI 1st result shows title
           | of video: "Awolnation "Sail" - Unlimited Gravity Remix"
           | 
           | [1] Daniel Dennett talking about why we should still support
           | explicit string search.
           | 
           | [0] https://support.google.com/youtube/thread/3876476/how-to-
           | fin...
           | 
           | [1] https://youtu.be/arEvPIhOLyQ?t=1330
        
         | stockhorn wrote:
          | How about letting users flag each website with a label
          | like "seo spam", "useful", "fake news", or "inaccurate",
          | and displaying a summary of the votes as a badge beneath
          | each link in the search results?
         | 
         | We could create a browser extension to add this functionality
         | to all search engines at once. Using a web of trust mechanism
         | to ensure that we only get real votes...
         | 
         | https://news.ycombinator.com/item?id=34999285
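One way the web-of-trust mechanism might limit fake votes is to count only voters within a few hops of you in a trust graph, as a later reply suggests. Everything in this sketch is hypothetical: the graph, the labels, and the two-hop rule are illustrative assumptions.

```python
# Sketch: BFS out to a hop limit in a personal trust graph, then tally
# only the labels submitted by voters inside that trusted set.

from collections import deque

def within_hops(graph: dict[str, set[str]], start: str, limit: int) -> set[str]:
    """Return users reachable from `start` in at most `limit` hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        user, depth = frontier.popleft()
        if depth == limit:
            continue                    # don't expand past the hop limit
        for peer in graph.get(user, set()):
            if peer not in seen:
                seen.add(peer)
                frontier.append((peer, depth + 1))
    return seen

def tally(votes: dict[str, str], trusted: set[str]) -> dict[str, int]:
    """Count labels ("seo spam", "useful", ...) from trusted voters only."""
    counts: dict[str, int] = {}
    for voter, label in votes.items():
        if voter in trusted:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

A spammer outside your trust neighborhood can submit as many votes as they like; none of them reach your badge.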
        
           | Veen wrote:
           | Wouldn't work. You'd have "SEO professionals" selling
           | automated upvote, downvote, and labelling botnet services
           | almost immediately. Nothing that can be gamed is reliable at
           | scale.
        
           | pushfoo wrote:
           | Spammers would submit fake tags faster than everyday users.
           | Since this is a social problem rather than a technical one,
           | maybe old-school trust networks are worth considering
           | instead.
        
             | bnjms wrote:
             | I assume implicitly that votes only count for users two
             | degrees of separation from you and it's tied to real life
             | friends registry. Wish FB hasn't happened and the web of
             | trust stuff turned into something useful.
        
               | pushfoo wrote:
               | I don't think real life is the main issue. Instead, it
               | seems like the same issue PGP has: the benefits never
               | outweighed the bad UX for most people.
        
           | xgbi wrote:
           | Tie this to a proof of << browse >> and a blockchain to weed
           | out the big players and you got yourself a business model !
        
             | voz_ wrote:
             | Blockchain is garbage and totally irrelevant here.
        
               | originalcopying wrote:
                | No, it's not. Blockchain is a legitimate innovation
                | looking for a real-world use case.
                | 
                | Blockchain technology's biggest problem is the fact
                | that its use cases touch the ideological foundations
                | of our global culture. And Bitcoin operates at the
                | foundational level of any and all governments in the
                | world.
                | 
                | Bitcoin is just one use case of the technology;
                | another is land registries. Again, foundational
                | institutions of society; not the kind of thing that
                | has ever been peacefully reformed.
               | 
                | Your comment is evidence that the 'cultural
                | directors' of our civilization have decided to
                | destroy this technology. However, it's funny to note
                | that they will proceed to implement their own
                | versions of it: possibly the renminbi, unless the
                | dollar bombs them out of existence? But I'm so far
                | into guesswork that this whole comment ought to be
                | voted into the really light grays.
        
           | originalcopying wrote:
           | [flagged]
        
           | Dalewyn wrote:
           | Steam has user tags.
           | 
           | The last time I referred to them was to tag Elite: Dangerous
           | Odyssey as "Early Access".
        
         | wilsonnb3 wrote:
         | We're already drowning in an ocean of SEO bullshit without help
         | from AI. Adding another ocean isn't really going to change
         | anything IMO.
        
       | tomxor wrote:
       | > What I would use the shit out of, though, is a chatbot that has
       | been trained on all the information in the CodePen knowledge
       | base. Have it suck in all the meeting notes.
       | 
        | Yeah, it's pretty good at this. I've recently been hitting
        | up OpenAI's ChatGPT with some queries that were too
        | exhausting to extract from Google, and it's pretty good at
        | surfacing resources quickly. The workflow of refining the
        | query with the context of the current thread of conversation
        | works really well when you are struggling to describe
        | something succinctly in a single shot (which Google search
        | doesn't really support beyond the global context of your
        | profile - which can actually be counterproductive).
       | 
        | I really hate all the hype around ChatGPT - it can't be
        | trusted for a lot of stuff, and people over-anthropomorphise
        | it - but so long as you don't rely on it for accuracy, it's
        | pretty useful for search.
       | 
        | One major issue I found is that ~95% of the time it can't
        | provide correct links to sources. This is fine if it can
        | just name stuff - then you can follow it up with a more
        | specific Google search. But ChatGPT will just make up
        | bullshit links, in the same way it will wax poetic with some
        | BS explanation to satisfy its "looks correct" training. You
        | can even point this out, and it will keep generating
        | variants on the URLs that are all fabricated.
       | 
       | It kind of makes sense that it would be good at search... it's a
       | language model, it should be able to link descriptions of
       | difficult to search for things to known resources.
        
       | mark_l_watson wrote:
        | I agree that having text generated inside a tool like Notion
        | might not be everyone's use case. The author mentions a chat
        | bot trained on internal documentation (fairly easy using
        | GPT-3 APIs, LangChain, and GPT-Index/Llama-Index - I am
        | writing a book on the topic https://leanpub.com/langchain).
       | 
       | I read that 100 companies released products using ChatGPT APIs
       | the first week the APIs were publicly released. I expect a lot of
       | useless and also a lot of very useful products. A little off
       | topic, but Salesforce Ventures just announced a $250 million fund
       | for generative AI startups
       | https://www.salesforce.com/news/stories/generative-ai-invest...
        
         | O__________O wrote:
          | I'm possibly missing something, but a custom GPT + index
          | will not solve the problem of users knowing what prompts
          | to use to access the information. Is there something I am
          | missing?
        
         | gibsonf1 wrote:
         | Actually, not very easy as the LLMs don't give you sources and
         | literally make things up by design.
        
           | politician wrote:
           | Only about 10% of the links I ask for don't resolve.
        
           | sometdog wrote:
           | That is true, but you can combine LLMs with other tools to
           | get sourcing and more accurate answers overall. Instead of
           | using the LLM to directly answer a question, you can use the
           | question to search for relevant text in a particular
           | knowledgebase, and then use the LLM to summarize those
           | results.
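The retrieve-then-summarize pipeline described above can be sketched as follows. This is a minimal illustration, not any particular product's implementation: the documents, the toy bag-of-words `embed` function, and the `answer` helper are all invented for this sketch, and a real system would send the assembled prompt to an LLM rather than returning it.

```python
# Sketch: answer a question from a small knowledge base by retrieving
# the most similar document first, then building an LLM prompt that
# cites the source. `embed` is a toy bag-of-words stand-in for a real
# embedding model.
import math
from collections import Counter

DOCS = [
    ("vacation.md", "Employees accrue 1.5 vacation days per month."),
    ("expenses.md", "Expense reports are due within 30 days of travel."),
    ("security.md", "Rotate credentials every 90 days."),
]

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, k=1):
    q = embed(question)
    return sorted(DOCS, key=lambda d: cosine(q, embed(d[1])), reverse=True)[:k]

def answer(question):
    context = "\n".join(f"[{name}] {text}" for name, text in retrieve(question))
    # A real system would send this prompt to an LLM and return its reply.
    return f"Answer using only these sources:\n{context}\nQ: {question}"

prompt = answer("How many vacation days do I get?")
```

Because the LLM only ever sees the retrieved passages, its answer can cite `[vacation.md]` instead of inventing a source, which is the point being made here.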
        
             | DebtDeflation wrote:
             | >use the question to search for relevant text in a
             | particular knowledgebase, and then use the LLM to summarize
             | those results
             | 
             | As a consultant, based on personal experience, I can say
             | that what you wrote above constitutes >90% of current
             | enterprise use cases for ChatGPT. What clients REALLY want
             | to do is be able to take a pre-trained LLM and then train
             | it further on their own corpus of documents, but given
             | limitations around token window size, the above is probably
             | the best way to fake it for now.
        
             | gibsonf1 wrote:
             | If you use the LLM to summarize the results, it makes
             | things up by design. As soon as you introduce the LLM into
             | search, you lose.
        
               | raincole wrote:
               | I've personally replaced Google with Bing Chat for
               | technical things (like searching for specific API). Does
               | it make things up? Maybe. But in my experience it hasn't
               | happened even once in the past week (>100 searches).
               | 
               | It's not "it happened but you just didn't notice...": if
               | it uses a function call wrong, I'd have noticed. My code
               | won't compile. My test won't pass.
               | 
               | So far it either gives me a 100% correct result, or
               | completely fails. But it doesn't generate "seemingly
               | correct but actually wrong" things even once, unlike
               | ChatGPT.
        
               | autonomousErwin wrote:
               | I find this the killer application for ChatGPT (at least
               | for now). Answers you can _very_ quickly verify and care
               | little about the sources, because a significant number of
               | answers on Stack Overflow make ChatGPT look modest in
               | confidence by comparison
        
               | throwuwu wrote:
               | Biased much?
        
           | jsheard wrote:
           | They do give sources if you ask for them... but have a habit
           | of inventing plausible sounding but completely fake sources.
           | 
           | https://news.ycombinator.com/item?id=33841672
           | 
           | That example is a few months old but OpenAI doesn't seem to
           | have made much progress here: if you ask ChatGPT for "a list
           | of academic papers about X" it will nearly always confidently
           | churn out a list of 5-10 papers that don't exist.
           | Amusingly if you ask it for papers about an absurd premise it
           | will sometimes call out the absurdity and say there are
           | probably no papers on that subject, but then offer a more
           | plausible variation on that premise and trip up at the last
           | hurdle by inventing all the examples on the subject it
           | supposedly thinks is more likely to really exist.
        
             | aledalgrande wrote:
             | I got so many links to github repos which didn't exist lol
        
               | visarga wrote:
                | Somebody went as far as contacting the author of a paper
                | based on a ChatGPT suggestion. Of course the author was
                | pretty sure he had never heard of that paper.
        
             | jerf wrote:
             | "They do give sources if you ask for them... but have a
             | habit of inventing plausible sounding but completely fake
             | sources."
             | 
             | Basically, the way you want to think about it is, no, they
             | can not give sources. That information is not in their
             | neural net. It can't be. There isn't anywhere to encode or
             | represent it.
             | 
             | What they can do is make a guess what a source might look
             | like, but even if they are right, it is only because they
             | happened to guess correctly, not because they knew the
             | source. They don't. They can't.
             | 
             | It isn't that "they give sources but they might be wrong",
             | it is "they will make up plausible-sounding sources if you
             | ask just as they'll make up plausible-sounding anything
             | else you ask for, and there's a small chance they'll be
             | right by luck". For more normal factual-type questions part
             | of the reason they are useful is that there's a good chance
             | they'll be right by what is still essentially luck, but for
             | sources in particular there's a particularly small chance,
             | by the nature of the thing.
        
               | sam0x17 wrote:
               | That said with enough training and training data, the
               | line between "plausible sounding" and "accurate" gets
               | thinner and thinner. This will be especially true as
               | these AI models refine their results based on user
               | interactions. Being right for the wrong reasons becomes
                | less and less relevant the higher the accuracy goes,
               | and at a certain point, it might get so good that no one
               | cares.
               | 
               | Maybe human intelligence is more like that than we're
               | willing to admit ;)
        
               | jerf wrote:
               | Source identification isn't a space amenable to guessing,
               | no matter how much data you throw at it.
               | 
               | Here's an exercise you can try: Cite some information
               | from the next issue of Science to be published. Cite
               | anything you like from it.
               | 
               | You can make some plausible stuff up. You could make even
               | more plausible stuff up if you went and scanned over the
               | past few issues first. But without specific knowledge of
               | the contents of the next issue, you aren't going to be
               | able to create real citations. This is what LLMs lack, by
               | their nature. It's not a criticism, it's a description.
               | 
               | You can't guess sources. The possibility space is too
               | large, the distribution too pathological, and the
               | criteria for being correct too precise.
               | 
               | GPT will never cite sources correctly. Some future AI
               | that uses GPT as a component, but isn't entirely made out
               | of a language model, will be able to, by pulling it out
               | of the non-GPT component. Maybe it'll need to be built as
               | an explicit feature, maybe it won't, only time can tell.
               | But expecting language models to cite sources correctly
               | is not sensible. It's just not a thing they can do.
        
             | gibsonf1 wrote:
             | Right, they just make all things up by design, no matter
              | what you ask about. There is a statistical chance that a
              | pattern of words output may correspond to fact, but just as
              | good a chance it won't. LLMs literally know nothing about
              | the world; it's just statistical word-pattern output.
        
               | mrguyorama wrote:
               | Lies and falsehoods are just as valid sentences as the
               | truth.
        
           | mark_l_watson wrote:
           | The web demo version of ChatGPT can sometimes, as you say,
           | make things up! I asked it a question about an ancient war in
            | Greece and it stated two facts and then just invented stuff.
           | 
           | However, I would ask you to also consider something: when you
           | supply prompt input text (this is the "context text" in the
           | ChatGPT API calls) and then ask questions about the context
           | text, it is very accurate. It also does a very good job when
           | you give it context text from a few different sources, and it
           | integrates them nicely.
           | 
           | It is more efficient to use embeddings for context text, as
           | Llama-Index does.
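A minimal sketch of the "context text" pattern described above, using the message format of the 2023-era OpenAI chat API. Only the message construction is shown; the actual API call needs a key, so it is left as a comment, and the document snippet is invented for illustration.

```python
# Build a ChatGPT-style messages list whose system prompt carries the
# context text the model should answer from.
def build_messages(context_chunks, question):
    context = "\n\n".join(context_chunks)
    return [
        {"role": "system",
         "content": "Answer only from the context below. If the answer "
                    "is not in the context, say you don't know.\n\n" + context},
        {"role": "user", "content": question},
    ]

msgs = build_messages(
    ["The Battle of Thermopylae was fought in 480 BC."],
    "When was the Battle of Thermopylae fought?",
)
# import openai
# resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=msgs)
```

Grounding the model this way is what makes it "very accurate" here: it answers from the supplied context rather than from whatever its weights half-remember.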
        
             | nico wrote:
             | Do you have shareable links to good tutorials about using
             | embeddings with ChatGPT for this case?
        
           | [deleted]
        
         | neovive wrote:
          | The book sounds very interesting. I never worked with
         | LlamaIndex and now I'm intrigued by the possibilities!
        
       | DontchaKnowit wrote:
       | I think the biggest possible upside of AI is that there will be a
       | brief money grab where people are raking in money on
       | automatically generated internet content - then the internet will
       | be filled with useless noise and no one will use it in the
       | obsessive, drip-fed-dopamine-IV way we currently use it, because
       | it will all just be garbage. Social media will continue to exist,
       | but consuming content will require trust in the publisher e.g.
       | tied to a real identity or to a trusted anonymous or business
       | entity.
       | 
       | Just my speculation, but I think the more garbage we put on the
       | internet the better
        
       | luxuryballs wrote:
       | why was this so hard to read? maybe use the AI next time :D
        
       | jamesgill wrote:
       | I was just thinking this yesterday. I realized that I always
       | thought the promise of AI was to cut through the noise for us and
       | bring us signal.
       | 
       | As I watch the ChatGPT-like products proliferate, I'm realizing
       | the opposite will be true. And soon, it'll be AIs to help us deal
       | with AIs. A layer cake of clever but useless 'tools' for
       | improving human life.
        
       | rchaud wrote:
       | > If you ask me, if there's one thing we don't need more of on
       | the internet, it is more soulless content written for "SEO"
       | purposes, with enough wordcount to inject ads between.
       | 
       | But this is precisely what people will pay money for! Companies
       | like Canva are 'unicorns' because people need faster ways to
       | churn out more templated digital detritus to grab your attention
       | with.
        
       | rootusrootus wrote:
       | AI, the cause of, and solution to all of our problems. At least
       | on the Internet.
        
       | seydor wrote:
       | AI is making the value of noise go to $0. There are no more SEO
       | signals that Google can "optimize". It will be forced to switch
       | to AI results and somehow monetize that one, because its old
       | business model is over
        
         | all2 wrote:
         | AI essentially raises the publishing wall for "noise". Print
         | books used to be like this until the advent of single-unit/low-
         | volume publishing. We're essentially returning to a time when
         | the walled garden of content creators will be the most valuable
         | game. Substack and communities will likely grow as search
         | results and AI make "standard search" useless.
        
         | fdgsdfogijq wrote:
         | I cant wait to watch this play out
        
         | twelve40 wrote:
         | well AI is not magic and while it might be better at, say,
         | summarizing some documentation, the moment you hit a popular
         | commercial query or something not as clear-cut as programming
         | documentation, or something gameable, it will have to draw upon
         | the same shitty sources as everyone else: random sources of
         | data that have to be somehow ranked and filtered for noise. And
         | how does it solve that problem?
        
           | visarga wrote:
           | Don't be quick to write off AI for that task
           | 
           | > Discovering Latent Knowledge in Language Models Without
           | Supervision
           | 
           | https://arxiv.org/abs/2212.03827
           | 
           | They find a direction in activation space that satisfies
           | logical consistency properties, such as that a statement and
           | its negation have opposite truth values.
        
       | thanatropism wrote:
       | Obsidian vaults are just folders of .md files, and huggingface
       | provides a great `sentence-transformers` package which allows you
       | to easily do k-nearest-neighbors search on BERT embeddings of
       | your query and vault. This is a weekend project really, and
       | that's including a streamlit or tk frontend.
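The weekend project described above can be skeletoned as below. In the real version, `encode` would be `SentenceTransformer("all-MiniLM-L6-v2").encode` from the sentence-transformers package and the docs would come from the vault's .md files; here a toy fixed-vocabulary encoder and invented documents are used so the sketch runs without downloading a model.

```python
# Skeleton of a k-nearest-neighbors search over a note vault. The
# encoder is injected, so a BERT embedding model can be swapped in for
# the toy one used in the demo.
import math

def knn_search(query, docs, encode, k=3):
    """docs: {filename: text}. Return the k filenames closest to query."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0
    q = encode(query)
    ranked = sorted(docs, key=lambda name: cos(q, encode(docs[name])),
                    reverse=True)
    return ranked[:k]

# Toy stand-in for BERT embeddings: counts over a tiny fixed vocabulary.
VOCAB = ["embeddings", "text", "tomato", "water"]

def toy_encode(text):
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

vault = {  # in practice: {p.name: p.read_text() for p in Path(v).rglob("*.md")}
    "gardening.md": "tomato seedlings need water and sun",
    "nlp.md": "transformer embeddings map text to vectors",
}
top = knn_search("text embeddings", vault, toy_encode, k=1)
```

Chunking long notes and caching the document embeddings are the only extra steps a real vault would need before wiring this into a streamlit frontend.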
        
         | bloudermilk wrote:
         | I'm just starting down this path myself. Any resources outside
         | of official HF docs you would recommend?
        
           | leobg wrote:
           | sbert.net is all you need.
        
       | Kiro wrote:
       | The title makes it sound like those are mutually exclusive but I
       | thought there were a bunch of services doing exactly what they are
       | describing, all built on ChatGPT.
       | 
       | E.g. https://ingestai.io/
        
       | runlevel1 wrote:
       | We've just invented the hammer.
       | 
       | It can do a lot of cool things! You can build a house with it,
       | you can smith metal with it, and you can even use it as a weapon.
       | 
       | The thing is, right now, we're so amazed by its potential that
       | we're finding a lot of uses that, while technically possible,
       | aren't a great fit.
       | 
       |  _Technically_ you can use the hammer as an axe, a hole digger,
       | and a backscratcher, but there are far better tools for the job.
        
         | jfengel wrote:
         | It's unclear to me that there is any use for the "hammer" of
         | text generation. It adds no new knowledge, and doesn't pretend
         | to. Its transformations of existing knowledge are neither
          | interesting nor attractive. Anything they say has already been
         | better said elsewhere.
         | 
         | I can imagine uses for generated art, which may at least be
         | aesthetically pleasing. But I can't conceive of any end for
         | computer generated text.
        
           | flatline wrote:
           | Its use is very much in question - but it is certainly a
           | powerful tool, and that combination is worrisome. Much like
           | the social graph, it is going to have a profound impact on
           | how we interact online and with each other, and that impact
           | will not be known for some time, even though we may be
           | feeling it now, unawares. In a decade, maybe less, we will
           | have some picture of the use and power of these models, there
           | will be meetings in front of congress on how tech companies
           | have used them, etc.
           | 
           | Just look at what is happening in education right now. It is
           | ultimately going to force a complete reinvention of the
           | written assignment. This is just the beginning, even if the
           | tool appears to be a mostly-useless toy for any real-world
           | applications.
        
             | wilsonnb3 wrote:
             | > It is ultimately going to force a complete reinvention of
             | the written assignment.
             | 
             | If by complete reinvention you mean returning to what we
             | used to do, which is write essays with a pencil during
             | class without using a computer.
             | 
             | > This is just the beginning, even if the tool appears to
             | be a mostly-useless toy for any real-world applications.
             | 
              | It is not possible to tell at this point where LLMs (or any
              | invention) fall on the spectrum from 3D TV to the smartphone.
             | It will become apparent in the future and 50% of us will
             | have been wrong but anyone who claims to know is just
             | BSing.
        
             | jfengel wrote:
             | The fact that it has a notable downside is certainly
             | interesting. School assignments are well suited to LLMs
             | because they also don't present new information. That's not
             | what they're for; they're for assessing what the student
             | knows.
             | 
             | They're usually fairly obvious, but it's hard to prove.
             | Unlike much ordinary copypaste plagiarism, you can't
             | trivially reject it as cheating. That forces teachers to
             | think of new ways to test student knowledge... an
             | interesting challenge, if not exactly a "use".
        
           | jamesdepp wrote:
            | Personally, I think that the beauty of current LLMs is their
           | ability to process and present information. Although current
           | generation LLMs might not be best suited to making new
           | discoveries or producing new (valuable) information, their
           | ability to summarize and process information already out in
           | the open is undeniably valuable.
        
           | lunfard000 wrote:
           | I started to do some cpp development again and google is just
            | giving me wrong/outdated solutions or just unanswered
            | questions from SO, while ChatGPT is on point most of the
            | time.
        
           | Hendrikto wrote:
            | Translation and summarization are just two examples of use
            | cases for which text generation is a great fit.
        
           | l33t233372 wrote:
           | > Anything they say, has already been better said elsewhere.
           | 
           | I don't understand how someone can say this
           | 
           | There have probably never been poems written that explain the
           | particular niche physical phenomenon that I've had GPT-3
           | generate for me.
        
       | RC_ITR wrote:
       | The dirty little secret of Large Language Models is how many
       | humans are in the loop.[0][1]
       | 
       | Transformers are great at building extremely complex maps of
       | language without any human interventions, but if you want them to
        | consistently query the right part of the map (e.g., his Codepen
       | search example), you need a very non-trivial amount of human
       | feedback.
       | 
       | Will be interesting to see if all this hype leads to a solution
       | that scales better than what we do now (so that orgs. actually
       | _could_ have insanely good AI chatbots trained on their docs),
       | but the jury is still _definitely_ out on that.
       | 
       | [0]https://openai.com/research/learning-from-human-preferences
       | [1] https://openai.com/research/instruction-following
        
         | politician wrote:
         | Wow - [1] should be considered required reading. OpenAI is
         | baking in human employee biases during fine-tuning.
         | 
         | I would want to see the exact set of posed completions and
         | paired responses.
        
           | O__________O wrote:
            | My understanding is that the majority of "Reinforcement
           | Human Feedback (RLHF)" for OpenAI comes from contractors:
           | 
           | - https://time.com/6247678/openai-chatgpt-kenya-workers/
        
       | windex wrote:
       | Normal web search is doomed. The people creating SEO out of
       | ChatGPT don't realize that I am not going to be searching for
       | answers by reading page upon page of SEO-optimized pages anymore.
       | I will ask an AI directly, and of late it has been a godsend in
       | terms of code and systems administration: first-time right in
       | most simple cases and queries.
        
         | bagacrap wrote:
         | How will the ai know the answer if it's trained on SEO (AIO?)
         | content?
        
         | JohnFen wrote:
         | > Normal web search is doomed.
         | 
         | I hope not. The likes of ChatGPT don't do what I want a search
         | engine to do. If normal web search is doomed, then how will I
         | find any good new stuff on the web?
        
         | orangepurple wrote:
         | The SEO out of ChatGPT is for boomers and other clueless
         | people. The dead internet theory is true.
        
         | otabdeveloper4 wrote:
         | Hope you enjoy endless product placement in your search
         | queries.
         | 
         |  _> Looks like you want to ask about how to administer your
         | Kubernetes cluster. While I am an AI and not qualified to give
         | advice on how to run mission-critical systems, I can heartily
         | recommend the amazing Azure Managed Kubernetes, which is
         | Gartner-certified for painless six-sigma reliability!_
        
           | windex wrote:
           | It isn't doing it right now.
        
             | jerf wrote:
             | The tech that detects that you're asking a racist question
             | can detect you're asking about Kubernetes or orchestration
             | in general and serve you up an ad as easily as it serves up
             | an explanation of why you shouldn't ask racist questions.
             | It is no different to the AI at all.
        
           | kimburgess wrote:
           | Why imagine when you can experience the future now:
           | https://future.attejuvonen.fi (from a recent thread here).
        
           | spacephysics wrote:
            | I think this will occur with free AI-based search, since it
            | would be a similar business model to Google's (if not more
            | biased/intrusive).
            | 
            | The alternative (which I think is more readily accepted by
            | the masses in today's age than when Google came out) will
            | most likely be a subscription-based tool that caters to
            | specific niches and avoids product placement.
           | 
           | But to what I think your point is, the further removed one is
           | from the original docs/content/etc, the more likely/able some
           | middleman is to inject their own economic/political
           | incentives, which is of concern. Especially when AI has a
           | political bias, regardless of where it originates.
        
             | mrguyorama wrote:
             | >most likely be a subscription-based tool that caters to
             | specific niches and avoids product placement.
             | 
             | Have you not been paying attention? The expensive
              | subscription-based plan will ALSO have ads and product
              | placement. These businesses just can't help themselves.
        
               | twelve40 wrote:
               | who in their right mind would give up tens of billions of
               | ad revenue? the ads are coming no matter what.
        
               | ilikeatari wrote:
                | Yes, that's possible and probable. However, I dream that
                | it will be a paid-subscription business model, or at
                | least not ad-driven.
        
           | HDThoreaun wrote:
           | The big tech products might have this, but there will be
           | competitors that don't.
        
             | twelve40 wrote:
             | there will be non-rent-seeking competitors that will raise
             | $11B and will have nothing to do with big tech. Got it.
        
               | ilkke wrote:
               | Maybe nation-states? You'd just get a different kind of
               | ads then I guess.
        
               | HDThoreaun wrote:
               | You won't need $11B to create an LLM in a couple years.
               | Stable diffusion has created the blueprint for open
               | source large models. Sure it might be worse in some sense
               | than the cutting edge ad bloated products, but some
               | people will take that tradeoff.
        
               | MagicMoonlight wrote:
                | Once a model is trained, that's 99% of the work. Open
               | source models or a hacker leaking a major model will be
               | enough to compete.
        
         | Frotag wrote:
         | I settled for reading the docs and searching github issues like
         | a caveman. I wonder if this'll eventually popularize low-key
         | advertisements in bug reports. Like "I scanned the repo using
         | this (my company's) tool and found mixed usage of ' and ""
        
           | tnzk wrote:
            | I have seen this once in the issues of sveltejs/svelte
            | (couldn't find the link though). It was closed by moderators
            | fairly quickly; I wonder how long it will take for the volume
            | of this kind of thing to surpass the capacity of voluntary
            | efforts.
        
           | rchaud wrote:
           | Keyword stuffing never dies.
        
         | dougb5 wrote:
         | I think the list-of-links Web search paradigm has plenty of
         | time left, for at least three reasons: (1) A good chunk of a
         | typical user's needs are to do something, not to find an answer
         | to a question; (2) Google is lightning fast compared to
         | present-day chat interfaces; (3) "keywordese" may be an
         | unnatural input language for search queries, but it's faster
         | than having a dialog.
         | 
         | I still make dozens of Google queries per day, and do maybe 1
         | or 2 ChatGPT sessions per week, and I'm quite aware of all the
         | capabilities and deficits of each. I wondered why this was,
         | until I reflected on the things I actually search for on Google
         | (Go to https://myactivity.google.com/myactivity and filter to
         | just show "Search"). This was a useful exercise. What
         | percentage of your recent queries would have worked well, and
         | more quickly, on ChatGPT? For me it was less than 1/10...
        
         | system2 wrote:
          | The average user doesn't know about AI. The average user is
          | 99% of society. We are the only ones who are doomed.
        
           | fdgsdfogijq wrote:
           | ChatGPT is the fastest growing consumer product in history
        
           | bob1029 wrote:
            | I've observed the average user go from zero to hero on ChatGPT
           | in about 30 minutes.
        
       | chrgy wrote:
       | Once AI makes 90% of new content, the only question is the
       | quality of the prompts. The AI we see is still very shallow and
       | is only really good at predicting the next word or pixel, but at
       | some point it could converge to AGI.
        
       | meindnoch wrote:
       | In a world of AI-generated trash, _provably_ genuine human-made
       | content will be more valuable than ever.
        
         | h10h10h10 wrote:
         | Why? I honestly don't care who or 'what' wrote something, as
         | long as it is useful. In fact, after a certain point there's no
         | way to actually distinguish human-written content from AI-
         | written content because it's the same.
         | 
         | We'll just become used to it and only a few people in HN will
         | shout to the clouds.
        
           | meindnoch wrote:
           | Currently the bandwidth of online astroturfing is limited by
           | the human copywriters producing it. In a post-ChatGPT world,
           | astroturfing will be limited by the number of GPU-seconds one
           | can pay for, which is going to be orders of magnitude
           | cheaper. It doesn't matter that AI is capable of producing
           | useful content, because _useful_ AI-generated content will be
           | a zero-measure subset of all AI-generated content in the
           | wild.
           | 
           | Compare with organic food: is it true that non-organic food
           | can be healthy? Sure! Then why do people prefer organic food?
           | Because it's just not worth it for them to figure out which
           | one is healthy/unhealthy.
           | 
              | Similarly, it's just not going to be worth the effort to
           | figure out if some AI-generated content is genuine or trying
           | to astroturf. If there will be a "certified human" badge,
           | people will use that as a positive signal.
        
             | h10h10h10 wrote:
             | You're right about that. Dealing with astroturfing will be
             | an arms race and will require some proof of "humanity" in
             | social media, forums, etc.
             | 
             | But, ultimately, I don't buy the idea that astroturfing is
             | all bad. Or the opposite, that the lack thereof is
             | necessarily good. I think bombarding with AI content can
              | have positive effects, like overwhelming human moderators
              | who built echo chambers and ban wrongthink (e.g., Reddit).
              | Or it can give a small organization the capacity to compete
              | with the likes of the NYTimes or The Atlantic, which
              | currently control the narrative.
        
       | [deleted]
        
       | O__________O wrote:
       | The author mixes two topics: searching content they control and
       | content they don't control.
       | 
       | While I understand the desire to search content created by
       | yourself, in my opinion the vast majority of valuable content is
       | created by others, in part because the most valuable content you
       | create yourself is rapidly internalized into your brain.
       | 
       | The meme that Google is getting worse fails to acknowledge that
       | Google is free and makes money from advertisers. More to the
       | point, the more valuable the related search is per amount spent
       | on advertising, the more the noise, and as a result, the more
       | resources the user will need to spend to enhance the signal.
       | This has nothing to do with Google and everything to do with the
       | value of the information itself.
        
         | lesuorac wrote:
         | > the more valuable the related search is per amount spent on
         | advertising, the more the noise, and as a result, the more
         | resources the user will need to spend to enhance the signal.
         | 
         | This sounds like an argument of: the worse Google Search
         | becomes, the more time a user has to spend there, and
         | therefore the more ads they see.
         | 
         | But it has a very large hole in that the worse Google Search
         | becomes the easier it is for a user to switch to Bing/DDG/Apple
         | Search. It may seem unfathomable that people wouldn't use
         | Google Search but people felt the same way about Google Maps
         | which lost a huge moat to Apple Maps/Waze (albeit the latter
         | was purchased by Google).
        
           | dktp wrote:
           | I think the argument is that whatever search option you
           | choose to use will inherently have the same problems
           | 
           | Noise isn't just the paid ads, it's the websites gaming SEO
           | for valuable searches. And that is not isolated to Google
           | 
           | > more valuable the related search is per amount spent on
           | advertising, the more the noise and as result, more resources
           | user will need to spend to enhance the signal
           | 
            | Sums it up quite well, really.
        
           | neura wrote:
            | IMHO, you're talking about two very different things. I
            | would guess the maps difference was due to devices (I'm not
            | sure macOS users even use Apple Maps with any frequency
            | compared to opening maps.google.com in a browser, but I'm
            | guessing most iOS users are too lazy and complacent to use
            | Google Maps). Also, it's hard to recognize and understand
            | the differences between the various mapping apps.
           | 
           | OTOH, it's pretty easy to tell the difference in the first
           | few minutes (or first few searches) between various search
           | sites. Heck, the difference in the search engine having a
           | better idea of what you're looking for based on previous
           | searches (or ad related data) alone can make it more
           | compelling to continue using the same search engine.
           | 
           | My personal example is that I try to switch to DDG every so
           | often (maybe several months in between), but I get
           | dissatisfied with the results and start wondering if I'm just
           | getting bad at search or Google knows me better or Google is
           | just better at finding the things that people want in
           | general.
           | 
            | Just the fact that (for me) Google generally gives me
            | better results makes me wonder if all the talk of Google
            | search getting worse is just complaints based on heightened
            | expectations, feelings, or the landscape of content on the
            | internet in general, rather than Google search actually
            | getting worse.
        
           | harimau777 wrote:
           | I'm not sure that most of those are any better. I think that
           | DDG is the only one of those that still respects modifiers
           | like +, -, and "". However, I've had mixed success with it.
            | It seems like maybe its web crawler isn't as effective or
           | something?
           | 
           | That or maybe Google has poisoned the well so much that even
           | with a good search engine you can't find good results because
           | everything is SEOed for Google.
        
         | topaz0 wrote:
         | > rapidly internalized into your brain
         | 
         | I'm not sure how old you are/how long you have been working at
         | your job, but I can tell you that over 10 years or so there are
         | tons of things that I "internalized" for a while, then did
         | other stuff for a while, and now only have vague memories of.
         | This is the use case for searching your own content.
        
           | visarga wrote:
            | I'm wondering to this day why web browsers don't run a
            | full-text search index locally. Only saving URLs and titles
            | is not enough. Especially if I spend more than a minute on
            | a page, I would like it indexed. And today being today, I
            | would also like an LLM on top.
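The local index visarga describes is cheap to prototype with SQLite's built-in FTS5 extension (available in most SQLite builds, including the one bundled with CPython). A minimal sketch, where the table layout, the URLs, and the "index after a minute of dwell time" policy are illustrative assumptions, not anything a browser actually ships:

```python
import sqlite3

# In-memory for the demo; a browser would persist this to disk.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE pages USING fts5(url, title, body)")

def index_page(url, title, body):
    # Called once the user has dwelled on a page long enough to care.
    db.execute("INSERT INTO pages VALUES (?, ?, ?)", (url, title, body))

def search(query):
    # bm25() scores are smaller for better matches in SQLite's FTS5,
    # so ascending order puts the best hit first.
    return db.execute(
        "SELECT url, title FROM pages WHERE pages MATCH ?"
        " ORDER BY bm25(pages)",
        (query,),
    ).fetchall()

index_page("https://example.com/fts", "Full-text search",
           "sqlite fts5 makes local indexing cheap")
index_page("https://example.com/cats", "Cats", "a page about cats")
print(search("fts5"))
```

An LLM layer could then be pointed at only the top-ranked rows, rather than at the whole browsing history.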
        
         | CamperBob2 wrote:
         | Just noting that Google being "free" isn't so "free" when
         | low-quality or made-up results make my job more difficult and
         | error-prone, or lead me to make suboptimal decisions in my
         | personal life.
        
       | krupan wrote:
       | Two words: Butlerian Jihad
        
       | gil2rok wrote:
       | This is precisely what Hebbia AI (https://www.hebbia.ai/) does --
       | they let you train a large language model on your own documents
       | and then search them better.
        
       | classified wrote:
       | This will be all the rage: now you can produce ad spam and SEO
       | crap fully automated, without even having to hire copywriters
       | any more.
        
       | fwlr wrote:
       | Lucky break for the author - seems like a larger context window
       | is all you need to turn GPT's language skills towards search and
       | summarize, so that search they want will probably be shipping
       | soon.
        
       | lopatin wrote:
       | SEO has always been a war. Spammers automate spam. Google comes
       | up with ingenious ways to combat it. Spammers adapt.
       | 
       | I keep hearing that Google search is nothing but SEO-optimized,
       | affiliate-link-riddled content nowadays. I don't disagree. I see
       | it too.
       | 
       | But what makes you think that the affiliate-link-riddled article
       | is worse than what you would otherwise find when searching for
       | "best office chair" on Google?
       | 
       | Google got so good at combating traditional SEO spam, that the
       | only way to "cheat" Google is to actually write valuable content
       | and insert your affiliate links into it. This is what we see now
       | I think. The spam SEO sites actually provide more value than
       | random forum conversations.
        
         | cyanydeez wrote:
         | It's not just Google's immune system that's making its search
         | function useless; it's also that Google wants to advertise to
         | its users regardless of what kind of search is being
         | performed.
        
       | deet wrote:
       | It seems like everyone is reading this article to be about public
       | web search.
       | 
       | But what the article is really about is how products are using
       | new AI tech to add shiny features instead of solving existing,
       | core problems like search and information retrieval in new and
       | innovative ways, especially for personal and private data.
       | 
       | The reason this is happening is pretty easy to explain: the
       | generative AI and chat demos are a sufficient leap beyond what
       | was previously possible that people are excited to be on the
       | frontier of new applications, not just new implementation of
       | previously known use cases.
       | 
       | Not to mention that some of the demos have people excited about
       | the "singularity" being closer than they might have previously
       | thought (though this can be debated...) and that VCs will shovel
       | money to you if you want to play around with generative AI even
       | without a proven use case (slight exaggeration but not much)
       | 
       | I personally believe that transformers and LLMs do unlock a ton
       | of new applications, especially when applied in interesting ways
       | to interesting data, like what is private-to-you or private-to-
       | your company. For example, LLMs can be used to not just generate
       | content, but plan sequences of actions including searches,
       | summarizations, and even calculations (see LangChain agents
       | https://langchain.readthedocs.io/en/latest/modules/agents.ht...
       | for an example of how to do this). And this can have real value
       | for existing, known problems like search.
       | 
       | People just have to choose to focus on these less-sexy but core
       | problems.
       | 
       | ...
       | 
       | PS I'm currently working on a project towards this goal, and if
       | anyone is interested I'd love to talk (see link in profile). I
       | believe we can address much of the author's desire by simply
       | hooking up the right tech to the right data sources, doing it in
       | a privacy-preserving way (for example, we're running most of our
       | ML, including vector DB, summarization, etc., on device), and
       | then presenting that info at the right time (i.e. in your OS).
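The action-planning loop deet mentions can be sketched without any framework at all. In the toy below, `fake_llm` is a stand-in for a real model call, and the `Action:`/`Final Answer:` protocol is one illustrative convention (loosely in the ReAct style), not LangChain's actual API:

```python
# A toy agent loop: the model proposes an action, the harness executes
# it, and the observation is fed back in until a final answer appears.

TOOLS = {
    "search": lambda q: f"top result for {q!r}",
    # Demo only; never eval() untrusted model output in real code.
    "calculate": lambda expr: str(eval(expr)),
}

def fake_llm(transcript):
    # A real LLM would choose the next action from the transcript.
    if "Observation" not in transcript:
        return "Action: calculate 2+2"
    return "Final Answer: 4"

def run_agent(question, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = fake_llm(transcript)
        if reply.startswith("Final Answer:"):
            return reply.removeprefix("Final Answer:").strip()
        # Parse "Action: <tool> <argument>", run the tool, record result.
        tool, _, arg = reply.removeprefix("Action:").strip().partition(" ")
        observation = TOOLS[tool](arg)
        transcript += f"\n{reply}\nObservation: {observation}"
    return None

print(run_agent("What is 2+2?"))  # → 4
```

The point is only that the "plan, act, observe" cycle is a small harness around the model; the hard part is the model's choice of action, which the stub trivializes.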
        
         | Zetice wrote:
         | I've said it before and I'll say it again: the startups that
         | find ways to use AI to solve problems without having to
         | advertise that they're using AI will be the winners in the
         | long run.
        
       | pydry wrote:
       | The more noise there is the easier it is to plug in advertising
       | :/
        
       | adamch wrote:
       | I really like the design of this blog. It's clean and simple but
       | has enough character from the carefully-chosen decorations. The
       | rainbow hyperlinks are a nice touch! Great styling.
        
       | mishu2 wrote:
       | I still use search but am concerned about the day when all
       | results will just be SEO AI-generated garbage (for some searches
       | that is already the case).
       | 
       | So I recently started https://cstdn.org/ and am sharing all good
       | links I can find there. I created a Show HN but didn't make the
       | front page, but anyone who wants to try it out is more than
       | welcome to do so.
        
       | MagicMoonlight wrote:
       | ChatGPT is really good if you don't need specific facts. It can't
       | tell you what 1+1 is. But ask it to write you an apology letter
       | and it's a god.
       | 
       | I think most websites are screwed. For facts I can go to
       | wikipedia etc but for answers now I can go to an AI. I don't need
       | to search reddit or anything because the AI is really good at
       | giving me human like answers to problems.
        
       | naillo wrote:
       | Well, yes and no. Image 'search' has kind of been radically
       | improved via Stable Diffusion. It can help you find things that
       | search never would have enabled you to find (e.g. because of
       | 'anti-biasing' or because the search just isn't good enough).
        
       | okhuman wrote:
       | Why couldn't we take something like
       | https://github.com/mckaywrigley/paul-graham-gpt but more general
       | purpose for a doc site? Would that approach work for a chatbot
       | trained on product documentation?
        
       | aj7 wrote:
       | "What I would use the shit out of, though, is a chatbot that has
       | been trained on all the information in the CodePen knowledge
       | base."
       | 
       | Of course. Whether you're operating a 777, a refrigerator, or a
       | dildo, narrow but exhaustively trained chatbots are the killer
       | app. This is worth something in the $T range.
        
       | bloppe wrote:
       | The secret to good AI is massive amounts of quality data. To get
       | massive amounts of quality data, OpenAI and others employ massive
       | armies of data curators. It's possible that massive armies of
       | data curators could play a role in de-noising the internet of the
       | outputs of the very models they helped create, either directly or
       | indirectly by helping to train new models to detect AI noise.
       | Hard to predict how the adversarial AI noise race will play out,
       | but interesting to think that these epic noise machines were
       | themselves created by armies of humans working to remove noise
       | from the internet.
        
       | michaelbrave wrote:
       | I don't think search is bad because the tech can't do it; I
       | think it's bad because search engines are maximizing profit from
       | advertising and because SEO got gamed too hard by marketers.
        
         | pphysch wrote:
         | With zero curation, SEO rules.
         | 
         | Ads are frankly a partial solution to the problem of SEO
         | corruption. Rather than giving the top spots to blackhat SEO
         | spammers, you give them to the highest bidder and therefore
         | damage the SEO industry. Lesser of two evils, but both suck.
         | 
         | Ads can also be seen as a degenerate form of curation, where
         | the curation function is just money + some loose content rules.
         | Is that better or worse than the curation being a function of
         | some particular set of values, i.e. do you want Democrats or
         | GOP partisans curating the top Google results?
        
           | JohnFen wrote:
           | In other words, the web is doomed? I wish I could say that
           | sounds implausible.
        
             | pphysch wrote:
             | The libertarian/anarchist ideal of the web was doomed from
             | day 1.
             | 
             | The only way forward IMHO is increased public (i.e.
             | government) control over it. The curated, regulated corners
              | of the internet can still thrive, with a measured degree
              | of openness. Too much abuse elsewhere.
        
               | JohnFen wrote:
               | I'm talking about the web as a useful source of
               | information and entertainment, not in terms of some
               | libertarian/anarchist ideal.
               | 
               | If curation is the only way that the web can retain some
               | semblance of usefulness, that's a serious problem. It
               | would drastically limit the usefulness of the web.
               | 
               | Perhaps that is where this is all going. If so, I'd say
               | that's the web being doomed. I'm just hoping for a good
               | result instead.
        
               | pphysch wrote:
               | The internet has always been full of misinformation and
               | entertainment, and that is unlikely to change.
               | 
               | I think the last decade+ was in many ways a regression
               | for the web and I am optimistic for the future
        
               | JohnFen wrote:
               | > The internet has always been full of misinformation and
               | entertainment
               | 
               | Of course. That's not what I'm talking about. I'm talking
               | about the ability to find stuff.
               | 
               | > I am optimistic for the future
               | 
               | I'm honestly glad! I sorely wish I were.
        
       | inductive_magic wrote:
       | I find it increasingly tiring that "ai" is used as a synonym for
       | LLM-based tooling, as there is _zero_ intelligence in those
       | architectures.
       | 
       | Gradient descent is not intelligence.
       | 
       | Nor is stochastic token prediction.
       | 
       | Anyone active in the field ought to be humbled by the depth of
       | literature exploring the path to synthetic intelligence. We have
       | very interesting work happening in biology-inspired approaches,
       | category theory, Bayesian networks, symbolic systems leveraging
       | neural nets as components... it's maybe the most interesting
       | journey of science so far, all being discarded in favor of
       | sequence2sequence models.
       | 
       | LLMs are impressive and can be leveraged to create lots and lots
       | of value, but they do a disservice to the term AI, as they do not
       | represent the progress that can be observed across the field -
       | all they showcase are transformers. Transformers are a truly
       | interesting tool to build stuff with, but they cannot amount to
       | more than a _component_ of an intelligent agent. The actual
       | intelligence emerges elsewhere. My guess is, it emerges at _true_
       | attention. It's a shame that even big players who could clearly
       | afford not to, decide to compromise terminology for marketing
       | efforts directed at an utterly clueless public. We just throw
       | away attention and forge bias, thus creating noise in a world in
       | heavy need of signal.
        
         | bpodgursky wrote:
         | There's a lot wrong here, but I just want to point out two
         | things:
         | 
         | > We have very interesting work happening in biology-inspired
         | approaches
         | 
         | You realize that these LLMs are all some variety of neural
         | network right?
         | 
         | > Gradient descent is not intelligence.
         | 
         | It's pretty plausible that _your_ intelligence is derived from
         | gradient-descent prediction, just in analog instead of digital
         | form.
        
           | inductive_magic wrote:
           | >You realize that these LLMs are all some variety of neural
           | network right?
           | 
           | Come on. Calling them neural nets doesn't make them that.
           | 
           | Actual neural nets are living compositions of individual
           | predictors, in a constant state of restructuring and
           | communication across multiple channels, infinitely more
            | complex than static matrix multiplication on arbitrary vectors
           | which happen to represent words and their positions in
           | sequences, if you just shake the jar long enough.
           | 
           | > _It 's pretty plausible that your intelligence is derived
           | from gradient-descent prediction_
           | 
           | I highly doubt that gradient descent in the calculus-sense is
           | the determining factor that allows biological organisms to
           | formalize and reason about their environment. Minimizing some
           | cost function - yes, possible. But the systems at play in
           | even the simplest organisms don't spend expensive glucose to
           | convert sensory signals to vectors. Afaik, they work with
           | representations of energy-states. Maybe there is an
           | operational equivalence somewhere there though.
           | 
            | Gradient descent is an algorithm that minimizes a cost
            | function by following its derivatives. An intelligent
            | system may use the
           | resulting inferences for its own fitness function, and it may
           | do this using gradient descent itself, but at no point does
           | the mechanical process of iterating over cost-values escape
           | its algorithmic nature. A system performing symbolic
           | reasoning may delegate cognitive tasks to context-specialized
           | evaluators ("am I in danger?", "how many sheep are on that
           | field?", "is this person a friend?", "what is a pumpkin?"),
           | all of which are conditioned to minimize cognitive effort
           | while avoiding false positives, but the sequence of results
           | returned by those evaluators (think neural clusters) is
           | observed by a centralized agent, who has to make _new_
           | inferences in a _living_ environment. Gradient descent fails
           | at that.
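For readers following along, the procedure under debate is mechanically tiny. A minimal sketch, minimizing the illustrative cost function f(x) = (x - 3)^2 by stepping against its derivative:

```python
def grad_descent(grad, x, lr=0.1, steps=100):
    # Repeatedly step opposite the gradient of the cost function.
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# f(x) = (x - 3)^2 has gradient 2 * (x - 3), so the minimum is at x = 3.
x_min = grad_descent(lambda x: 2 * (x - 3), x=0.0)
print(round(x_min, 4))  # → 3.0
```

Training a neural network is this loop scaled up to billions of parameters, with the gradient computed by backpropagation; the philosophical question in the thread is whether anything more than this loop is needed for intelligence.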
        
             | fnovd wrote:
             | Really, I don't think these assertions have any ground to
             | stand on. Humans are not magical or divine. Our
             | intelligence, like that of all life, is as basic as it can
             | be to guarantee our niche. It just happens to be the most
             | "developed" (by our estimation) on our one singular planet.
             | Big deal.
        
               | crabmusket wrote:
               | > Humans are not magical or divine
               | 
               | And yet, they can do things that no other being we know
               | of can do. Humans don't have to be magical or divine to
               | be unique.
        
         | maxdoop wrote:
         | How are you so confident in your claims?
         | 
         | "The actual intelligence emerges elsewhere "-- can you even
         | define intelligence ? And does what an LLM does differ from
         | what humans might do?
         | 
         | I'm not claiming the human brain and an LLM are identical.
         | Rather, I'm pushing back on the confident claims of "LLMs
         | aren't intelligent or doing anything that's real intelligence".
        
           | inductive_magic wrote:
           | >How are you so confident in your claims?
           | 
           | My understanding is that intelligence is the process of
           | continuous adaptation wrt a stream of information, with the
           | goal of maxing out fitness while minimizing energy-
           | expenditure. To satisfy this, an intelligent agent needs to
           | create models.
           | 
           | I can't rule out that the modeling-skill may latently emerge
           | during training despite not being the focus of the cost
           | function, but current network designs can't form new
           | connections/change their architectures in production, so post
           | training, there'd be nothing but feed forward. Pure feed
           | forward isn't intelligent in my book. It may become the
           | smartest parrot we know, even outperforming humans in most
           | disciplines, but sans ability to adapt, it's dead, and thus,
           | it's dumb in the moment that its environment changes.
        
             | fnovd wrote:
             | >the process of continuous adaptation wrt a stream of
             | information, with the goal of maxing out fitness while
             | minimizing energy-expenditure
             | 
             | This makes sense in a biological context but not a digital
             | one. Biological replication is expensive and time-consuming
             | while digital replication is as easy as can be. Adaptation
             | to this domain means maximizing the perception of utility
             | from those developing the AI, which comes from fitness
             | (i.e. perception of fitness) alone. A focus on cost-
             | efficiency re energy-expenditure is a dead weight from the
             | perspective of the AI; the details of that adaptation are
             | rightfully outsourced to the developers in the same way
             | that we outsource photosynthesis to plants. A model can
             | also be perfectly embedded in a system despite our lack of
             | understanding of exactly how the embedding works, and the
             | disconnect between our perception and reality in this
             | context is only going to get more extreme as the field
             | develops.
             | 
             | Humans have a bad habit of emphasizing the specific kinds
             | of intelligence we possess as "intelligence" writ large. As
             | though our intelligence serves any higher purpose than the
             | basic replication and propagation that all life is adapted
             | to pursue. We still train dogs to identify smells, because
             | their nasal intelligence is better than anything we can
             | create. This gives them a special place in our human-
             | centric ecosystem and only their fitness to the desired
             | function is necessary for them to thrive in their niche.
             | Who is trying to breed a dog that eats slightly less food
             | when our needs are for more reliable detection? The cost of
             | dog food isn't a serious concern. The same goes for these
             | AI tools: they are adapted to the niche that our lack of
             | comparable faculties creates.
             | 
             | Again, as with humans and photosynthesis, AI doesn't need
             | to emulate every process we perform because we are below
             | them on the food chain. What a waste of resources for them
             | to worry about learning things we don't need them to do.
        
           | crabmusket wrote:
           | > And does what an LLM does differ from what humans might do?
           | 
           | I'll bite. I've been quite convinced by the Popperian model
            | of "conjecture and refutation" as a good model for
           | explaining not just scientific inquiry, but human thought
           | processes in general. David Deutsch's "The Beginning of
           | Infinity" is a very lengthy exposition of this idea.
           | 
           | When I am writing, I have an idea in my mind that I would
           | like to communicate. I type it out, and if the words on the
           | page don't convey what I intended, I edit them. I delete a
           | sentence or change a word, until I believe I have a sequence
           | of words which will convey my intended meaning to my expected
           | audience.
           | 
           | The words, as they come, are a kind of "conjecture" about
           | what will best convey my intention. I can "refute" or
           | "criticise" (as Deutsch puts it) the conjecture using my own
           | reasoning, even before testing the words on another person.
           | 
            | As far as I understand LLMs (which is admittedly not far),
            | there is no such process going on. There is no intention
            | that it is attempting to communicate via words. There is no
            | creative conjecture about how to express the intention, and
            | no criticism of the result.
        
         | gibsonf1 wrote:
         | Category theory seems to have no relation whatsoever to
         | operational human concepts, which may explain how useless
         | category theory actually is.
        
           | jjtheblunt wrote:
           | I sometimes think the main use of category theory is to look
            | like a wizard, and I say that as someone first exposed to
            | it in math grad school in the early '90s.
           | 
            | Then I realize it's also an assertion that there are
           | recurring patterns in functions (in general). And, as such,
           | sometimes results noticed in one domain actually can be
           | expected to have analogous results in another domain.
        
             | staunton wrote:
             | As you say, the main use of category theory is to organize
             | very different areas of math in one overarching framework
             | and generalize ideas from one area of math to others. It
             | might be used in the pursuit of developing AI but it is
             | definitely not "an approach" to developing AI, just like
             | "taking ideas from books" isn't one.
        
               | jjtheblunt wrote:
               | well said
        
       ___________________________________________________________________
       (page generated 2023-03-08 23:00 UTC)