[HN Gopher] AI is making it easier to create more noise, when al... ___________________________________________________________________ AI is making it easier to create more noise, when all I want is good search Author : saeedesmaili Score : 305 points Date : 2023-03-08 15:03 UTC (7 hours ago) (HTM) web link (rachsmith.com) (TXT) w3m dump (rachsmith.com) | citizenpaul wrote: | The only solution is the death of exponential growth and easy | investment money. | | Probably ultimately the technical solution will be some sort of | variation on PGP key-signing parties. No way to get 10k users per | financial period with that real-world friction, though. | hexage1814 wrote: | I think a big problem is how closed and walled the platforms | have become. For instance, search engines don't have access to the | whole Twitter database, Google Images doesn't index Instagram posts, | and so on and so forth. | barbariangrunge wrote: | The SEO spam has already made it hard to find what you want on | Google. SEO spam + AI is going to be a dumpster fire. Whoever | solves this problem will probably get to be the new Google. | | Can't wait to see how this affects HN comments over the next 5 | years. | mjevans wrote: | Back-of-napkin approach: | | Allow-list some sites that already have a good reputation for | useful results (probably anything scraped for summary snips). | | Penalize all other sites based on the quantity of ads and | similar non-content. | | Use the old web search core on what's left. | ssklash wrote: | Kagi lets you do exactly this. You can allow/block-list | certain site results, or create a curated list of domains to | include results from.
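[Editor's note: mjevans' back-of-napkin ranking above (allow-list reputable domains, penalize ad-heavy pages, rank what's left) can be sketched in a few lines. The domain list, scores, boost, and penalty weight below are invented placeholders, not a real ranking algorithm.]

```python
# Toy re-ranker: boost allow-listed domains, penalize ad-heavy pages,
# then fall back on the base ("old web search core") relevance score.
# All numbers and domains here are illustrative only.

ALLOWED = {"en.wikipedia.org", "docs.python.org"}  # curated allow-list

def rescore(results):
    """results: list of dicts with 'domain', 'base_score', 'ad_count'."""
    ranked = []
    for r in results:
        score = r["base_score"]        # relevance from the base engine
        if r["domain"] in ALLOWED:
            score *= 2.0               # reputation boost
        score -= 0.1 * r["ad_count"]   # penalty for non-content
        ranked.append((score, r["domain"]))
    return [domain for _, domain in sorted(ranked, reverse=True)]

results = [
    {"domain": "seo-spam.example", "base_score": 1.0, "ad_count": 12},
    {"domain": "en.wikipedia.org", "base_score": 0.8, "ad_count": 0},
]
print(rescore(results))  # the allow-listed page outranks the ad-laden one
```

The point of the sketch is that the hard part isn't the arithmetic; it's maintaining the allow-list and measuring "non-content" without being gamed.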
| guluarte wrote: | Sadly, for closed gardens like Reddit, human moderation is the way | to go. | avereveard wrote: | > Can't wait to see how this affects hn comments over the next | 5 years | | I think new forms of comment ranking will take over; all those | systems we're building to detect AI spam will also double as | scoring systems to evaluate content originality. The problem | will be to differentiate good original content from bad original | content, but that can be left to flagging systems. | coldtea wrote: | > _Whoever solves this problem will probably get to be the new | google_ | | If it's solvable. There are many problems that aren't. | | Perhaps the new Google would just be a 1996-era Yahoo! human- | curated catalog. | philote wrote: | I've often wanted something like that Yahoo catalog. I miss | the days of being able to browse categories of sites. | tremon wrote: | My hope is for something like YaCy with a web-of-trust | overlay. I'd want it to only show results from peers that I | trust, and if garbage starts showing up in my search | results, I want to know which peer it came from so I can | un-trust it. | | Which in my mind is somewhat of a middle ground between a | search engine and a directory: what I'd want is a whitelist | of curated directories that contain domains to be crawled | by my search engine. | izzydata wrote: | My solution is to go back to meatspace and physical libraries | of information. | JohnFen wrote: | It's seriously hard to beat libraries. For a while, it looked | like the internet was making them less instrumental, but I | think the balance started tipping back toward them a few | years back. | lunfard000 wrote: | Websites are SEO-ing Reddit now (shit like: "Top ten | air fryers recommended by Reddit"), so appending "reddit" is not | enough anymore. In the very near future we are going to have to | type the whole site:reddit.com. | whatevaa wrote: | Well, I already do that today.
Probably won't help too much, | though, 'cause AI will generate some legit-looking | posts/comments and you will have a hard time figuring out if | they are written by a bot or a person. | Pxtl wrote: | Exactly. I keep coming back to the fact that, between all | the different ways AI can generate content, the entire | user-contributed internet is about to be completely flooded | with amicable and true-sounding nonsense with a peppering | of whatever viewpoint the bot-runner wants to push. | | I honestly don't see any way out besides Digital ID -- if a | person has to provide the host with information about who | they really are, then once they're discovered the host can | actually prevent further abuse. Otherwise, a single person | can create infinite bots to shill whatever product or | viewpoint they want. | | Which is why I have conspiracy theories about the | conspiracy theories about digital ID. The same actors who | use the internet to push misinformation benefit from | anonymity. | pictureofabear wrote: | It's almost like the Matrix but in cyberspace, where | humans are constantly trying to escape the bots, except | instead of mining the humans for energy, the bots are | mining data. | nico wrote: | Every day I'm using Google search less and less, while using | ChatGPT more and more. | | It's pretty rare that I want to find a website or document. | Most of the time I want an answer or a solution, and ChatGPT | is so much better at that than Google. | | The ChatGPT user experience is mind-blowing, but when you start | using their API and see, partly, how the sausage is made, you | realize that the context-awareness is just a very well done | illusion. But that's exactly where the magic of the experience | is, and what gives ChatGPT its edge. | | Google should just launch a "chatux" version of their search. | bagacrap wrote: | Why were you using a web search if you didn't want to access | the web (find a website)? | | OpenAI says don't use ChatGPT for search.
| Microsoft says | double-check everything Sydney tells you. Is there an | argument for LLMs replacing search besides laziness? | imachine1980_ wrote: | If this works as we wish (not saying that it's possible), it | will reduce the amount of SEO spam trash you read to 0%, | will let you investigate things faster, and will allow a | more recursive way to explore topics. And this will be | key to making OK Google more similar to what we wish for in an AI | companion. | napier wrote: | Have you tried Bing Chat yet? It's rumoured to be using a more | advanced model from OAI than ChatGPT or Davinci and is very | often surprisingly excellent. | beauzero wrote: | It was pretty decent early on. Now I just use it to augment | Copilot... generally when I don't have a specific "start" | but just an idea... it works better than dumping a comment into | Copilot. I was hoping that with the 3 tabs (creative -> more | exact) it would be a little better, but I haven't seen any | difference. It isn't quite as interesting as when it was | regularly hallucinating. I found the hallucinations would | bubble up inspiration... forgotten facts, unused paths, etc. | nico wrote: | I keep reading great things about it. Will have to try it. | Thank you for the recommendation! | wetpaws wrote: | It is so heavily lobotomized that 99% of the time I find much | better results with ChatGPT, even though it is technically | an older model. | nirav72 wrote: | I've noticed the same thing. Other than generating code | examples, I found it to be useless for search. | | I've had better luck using a third-party extension that | inserts web search context into ChatGPT than using | Bing Chat to search for something. | nico wrote: | Could you mention or link the extension here? Thank you! | nirav72 wrote: | Sure. The one I'm using is WebChatGPT: | | https://chrome.google.com/webstore/detail/webchatgpt-chatgpt... | | Firefox: https://addons.mozilla.org/en-US/firefox/addon/web-chatgpt/?... | | Note - I haven't tried the Firefox version.
But the | Chrome one works fine in Brave. | | Edit: Source for the above extension: | https://github.com/qunash/chatgpt-advanced | sangnoir wrote: | > Every day I'm using Google search less and less, while using | ChatGPT more and more. | | Wasn't GPT-3 trained on web content with a cutoff of 2021? | Future ChatGPT versions will likely be trained on data | tainted by AI-generated fluff, and will face the same | challenges Google is facing with today's web content. | whstl wrote: | This is already a problem with "content marketing" | companies. | | Most of those SEO-spam blog posts are based on older | internet content, and written by non-expert copywriters | who can write about 10 different subjects on a given day. | A lot of the work is just information compilation and | rephrasing, taking some items from a few Buzzfeed lists and | changing the text a bit. Some stuff, like SEO-spam medical | advice, can be downright dangerous if done without care. And | a lot of companies in this field have been experimenting | with AI for at least 10 years. | | The problem, however, is that they require sources, be they | freelancers or AI. And there is so much corporate blogspam | today, especially in the areas that use content marketing, | that it is becoming harder and harder to find "first | generation" content written by an actual expert. Google | isn't helping by prioritizing newer content. | coldtea wrote: | And at an even worse scale, because soon AI-created SPAM | will hugely dominate actual content... | jmbwell wrote: | And when this serpent starts eating its tail and the AI | starts returning highly-ranked results consisting of AI-generated | content... well, I hope I can move to a farm by | then. | SketchySeaBeast wrote: | Hmm, maybe the singularity won't be a time when the AI | takes over, but instead when a single element's | hallucination causes a cascade failure that brings it all | down. | danielvaughn wrote: | This is exactly what I think will happen.
The internet is | quickly going to fill up with AI garbage, and it's going | to be more and more difficult for AI to source new human | content to feed into its own model. | nirav72 wrote: | It kind of already is. Self-published ebooks on Amazon's | Kindle store are currently flooded with garbage written | using ChatGPT. | nico wrote: | It's been like that since humans have been around. That's | the way knowledge spreads. | | We humans have been repeating and imitating ourselves | forever. That's even the way babies learn to talk. | | And we are not all clones; we don't all think, say or do | exactly the same things. But most just repeat and consume | the same content and ideas (movies, music, books, | languages, media). | tremon wrote: | _That's even the way babies learn to talk_ | | No, it's not. Babies learn to talk by babbling, and then | by taking cues from their parents about which babbles elicit a | response and which don't. If you want to translate that | to AI, it would mean that the AI would spew random | garbage and then learn to filter its garbage from the | responses to its outputs. That's pretty far away from | what ChatGPT and other language models are doing right | now, because they stop learning before they start | producing any output. | mfcl wrote: | (Why are you being downvoted?) | | I think the important part is to have a minimum of | filtering. Humans consume knowledge brought by other | humans, but most of the time we cherry-pick the true and | useful knowledge and reject what turns out to be false | (ideally). | | I think what AI brings to the table is automation and | speed, but the quality is not better. So if AI starts | consuming its own content, will that decrease the quality | of knowledge* in general? | | (*Here I mean the first knowledge you quickly get from a | search or asking some AI, not what you could get after | hours/days of research.)
| nuancebydefault wrote: | The HN crowd is very skeptical about the usefulness of | AI, especially related to content generation. I share | the skepticism (the positive, and hence out-of-control, | feedback loop of AI consuming its own content | recursively). But on the other hand, when smart people | enhance the primitives of the AI algorithms, AI might | become better at rating the quality of its own output. | Not by means of the question 'does this look like | anything found on the internet?', but rather by the question | 'does this make sense, is this even possible, do the | numbers add up?' Also, it is not very hard to "freeze" a | pre-AI knowledge base or NN, and use that as a reference | for sensibility. | coldtea wrote: | > _But on the other hand, when smart people enhance the | primitives of the AI algorithms, AI might become | better at rating the quality of its own output_ | | All the incentives of massive industries like SPAM, "content | creation", "news" publishing, and advertising are | against it becoming better at rating the quality of its | output - or rather, they'd just have it become better at being | undetectable while still being a cheap, fast, mass-produced wall of | text... | coldtea wrote: | >It's been like that since humans have been around. | That's the way knowledge spreads. We humans have been | repeating and imitating ourselves forever. | | Humans repeating and imitating others has historically | been a "volatile memory" keep-alive mechanism: basically | saving culture and knowledge in people's minds and | helping it transfer, as literacy was low and (manual) | writing and book copying scarce, expensive, and time-consuming. | | Even when, with the advent of typography, writing became | easier to reproduce, it was still somewhat costly (cost of | materials, typesetting, distribution, etc.), and | gatekeepers (publishing houses, bookstores, etc.) ensured | that most stuff was somewhat original, not just copies or | random permutations of the same content.
Indexing was | also costly manual labor (creating dictionaries, curated | bibliographies, books with oversight of a subject matter | and references to what the main science/wisdom/etc. about | it is, library collections, etc.). | | Now, however, SPAM and AI-SPAM are on mostly permanent | storage - and permanent storage, indexing, and | duplication/reproduction cost close to zero and happen | automatically at huge scale. | | So, no, it's not the same thing. The same way breaking a | quick little wind is not the same as having full-blown | Taco-Bell-inspired diarrhea. | [deleted] | sam0x17 wrote: | Yeah, someone needs to figure out how to get a continuously | trained online model up and running without running the | risk of people figuring out how to exploit it. | krupan wrote: | Is this actually giving you good results? The few times I | tried to use ChatGPT instead of Google I got terrible | results: | | - Horrible and wrong driving directions | | - Buggy, incomplete code | | - Long-winded explanations of how it was just a chat AI and | not a whatever whatever blah blah blah | | I found it to be tedious and incredibly untrustworthy. | Groxx wrote: | It'll probably look something like this. | userbinator wrote: | Does ChatGPT know things like part numbers for mid-80s ICs by | Japanese manufacturers? | | That's what I use search for. Unfortunately, Google is getting | worse at those queries, but I doubt AI is going to be better. | | Beware of wanting "answers" or "solutions" --- that's a | slippery slope towards complacency and loss of agency, | replaced by corporate subservience. Classic example: instead | of finding a service manual or discussions on repairing | something, AI may try to convince you to buy a new one. | gmadsen wrote: | that seems like exactly something Bing could give you | ToValueFunfetti wrote: | ChatGPT does surprisingly well at this task.
My question | was very biased towards it knowing the answer, because I | asked it for examples of mid-80s Japanese ICs when Google | failed me, but it definitely knows a ton of model numbers | for real parts from the era. Give it a shot with a real-world | example and report back. | squarefoot wrote: | If you are looking for data sheets, the Internet Archive | contains some. Example: https://archive.org/search?query=subject%3Atoshiba+integrate... | | Also, a Google search followed by "site:archive.org" will | filter out everything not coming from archive.org, and | "filetype:pdf" will return only PDF files. | | Sometimes a search for images can be more effective, so | that one can recognize the target book by its cover. That | can be especially effective with old titles that were | scanned but not OCR'ed, where the file name is all one can | search for, so that ambiguous part names (for example, ICs | named like airplane flights) can be easily recognized. | jsenn wrote: | > that the context-awareness is just a very well done | illusion | | Could you expand on this? The context awareness is the part | that blows my mind. I would be disappointed/fascinated to | find out that it's "just" a simple set of tricks! | nico wrote: | ChatGPT is "only" a chat interface to GPT. | | GPT basically has access to two contexts: its internals, | and the prompt that it gets. | | Then when you ask something of ChatGPT, it takes your | prompt and generates a new prompt that includes the | previous messages in your session, so that GPT can use them | as context. | | But there's a limit to the size of the prompt. And it's | not that big. | | So then ChatGPT's magic is figuring out how to craft | prompts, within the size limits, to feed GPT, so that it | has enough context to give a good answer. | | Essentially, ChatGPT is some really amazing prompt-engineering | system with a great interface. | jsenn wrote: | That makes sense.
So ChatGPT has to use some cleverness | to present the relevant previous text in the current | prompt, but the language model is still responsible for | pulling out the meaning of that context and responding | appropriately. I remain mind-blown :) | nico wrote: | Exactly. Very eloquently put. | nirav72 wrote: | >while using ChatGPT more and more. | | I'm assuming you're using ChatGPT to validate/generate | technical solutions such as code, and not necessarily | searching for specific information? If it's for information | search, then how do you deal with the fact that at times it | tends to make up things that are factually incorrect or | logically inconsistent? | ruszki wrote: | By using Google to verify. It's still okayish at finding out | whether a statement is true or not. The problem is that | Google is becoming worse at finding that statement in the | first place. For example, I wanted to figure out how I should | replace a given dependency. Google was terrible; so was | Stack Overflow. ChatGPT gave me a guess. For such questions | it's right in about 3/4 of the queries, and when I | search for the given guess, Google immediately finds | whether it's fine or not. | passion__desire wrote: | [0]: "You can try doing a web search with the URL, but | Frankly, I haven't had much success with that." | | In my experience, Bing is the best search engine for finding | info on deleted videos - for example: | bing.com/search?q=youtu.be/t1wjL4BqXlI - the 1st result shows the title | of the video: "Awolnation "Sail" - Unlimited Gravity Remix" | | [1] Daniel Dennett talking about why we should still support | explicit string search. | | [0] https://support.google.com/youtube/thread/3876476/how-to-fin... | | [1] https://youtu.be/arEvPIhOLyQ?t=1330 | stockhorn wrote: | How about letting users flag each website with a label like "seo | spam", "useful", "fakenews", or "inaccurate", and displaying a | summary of the votes as a badge beneath each link in the search | results?
| | We could create a browser extension to add this functionality | to all search engines at once, using a web-of-trust mechanism | to ensure that we only get real votes... | | https://news.ycombinator.com/item?id=34999285 | Veen wrote: | Wouldn't work. You'd have "SEO professionals" selling | automated upvote, downvote, and labelling botnet services | almost immediately. Nothing that can be gamed is reliable at | scale. | pushfoo wrote: | Spammers would submit fake tags faster than everyday users. | Since this is a social problem rather than a technical one, | maybe old-school trust networks are worth considering | instead. | bnjms wrote: | I assume implicitly that votes would only count for users two | degrees of separation from you, and that it's tied to a real-life | friends registry. I wish FB hadn't happened and the web-of-trust | stuff had turned into something useful. | pushfoo wrote: | I don't think real life is the main issue. Instead, it | seems like the same issue PGP has: the benefits never | outweighed the bad UX for most people. | xgbi wrote: | Tie this to a proof of << browse >> and a blockchain to weed | out the big players and you've got yourself a business model! | voz_ wrote: | Blockchain is garbage and totally irrelevant here. | originalcopying wrote: | No, it's not. Blockchain is a legitimate innovation looking | for a real-world use case. | | Blockchain technology's biggest problem is the fact that | its use cases touch the ideological foundations of our | global culture. And Bitcoin operates at the foundational | level of any and all governments in the world. | | Bitcoin is just one use case of the technology; another | is land registries. Again, foundational institutions of | society; not the kind of thing that has ever been | peacefully reformed. | | Your comment is evidence that the 'cultural directors' of our | civilization have decided to destroy this technology.
However, | it's funny to notice that they will proceed to implement | their own versions of it: possibly the renminbi, unless | the dollars bomb them out of existence? But I'm so far | into guesswork that this whole comment ought to be voted | into the really light grays. | originalcopying wrote: | [flagged] | Dalewyn wrote: | Steam has user tags. | | The last time I referred to them was to tag Elite: Dangerous | Odyssey as "Early Access". | wilsonnb3 wrote: | We're already drowning in an ocean of SEO bullshit without help | from AI. Adding another ocean isn't really going to change | anything, IMO. | tomxor wrote: | > What I would use the shit out of, though, is a chatbot that has | been trained on all the information in the CodePen knowledge | base. Have it suck in all the meeting notes. | | Yeah, it's pretty good at this. I've recently been hitting up | OpenAI's ChatGPT with some queries that were too exhausting to | extract from Google - and it's pretty good at surfacing resources | quickly - and the workflow of refining the query with the context | of the current thread of conversation works really well when you | are struggling to describe it succinctly in a single shot (which | Google search doesn't really do beyond the global context of your | profile - which can actually be counterproductive). | | I really hate all the hype around ChatGPT - it can't be trusted | for a lot of stuff, and people over-anthropomorphise it - but so long | as you don't rely on it for accuracy, it's pretty useful for | search. | | One major issue I found is that ~95% of the time it can't provide | correct links to sources. This is fine if it can just name stuff - | then you can follow up with a more specific Google search. | But ChatGPT will just make up bullshit links, in the same way it | will wax poetic with some BS explanation to satisfy its "looks correct" | training. You can even point this out and it will keep generating | variants on the URLs that are all fabricated.
| | It kind of makes sense that it would be good at search... it's a | language model; it should be able to link descriptions of | difficult-to-search-for things to known resources. | mark_l_watson wrote: | I agree that having text generated inside a tool like Notion | might not be everyone's use case. The author mentions a chatbot | trained on internal documentation (fairly easy using the GPT-3 APIs, | LangChain, and GPT-Index/Llama-Index - I am writing a book on the | topic: https://leanpub.com/langchain). | | I read that 100 companies released products using the ChatGPT APIs in | the first week the APIs were publicly released. I expect a lot of | useless and also a lot of very useful products. A little off | topic, but Salesforce Ventures just announced a $250 million fund | for generative AI startups: | https://www.salesforce.com/news/stories/generative-ai-invest... | O__________O wrote: | Possibly missing something, but a custom GPT + index will not | solve the problem of users knowing what prompt to use to | access the information. Is there something I am missing? | gibsonf1 wrote: | Actually, not very easy, as the LLMs don't give you sources and | literally make things up by design. | politician wrote: | Only about 10% of the links I ask for don't resolve. | sometdog wrote: | That is true, but you can combine LLMs with other tools to | get sourcing and more accurate answers overall. Instead of | using the LLM to directly answer a question, you can use the | question to search for relevant text in a particular | knowledge base, and then use the LLM to summarize those | results. | DebtDeflation wrote: | >use the question to search for relevant text in a | particular knowledge base, and then use the LLM to summarize | those results | | As a consultant, based on personal experience, I can say | that what you wrote above constitutes >90% of current | enterprise use cases for ChatGPT.
What clients REALLY want | to do is to be able to take a pre-trained LLM and then train | it further on their own corpus of documents, but given | limitations around token window size, the above is probably | the best way to fake it for now. | gibsonf1 wrote: | If you use the LLM to summarize the results, it makes | things up by design. As soon as you introduce the LLM into | search, you lose. | raincole wrote: | I've personally replaced Google with Bing Chat for | technical things (like searching for a specific API). Does | it make things up? Maybe. But in my experience it hasn't | happened even once in the past whole week (>100 searches). | | It's not "it happened but you just didn't notice" - if | it uses a function call wrong, I'd have noticed. My code | won't compile. My test won't pass. | | So far it either gives me a 100% correct result, or | completely fails. But it doesn't generate "seemingly | correct but actually wrong" things even once, unlike | ChatGPT. | autonomousErwin wrote: | I find this the killer application for ChatGPT (at least | for now): answers you can _very_ quickly verify, where you care | little about the sources, because a significant number of | answers on Stack Overflow make ChatGPT look modest in | confidence by comparison. | throwuwu wrote: | Biased much? | jsheard wrote: | They do give sources if you ask for them... but have a | habit of inventing plausible-sounding but completely fake | sources. | | https://news.ycombinator.com/item?id=33841672 | | That example is a few months old, but OpenAI doesn't seem to | have made much progress here: if you ask ChatGPT for "a list | of academic papers about X", it will nearly always | confidently churn out a list of 5-10 papers that don't exist.
| Amusingly if you ask it for papers about an absurd premise it | will sometimes call out the absurdity and say there are | probably no papers on that subject, but then offer a more | plausible variation on that premise and trip up at the last | hurdle by inventing all the examples on the subject it | supposedly thinks is more likely to really exist. | aledalgrande wrote: | I got so many links to github repos which didn't exist lol | visarga wrote: | Somebody went as far as contacting the author of a paper | based on chatGPT suggestion. Of course the author was | pretty sure he never heard of that paper. | jerf wrote: | "They do give sources if you ask for them... but have a | habit of inventing plausible sounding but completely fake | sources." | | Basically, the way you want to think about it is, no, they | can not give sources. That information is not in their | neural net. It can't be. There isn't anywhere to encode or | represent it. | | What they can do is make a guess what a source might look | like, but even if they are right, it is only because they | happened to guess correctly, not because they knew the | source. They don't. They can't. | | It isn't that "they give sources but they might be wrong", | it is "they will make up plausible-sounding sources if you | ask just as they'll make up plausible-sounding anything | else you ask for, and there's a small chance they'll be | right by luck". For more normal factual-type questions part | of the reason they are useful is that there's a good chance | they'll be right by what is still essentially luck, but for | sources in particular there's a particularly small chance, | by the nature of the thing. | sam0x17 wrote: | That said with enough training and training data, the | line between "plausible sounding" and "accurate" gets | thinner and thinner. This will be especially true as | these AI models refine their results based on user | interactions. 
Being right for the wrong reasons becomes | less and less relevant as the accuracy goes up, | and at a certain point, it might get so good that no one | cares. | | Maybe human intelligence is more like that than we're | willing to admit ;) | jerf wrote: | Source identification isn't a space amenable to guessing, | no matter how much data you throw at it. | | Here's an exercise you can try: cite some information | from the next issue of Science to be published. Cite | anything you like from it. | | You can make some plausible stuff up. You could make even | more plausible stuff up if you went and scanned over the | past few issues first. But without specific knowledge of | the contents of the next issue, you aren't going to be | able to create real citations. This is what LLMs lack, by | their nature. It's not a criticism, it's a description. | | You can't guess sources. The possibility space is too | large, the distribution too pathological, and the | criteria for being correct too precise. | | GPT will never cite sources correctly. Some future AI | that uses GPT as a component, but isn't entirely made out | of a language model, will be able to, by pulling it out | of the non-GPT component. Maybe it'll need to be built as | an explicit feature, maybe it won't; only time can tell. | But expecting language models to cite sources correctly | is not sensible. It's just not a thing they can do. | gibsonf1 wrote: | Right, they just make all things up by design, no matter | what you ask about. There is a statistical chance that a | pattern of words output may correspond to fact, but just as | good a chance that it won't. The LLMs literally know nothing about | the world; it's just statistical word-pattern output. | mrguyorama wrote: | Lies and falsehoods are just as valid sentences as the | truth. | mark_l_watson wrote: | The web demo version of ChatGPT can sometimes, as you say, | make things up!
I asked it a question about an ancient war in | Greece and it stated two facts and then just invented stuff. | | However, I would ask you to also consider something: when you | supply prompt input text (this is the "context text" in the | ChatGPT API calls) and then ask questions about the context | text, it is very accurate. It also does a very good job when | you give it context text from a few different sources, and it | integrates them nicely. | | It is more efficient to use embeddings for context text, as | Llama-Index does. | nico wrote: | Do you have shareable links to good tutorials about using | embeddings with ChatGPT for this case? | [deleted] | neovive wrote: | The book sounds very interesting. I've never worked with | LlamaIndex and now I'm intrigued by the possibilities! | DontchaKnowit wrote: | I think the biggest possible upside of AI is that there will be a | brief money grab where people are raking in money on | automatically generated internet content - then the internet will | be filled with useless noise and no one will use it in the | obsessive, drip-fed-dopamine-IV way we currently use it, because | it will all just be garbage. Social media will continue to exist, | but consuming content will require trust in the publisher, e.g. | tied to a real identity or to a trusted anonymous or business | entity. | | Just my speculation, but I think the more garbage we put on the | internet, the better. | luxuryballs wrote: | why was this so hard to read? maybe use the AI next time :D | jamesgill wrote: | I was just thinking this yesterday. I realized that I'd always | thought the promise of AI was to cut through the noise for us and | bring us signal. | | As I watch the ChatGPT-like products proliferate, I'm realizing | the opposite will be true. And soon, it'll be AIs to help us deal | with AIs. A layer cake of clever but useless 'tools' for | improving human life.
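[Editor's note: the pattern discussed in this thread - nico's point that ChatGPT must pack prior context into a limited prompt, and mark_l_watson's point that embeddings are the efficient way to pick that context - can be sketched as follows. The bag-of-words "embedding", the 50-word budget, and the sample history are toy stand-ins, not a real embedding model or API.]

```python
# Toy context selection: rank prior messages by similarity to the
# question, then pack the best ones into a fixed prompt budget.
# embed() is a bag-of-words stand-in for a real embedding model.
from collections import Counter
import math

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_prompt(question, history, budget=50):
    """Pack the most relevant history into a fixed word budget."""
    q = embed(question)
    ranked = sorted(history, key=lambda h: cosine(q, embed(h)), reverse=True)
    context, used = [], 0
    for h in ranked:
        n = len(h.split())
        if used + n > budget:
            break           # the prompt window is full
        context.append(h)
        used += n
    return "\n".join(context + [question])

history = [
    "we discussed deploying the api server on fridays",
    "the search index rebuild takes four hours",
    "lunch options near the office are limited",
]
prompt = build_prompt("how long does the search index rebuild take?", history)
```

The most relevant note lands first in the prompt; with a real embedding model the ranking would capture meaning rather than word overlap, but the budget-packing logic is the same.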
| rchaud wrote: | > If you ask me, if there's one thing we don't need more of on | the internet, it is more soulless content written for "SEO" | purposes, with enough wordcount to inject ads between. | | But this is precisely what people will pay money for! Companies | like Canva are 'unicorns' because people need faster ways to | churn out more templated digital detritus to grab your attention | with. | rootusrootus wrote: | AI, the cause of, and solution to, all of our problems. At least | on the Internet. | seydor wrote: | AI is making the value of noise go to $0. There are no more SEO | signals that Google can "optimize". It will be forced to switch | to AI results and somehow monetize those, because its old | business model is over. | all2 wrote: | AI essentially raises the publishing wall for "noise". Print | books used to be like this until the advent of single-unit/low-volume publishing. We're essentially returning to a time when | the walled garden of content creators will be the most valuable | game. Substack and communities will likely grow as search | results and AI make "standard search" useless. | fdgsdfogijq wrote: | I can't wait to watch this play out | twelve40 wrote: | well AI is not magic, and while it might be better at, say, | summarizing some documentation, the moment you hit a popular | commercial query or something not as clear-cut as programming | documentation, or something gameable, it will have to draw upon | the same shitty sources as everyone else: random sources of | data that have to be somehow ranked and filtered for noise. And | how does it solve that problem? | visarga wrote: | Don't be quick to write off AI for that task | | > Discovering Latent Knowledge in Language Models Without | Supervision | | https://arxiv.org/abs/2212.03827 | | They find a direction in activation space that satisfies | logical consistency properties, such as that a statement and | its negation have opposite truth values.
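The consistency trick in that paper (CCS) can be caricatured in a few lines: learn a probe direction whose probabilities for a statement and its negation sum to one, with no labels at all. The toy below, on synthetic "activations", is my own illustration of the idea — not the authors' code or data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 16
truth_dir = rng.normal(size=d)            # hidden "truth direction"
labels = rng.integers(0, 2, size=n)       # 1 = statement is true (held out)
sign = (2 * labels - 1)[:, None]
base = rng.normal(size=(n, d))            # shared non-truth features
x_pos = base + sign * truth_dir           # activations for each statement
x_neg = base - sign * truth_dir           # ... and for its negation

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))
w = rng.normal(size=d) * 0.01
losses = []
for _ in range(500):                      # plain gradient descent, no labels
    p1, p2 = sigmoid(x_pos @ w), sigmoid(x_neg @ w)
    c = p1 + p2 - 1                       # consistency: p(x) + p(not x) = 1
    m = np.minimum(p1, p2)                # confidence: avoid the p = 0.5 cop-out
    losses.append(float(np.mean(c**2 + m**2)))
    g1, g2 = p1 * (1 - p1), p2 * (1 - p2)
    low1 = (p1 <= p2).astype(float)       # which side min() picked
    grad = ((2*c*g1 + 2*m*low1*g1)[:, None] * x_pos
            + (2*c*g2 + 2*m*(1-low1)*g2)[:, None] * x_neg).mean(axis=0)
    w -= 0.2 * grad

# Recover truth predictions; the learned direction's sign is arbitrary.
pred = (sigmoid(x_pos @ w) + (1 - sigmoid(x_neg @ w))) / 2 > 0.5
acc = max(np.mean(pred == labels), np.mean(pred != labels))
```

On this synthetic setup the probe recovers the hidden labels without ever seeing them — the point being that logical consistency alone can be a supervision signal.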
| thanatropism wrote: | Obsidian vaults are just folders of .md files, and huggingface | provides a great `sentence-transformers` package which allows you | to easily do k-nearest-neighbors search on BERT embeddings of your query | and vault. This is a weekend project really, and that's including | a streamlit or tk frontend as well. | bloudermilk wrote: | I'm just starting down this path myself. Any resources outside | of official HF docs you would recommend? | leobg wrote: | sbert.net is all you need. | Kiro wrote: | The title makes it sound like those are mutually exclusive, but I | thought there were a bunch of services doing exactly what they are | describing, all built on ChatGPT. | | E.g. https://ingestai.io/ | runlevel1 wrote: | We've just invented the hammer. | | It can do a lot of cool things! You can build a house with it, | you can smith metal with it, and you can even use it as a weapon. | | The thing is, right now, we're so amazed by its potential that | we're finding a lot of uses that, while technically possible, | aren't a great fit. | | _Technically_ you can use the hammer as an axe, a hole digger, | and a backscratcher, but there are far better tools for the job. | jfengel wrote: | It's unclear to me that there is any use for the "hammer" of | text generation. It adds no new knowledge, and doesn't pretend | to. Its transformations of existing knowledge are neither | interesting nor attractive. Anything they say has already been | better said elsewhere. | | I can imagine uses for generated art, which may at least be | aesthetically pleasing. But I can't conceive of any end for | computer-generated text. | flatline wrote: | Its use is very much in question - but it is certainly a | powerful tool, and that combination is worrisome. Much like | the social graph, it is going to have a profound impact on | how we interact online and with each other, and that impact | will not be known for some time, even though we may be | feeling it now, unawares.
In a decade, maybe less, we will | have some picture of the use and power of these models, there | will be meetings in front of Congress on how tech companies | have used them, etc. | | Just look at what is happening in education right now. It is | ultimately going to force a complete reinvention of the | written assignment. This is just the beginning, even if the | tool appears to be a mostly-useless toy for any real-world | applications. | wilsonnb3 wrote: | > It is ultimately going to force a complete reinvention of | the written assignment. | | If by complete reinvention you mean returning to what we | used to do, which is write essays with a pencil during | class without using a computer. | | > This is just the beginning, even if the tool appears to | be a mostly-useless toy for any real-world applications. | | It is not possible to tell on this side where LLMs (or any | invention) fall on the spectrum of 3D TV to the smartphone. | It will become apparent in the future and 50% of us will | have been wrong, but anyone who claims to know is just | BSing. | jfengel wrote: | The fact that it has a notable downside is certainly | interesting. School assignments are well suited to LLMs | because they also don't present new information. That's not | what they're for; they're for assessing what the student | knows. | | They're usually fairly obvious, but it's hard to prove. | Unlike much ordinary copy-paste plagiarism, you can't | trivially reject it as cheating. That forces teachers to | think of new ways to test student knowledge... an | interesting challenge, if not exactly a "use". | jamesdepp wrote: | Personally, I think that the beauty of current LLMs is their | ability to process and present information. Although current- | generation LLMs might not be best suited to making new | discoveries or producing new (valuable) information, their | ability to summarize and process information already out in | the open is undeniably valuable.
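The weekend project thanatropism sketches above — embed a vault of .md files with `sentence-transformers` and rank them against a query — really does come down to very little code. A rough sketch (the model name and file layout are illustrative; assumes the `sentence-transformers` package is installed):

```python
import pathlib
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list:
    """Indices of the k rows of doc_vecs most cosine-similar to query_vec."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    return list(np.argsort(-sims)[:k])

def search_vault(vault_dir: str, query: str, k: int = 3) -> list:
    # Heavy dependency imported here, so top_k stays usable on its own.
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
    paths = sorted(pathlib.Path(vault_dir).rglob("*.md"))
    doc_vecs = np.asarray(model.encode([p.read_text() for p in paths]))
    query_vec = model.encode([query])[0]
    return [paths[i] for i in top_k(query_vec, doc_vecs, k)]
```

A real version would chunk long notes and cache the embeddings between runs, but this is the whole idea; sbert.net documents the rest.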
| lunfard000 wrote: | I started to do some cpp development again and google is just | giving me wrong/outdated solutions or just unanswered | questions from SO, while ChatGPT is on point most of the | time. | Hendrikto wrote: | Translation and summarization are just two examples of use | cases for which text generation is a great fit. | l33t233372 wrote: | > Anything they say has already been better said elsewhere. | | I don't understand how someone can say this. | | There have probably never been poems written that explain the | particular niche physical phenomenon that I've had GPT-3 | generate for me. | RC_ITR wrote: | The dirty little secret of Large Language Models is how many | humans are in the loop.[0][1] | | Transformers are great at building extremely complex maps of | language without any human intervention, but if you want them to | consistently query the right part of the map (e.g., his Codepen | search example), you need a very non-trivial amount of human | feedback. | | Will be interesting to see if all this hype leads to a solution | that scales better than what we do now (so that orgs actually | _could_ have insanely good AI chatbots trained on their docs), | but the jury is still _definitely_ out on that. | | [0] https://openai.com/research/learning-from-human-preferences | [1] https://openai.com/research/instruction-following | politician wrote: | Wow - [1] should be considered required reading. OpenAI is | baking in human employee biases during fine-tuning. | | I would want to see the exact set of posed completions and | paired responses. | O__________O wrote: | My understanding is the majority of "Reinforcement Learning from | Human Feedback (RLHF)" for OpenAI comes from contractors: | | - https://time.com/6247678/openai-chatgpt-kenya-workers/ | windex wrote: | Normal web search is doomed.
The people creating SEO out of | ChatGPT don't realize that I am not going to be searching for | answers by reading page upon page of SEO-optimized pages anymore. | I will ask an AI directly, and of late it has been a godsend in | terms of code and systems administration. It gets it right the first time | in most simple cases and queries. | bagacrap wrote: | How will the AI know the answer if it's trained on SEO (AIO?) | content? | JohnFen wrote: | > Normal web search is doomed. | | I hope not. The likes of ChatGPT don't do what I want a search | engine to do. If normal web search is doomed, then how will I | find any good new stuff on the web? | orangepurple wrote: | The SEO out of ChatGPT is for boomers and other clueless | people. The dead internet theory is true. | otabdeveloper4 wrote: | Hope you enjoy endless product placement in your search | queries. | | _> Looks like you want to ask about how to administer your | Kubernetes cluster. While I am an AI and not qualified to give | advice on how to run mission-critical systems, I can heartily | recommend the amazing Azure Managed Kubernetes, which is | Gartner-certified for painless six-sigma reliability!_ | windex wrote: | It isn't doing it right now. | jerf wrote: | The tech that detects that you're asking a racist question | can detect you're asking about Kubernetes or orchestration | in general and serve you up an ad as easily as it serves up | an explanation of why you shouldn't ask racist questions. | It is no different to the AI at all. | kimburgess wrote: | Why imagine when you can experience the future now: | https://future.attejuvonen.fi (from a recent thread here).
| spacephysics wrote: | I think this will occur with free AI-based search, since it | would be a similar business model to Google's (if not more | biased/intrusive). | | The alternative (which I think in today's age, vs. when Google | came out, is more readily accepted by the masses) will most | likely be a subscription-based tool that caters to specific | niches and avoids product placement. | | But to your point, the further removed one is | from the original docs/content/etc., the more likely/able some | middleman is to inject their own economic/political | incentives, which is of concern. Especially when AI has a | political bias, regardless of where it originates. | mrguyorama wrote: | > most likely be a subscription-based tool that caters to | specific niches and avoids product placement. | | Have you not been paying attention? The expensive | subscription-based plan will ALSO have ads and product | placement. These businesses just can't help themselves. | twelve40 wrote: | who in their right mind would give up tens of billions of | ad revenue? the ads are coming no matter what. | ilikeatari wrote: | Yes, that's possible and probable. However, I dream that it | will be a paid-subscription business model, or at least | not ad-driven. | HDThoreaun wrote: | The big tech products might have this, but there will be | competitors that don't. | twelve40 wrote: | there will be non-rent-seeking competitors that will raise | $11B and will have nothing to do with big tech. Got it. | ilkke wrote: | Maybe nation-states? You'd just get a different kind of | ads then I guess. | HDThoreaun wrote: | You won't need $11B to create an LLM in a couple years. | Stable Diffusion has created the blueprint for open-source | large models. Sure, it might be worse in some sense | than the cutting-edge ad-bloated products, but some | people will take that tradeoff. | MagicMoonlight wrote: | Once a model is trained, that's 99% of the work.
Open-source models or a hacker leaking a major model will be | enough to compete. | Frotag wrote: | I settled for reading the docs and searching GitHub issues like | a caveman. I wonder if this'll eventually popularize low-key | advertisements in bug reports. Like "I scanned the repo using | this (my company's) tool and found mixed usage of ' and "" | tnzk wrote: | I have seen this once in the issues in sveltejs/svelte | (couldn't find the link though.) It had been closed by | moderators fairly quickly; I wonder how long it will take for | the volume of this stuff to surpass the capacity of voluntary | efforts. | rchaud wrote: | Keyword stuffing never dies. | dougb5 wrote: | I think the list-of-links Web search paradigm has plenty of | time left, for at least three reasons: (1) A good chunk of a | typical user's needs are to do something, not to find an answer | to a question; (2) Google is lightning fast compared to | present-day chat interfaces; (3) "keywordese" may be an | unnatural input language for search queries, but it's faster | than having a dialog. | | I still make dozens of Google queries per day, and do maybe 1 | or 2 ChatGPT sessions per week, and I'm quite aware of all the | capabilities and deficits of each. I wondered why this was, | until I reflected on the things I actually search for on Google | (Go to https://myactivity.google.com/myactivity and filter to | just show "Search"). This was a useful exercise. What | percentage of your recent queries would have worked well, and | more quickly, on ChatGPT? For me it was less than 1/10... | system2 wrote: | The average user doesn't know about AI. The average user is 99% of | society. We are the only ones who are doomed. | fdgsdfogijq wrote: | ChatGPT is the fastest-growing consumer product in history | bob1029 wrote: | I've observed the average user go from zero to hero on ChatGPT | in about 30 minutes.
| chrgy wrote: | Once AI makes 90% of new content, the only question is the | quality of the prompts. This AI we see is still very shallow | and it is only really good at creating the next word or pixel, | but at some point it would converge to AGI. | meindnoch wrote: | In a world of AI-generated trash, _provably_ genuine human-made | content will be more valuable than ever. | h10h10h10 wrote: | Why? I honestly don't care who or 'what' wrote something, as | long as it is useful. In fact, after a certain point there's no | way to actually distinguish human-written content from AI-written | content because it's the same. | | We'll just become used to it, and only a few people on HN will | shout at the clouds. | meindnoch wrote: | Currently the bandwidth of online astroturfing is limited by | the human copywriters producing it. In a post-ChatGPT world, | astroturfing will be limited by the number of GPU-seconds one | can pay for, which is going to be orders of magnitude | cheaper. It doesn't matter that AI is capable of producing | useful content, because _useful_ AI-generated content will be | a zero-measure subset of all AI-generated content in the | wild. | | Compare with organic food: is it true that non-organic food | can be healthy? Sure! Then why do people prefer organic food? | Because it's just not worth it for them to figure out which | one is healthy/unhealthy. | | Similarly, it's just not going to be worth the effort to | figure out if some AI-generated content is genuine or trying | to astroturf. If there is a "certified human" badge, | people will use that as a positive signal. | h10h10h10 wrote: | You're right about that. Dealing with astroturfing will be | an arms race and will require some proof of "humanity" in | social media, forums, etc. | | But, ultimately, I don't buy the idea that astroturfing is | all bad. Or the opposite, that the lack thereof is | necessarily good.
I think bombarding with AI content can | have positive effects, like overwhelming human moderators | who build echo chambers and ban wrongthink (e.g., Reddit). | Or it gives a small organization the capacity to compete | with the likes of the NYTimes or The Atlantic, etc., which | currently control the narrative. | [deleted] | O__________O wrote: | The author mixes two topics: searching content they | control and searching content they don't control. | | While I understand the desire to search content created by yourself, in | my opinion the vast majority of valuable content is created by | others, in part because the most valuable content created by | yourself is rapidly internalized into your brain. | | The meme that Google is getting worse fails to acknowledge that Google | is free and makes money from advertisers. More to the point, the more | valuable the related search is per amount spent on advertising, | the more the noise, and as a result, the more resources a user will need | to spend to enhance the signal. This has nothing to do with | Google and everything to do with the value of the information itself. | lesuorac wrote: | > More to the point, the more valuable the related search is per | amount spent on advertising, the more the noise, and as a result, | the more resources a user will need to spend to enhance the signal. | | This sounds like an argument of: the worse Google Search | becomes, the more time a user has to spend there and therefore | sees more ads. | | But it has a very large hole in that the worse Google Search | becomes, the easier it is for a user to switch to Bing/DDG/Apple | Search. It may seem unfathomable that people wouldn't use | Google Search, but people felt the same way about Google Maps, | which lost a huge moat to Apple Maps/Waze (albeit the latter | was purchased by Google).
| dktp wrote: | I think the argument is that whatever search option you | choose to use will inherently have the same problems. | | Noise isn't just the paid ads, it's the websites gaming SEO | for valuable searches. And that is not isolated to Google. | | > more valuable the related search is per amount spent on | advertising, the more the noise and as result, more resources | user will need to spend to enhance the signal | | Sums it up quite well, really. | neura wrote: | IMHO, you're talking about 2 very different things. I would | guess that the maps difference was due to devices (not sure | if macOS users even use Apple Maps with any frequency | compared to opening up maps.google.com in a browser, but I'm | guessing most iOS users are too lazy and complacent to use | Google Maps). | | OTOH, it's pretty easy to tell the difference in the first | few minutes (or first few searches) between various search | sites. Heck, the difference in the search engine having a | better idea of what you're looking for based on previous | searches (or ad-related data) alone can make it more | compelling to continue using the same search engine. | | My personal example is that I try to switch to DDG every so | often (maybe several months in between), but I get | dissatisfied with the results and start wondering if I'm just | getting bad at search, or Google knows me better, or Google is | just better at finding the things that people want in | general. | | Just the fact (for me) that I consider that Google generally | gives me better results makes me wonder if all the talk of | Google search getting worse is just complaints based on | heightened expectations, feelings, or the landscape of content | on the internet in general, instead of "is Google search | getting worse?" | harimau777 wrote: | I'm not sure that most of those are any better.
I think that | DDG is the only one of those that still respects modifiers | like +, -, and "". However, I've had mixed success with it. | It seems like maybe its web crawler isn't as effective or | something? | | That, or maybe Google has poisoned the well so much that even | with a good search engine you can't find good results because | everything is SEOed for Google. | topaz0 wrote: | > rapidly internalized into your brain | | I'm not sure how old you are/how long you have been working at | your job, but I can tell you that over 10 years or so there are | tons of things that I "internalized" for a while, then did | other stuff for a while, and now only have vague memories of. | This is the use case for searching your own content. | visarga wrote: | I'm wondering to this day why web browsers don't run a full-text | search index locally. Only saving URLs and titles is not | enough. Especially if I spend more than a minute on a page, I | would like it indexed. And today being today, I would also | like an LLM on top. | CamperBob2 wrote: | Noting that Google being "free" isn't so "free" when low-quality | or made-up results make my job more difficult and | error-prone, or lead me to make suboptimal decisions in my | personal life. | krupan wrote: | Two words: Butlerian Jihad | gil2rok wrote: | This is precisely what Hebbia AI (https://www.hebbia.ai/) does -- | they let you train a large language model on your own documents, | then search them better. | classified wrote: | This will be all the rage: now you can produce ad spam and SEO | crap fully automated, and don't even have to hire copywriters any | more. | fwlr wrote: | Lucky break for the author - seems like a larger context window | is all you need to turn GPT's language skills towards search and | summarization, so the search they want will probably be shipping | soon. | lopatin wrote: | SEO has always been a war. Spammers automate spam. Google comes | up with ingenious ways to combat it. Spammers adapt.
| | I keep hearing that Google search is nothing but SEO-optimized, | affiliate-link-riddled content nowadays. I don't disagree. I see | it too. | | But what makes you think that the affiliate-link-riddled article | is worse than what you would otherwise find if you're searching | for "best office chair" on Google? | | Google got so good at combating traditional SEO spam that the | only way to "cheat" Google is to actually write valuable content | and insert your affiliate links into it. This is what we see now, | I think. The spam SEO sites actually provide more value than | random forum conversations. | cyanydeez wrote: | It's not just Google's immune system that's making its search | function useless; it's also that Google wants to advertise to | its users regardless of what kind of search is being | performed. | deet wrote: | It seems like everyone is reading this article to be about public | web search. | | But what the article is really about is how products are using | new AI tech to add shiny features instead of solving existing, | core problems like search and information retrieval in new and | innovative ways, especially for personal and private data. | | The reason this is happening is pretty easy to explain: the | generative AI and chat demos are a sufficient leap beyond what | was previously possible that people are excited to be on the | frontier of new applications, not just new implementations of | previously known use cases. | | Not to mention that some of the demos have people excited about | the "singularity" being closer than they might have previously | thought (though this can be debated...)
and that VCs will shovel | money to you if you want to play around with generative AI even | without a proven use case (slight exaggeration, but not much). | | I personally believe that transformers and LLMs do unlock a ton | of new applications, especially when applied in interesting ways | to interesting data, like what is private-to-you or private-to-your-company. | For example, LLMs can be used to not just generate | content, but plan sequences of actions including searches, | summarizations, and even calculations (see LangChain agents | https://langchain.readthedocs.io/en/latest/modules/agents.ht... | for an example of how to do this). And this can have real value | for existing, known problems like search. | | People just have to choose to focus on these less-sexy but core | problems. | | ... | | PS I'm currently working on a project towards this goal, and if | anyone is interested I'd love to talk (see link in profile). I | believe we can solve much of the author's desire by simply | hooking up the right tech to the right data sources, and doing it | in a privacy-preserving way (for example, we're running most of | our ML, including vector DB, summarization, etc., on device) and | then presenting that info at the right time (i.e. in your OS) | Zetice wrote: | I've said it before and I'll say it again: the startups that find | ways to use AI to solve problems without having to advertise | that they're using AI will be the winners in the long run. | pydry wrote: | The more noise there is, the easier it is to plug in advertising | :/ | adamch wrote: | I really like the design of this blog. It's clean and simple but | has enough character from the carefully-chosen decorations. The | rainbow hyperlinks are a nice touch! Great styling. | mishu2 wrote: | I still use search but am concerned about the day when all | results will just be SEO AI-generated garbage (for some searches | that is already the case).
| | So I recently started https://cstdn.org/ and am sharing all the good | links I can find there. I created a Show HN but didn't make the | front page; anyone who wants to try it out is more than | welcome to do so. | MagicMoonlight wrote: | ChatGPT is really good if you don't need specific facts. It can't | tell you what 1+1 is. But ask it to write you an apology letter | and it's a god. | | I think most websites are screwed. For facts I can go to | wikipedia etc., but for answers now I can go to an AI. I don't need | to search reddit or anything because the AI is really good at | giving me human-like answers to problems. | naillo wrote: | Well, yes and no. Image 'search' has kinda been radically improved | via stable diffusion. It can help you find things that search | never would have enabled you to (e.g. because of 'anti biasing' | or because the search just isn't good enough). | okhuman wrote: | Why couldn't we take something like | https://github.com/mckaywrigley/paul-graham-gpt but more general | purpose for a doc site? Would that approach to a chatbot trained | on product documentation work? | aj7 wrote: | "What I would use the shit out of, though, is a chatbot that has | been trained on all the information in the CodePen knowledge | base." | | Of course. Whether you're operating a 777, a refrigerator, or a | dildo, narrow but exhaustively trained chatbots are the killer | app. This is worth something in the $T range. | bloppe wrote: | The secret to good AI is massive amounts of quality data. To get | massive amounts of quality data, OpenAI and others employ massive | armies of data curators. It's possible that massive armies of | data curators could play a role in de-noising the internet of the | outputs of the very models they helped create, either directly or | indirectly by helping to train new models to detect AI noise.
| Hard to predict how the adversarial AI noise race will play out, | but interesting to think that these epic noise machines were | themselves created by armies of humans working to remove noise | from the internet. | michaelbrave wrote: | I don't think search is bad because the tech can't do it; I think | it's bad because they are maximizing profit from advertising and | because SEO got gamed by marketers too hard. | pphysch wrote: | With zero curation, SEO rules. | | Ads are frankly a partial solution to the problem of SEO | corruption. Rather than giving the top spots to blackhat SEO | spammers, you give them to the highest bidder and therefore | damage the SEO industry. Lesser of two evils, but both suck. | | Ads can also be seen as a degenerate form of curation, where | the curation function is just money + some loose content rules. | Is that better or worse than the curation being a function of | some particular set of values, i.e. do you want Democrats or | GOP partisans curating the top Google results? | JohnFen wrote: | In other words, the web is doomed? I wish I could say that | sounds implausible. | pphysch wrote: | The libertarian/anarchist ideal of the web was doomed from | day 1. | | The only way forward IMHO is increased public (i.e. | government) control over it. The curated, regulated corners | of the internet can still thrive, with a measured degree of | openness. Too much abuse elsewhere. | JohnFen wrote: | I'm talking about the web as a useful source of | information and entertainment, not in terms of some | libertarian/anarchist ideal. | | If curation is the only way that the web can retain some | semblance of usefulness, that's a serious problem. It | would drastically limit the usefulness of the web. | | Perhaps that is where this is all going. If so, I'd say | that's the web being doomed. I'm just hoping for a good | result instead.
| pphysch wrote: | The internet has always been full of misinformation and | entertainment, and that is unlikely to change. | | I think the last decade+ was in many ways a regression | for the web and I am optimistic for the future | JohnFen wrote: | > The internet has always been full of misinformation and | entertainment | | Of course. That's not what I'm talking about. I'm talking | about the ability to find stuff. | | > I am optimistic for the future | | I'm honestly glad! I sorely wish I were. | inductive_magic wrote: | I find it increasingly tiring that "ai" is used as a synonym for | LLM-based tooling, as there is _zero_ intelligence in those | architectures. | | Gradient descent is not intelligence. | | Nor is stochastic token prediction. | | Anyone active in the field ought to be humbled by the depth of | literature exploring the path to synthetic intelligence. We have | very interesting work happening in biology-inspired approaches, | category theory, Bayesian networks, symbolic systems leveraging | neural nets as components... it's maybe the most interesting | journey of science so far, all being discarded in favor of | sequence2sequence models. | | LLMs are impressive and can be leveraged to create lots and lots | of value, but they do a disservice to the term AI, as they do not | represent the progress that can be observed across the field - | all they showcase are transformers. Transformers are a truly | interesting tool to build stuff with, but they cannot amount to | more than a _component_ of an intelligent agent. The actual | intelligence emerges elsewhere. My guess is, it emerges at _true_ | attention. It's a shame that even big players who could clearly | afford not to, decide to compromise terminology for marketing | efforts directed at an utterly clueless public. We just throw | away attention and forge bias, thus creating noise in a world in | heavy need of signal. 
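For readers without the calculus background, the gradient descent inductive_magic is dismissing is nothing more exotic than this loop: repeatedly step against the derivative of a cost function until it stops decreasing. The quadratic cost below is my own example:

```python
# Minimal gradient descent: walk downhill on a cost function by
# repeatedly stepping opposite its derivative.
def gradient_descent(grad, x0, lr=0.1, steps=100):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # move against the slope
    return x

# Cost f(x) = (x - 3)**2 has derivative 2 * (x - 3); its minimum is at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

Whether running this loop at scale, over billions of parameters, amounts to anything like intelligence is exactly the disagreement in the replies that follow.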
| bpodgursky wrote: | There's a lot wrong here, but I just want to point out two | things: | | > We have very interesting work happening in biology-inspired | approaches | | You realize that these LLMs are all some variety of neural | network, right? | | > Gradient descent is not intelligence. | | It's pretty plausible that _your_ intelligence is derived from | gradient-descent prediction, just in analog instead of digital | form. | inductive_magic wrote: | > You realize that these LLMs are all some variety of neural | network, right? | | Come on. Calling them neural nets doesn't make them that. | | Actual neural nets are living compositions of individual | predictors, in a constant state of restructuring and | communication across multiple channels, infinitely more | complex than static matrix multiplication on arbitrary vectors | which happen to represent words and their positions in | sequences, if you just shake the jar long enough. | | > _It's pretty plausible that your intelligence is derived | from gradient-descent prediction_ | | I highly doubt that gradient descent in the calculus sense is | the determining factor that allows biological organisms to | formalize and reason about their environment. Minimizing some | cost function - yes, possible. But the systems at play in | even the simplest organisms don't spend expensive glucose to | convert sensory signals to vectors. Afaik, they work with | representations of energy-states. Maybe there is an | operational equivalence somewhere there though. | | Gradient descent is an algo that uses derivatives to optimize | some cost function. An intelligent system may use the | resulting inferences for its own fitness function, and it may | do this using gradient descent itself, but at no point does | the mechanical process of iterating over cost-values escape | its algorithmic nature.
A system performing symbolic | reasoning may delegate cognitive tasks to context-specialized | evaluators ("am I in danger?", "how many sheep are on that | field?", "is this person a friend?", "what is a pumpkin?"), | all of which are conditioned to minimize cognitive effort | while avoiding false positives, but the sequence of results | returned by those evaluators (think neural clusters) is | observed by a centralized agent, who has to make _new_ | inferences in a _living_ environment. Gradient descent fails | at that. | fnovd wrote: | Really, I don't think these assertions have any ground to | stand on. Humans are not magical or divine. Our | intelligence, like that of all life, is as basic as it can | be to guarantee our niche. It just happens to be the most | "developed" (by our estimation) on our one singular planet. | Big deal. | crabmusket wrote: | > Humans are not magical or divine | | And yet, they can do things that no other being we know | of can do. Humans don't have to be magical or divine to | be unique. | maxdoop wrote: | How are you so confident in your claims? | | "The actual intelligence emerges elsewhere" -- can you even | define intelligence? And does what an LLM does differ from | what humans might do? | | I'm not claiming the human brain and an LLM are identical. | Rather, I'm pushing back on the confident claims that "LLMs | aren't intelligent or doing anything that's real intelligence". | inductive_magic wrote: | > How are you so confident in your claims? | | My understanding is that intelligence is the process of | continuous adaptation wrt a stream of information, with the | goal of maxing out fitness while minimizing energy | expenditure. To satisfy this, an intelligent agent needs to | create models.
| | I can't rule out that the modeling skill may latently emerge | during training despite not being the focus of the cost | function, but current network designs can't form new | connections or change their architectures in production, so | post-training, there'd be nothing but feed forward. Pure feed | forward isn't intelligent in my book. It may become the | smartest parrot we know, even outperforming humans in most | disciplines, but sans the ability to adapt, it's dead, and | thus it's dumb the moment its environment changes. | fnovd wrote: | > the process of continuous adaptation wrt a stream of | information, with the goal of maxing out fitness while | minimizing energy expenditure | | This makes sense in a biological context but not a digital | one. Biological replication is expensive and time-consuming, | while digital replication is as easy as can be. Adaptation | to this domain means maximizing the perception of utility | from those developing the AI, which comes from fitness | (i.e. perception of fitness) alone. A focus on cost-efficiency | re energy expenditure is dead weight from the perspective of | the AI; the details of that adaptation are rightfully | outsourced to the developers, in the same way that we | outsource photosynthesis to plants. A model can also be | perfectly embedded in a system despite our lack of | understanding of exactly how the embedding works, and the | disconnect between our perception and reality in this context | is only going to get more extreme as the field develops. | | Humans have a bad habit of emphasizing the specific kinds | of intelligence we possess as "intelligence" writ large. As | though our intelligence serves any higher purpose than the | basic replication and propagation that all life is adapted | to pursue. We still train dogs to identify smells because | their nasal intelligence is better than anything we can | create.
This gives them a special place in our human- | centric ecosystem, and only their fitness to the desired | function is necessary for them to thrive in their niche. | Who is trying to breed a dog that eats slightly less food | when our needs are for more reliable detection? The cost of | dog food isn't a serious concern. The same goes for these | AI tools: they are adapted to the niche that our lack of | comparable faculties creates. | | Again, as with humans and photosynthesis, AI doesn't need | to emulate every process we perform, because we are below | them on the food chain. What a waste of resources for them | to worry about learning things we don't need them to do. | crabmusket wrote: | > And does what an LLM does differ from what humans might do? | | I'll bite. I've been quite convinced by the Popperian model | of "conjecture and refutation" as a good model for | explaining not just scientific inquiry, but human thought | processes in general. David Deutsch's "The Beginning of | Infinity" is a very lengthy exposition of this idea. | | When I am writing, I have an idea in my mind that I would | like to communicate. I type it out, and if the words on the | page don't convey what I intended, I edit them. I delete a | sentence or change a word, until I believe I have a sequence | of words which will convey my intended meaning to my expected | audience. | | The words, as they come, are a kind of "conjecture" about | what will best convey my intention. I can "refute" or | "criticise" (as Deutsch puts it) the conjecture using my own | reasoning, even before testing the words on another person. | | As far as I understand LLMs (which is admittedly not far), | there is no such process going on. There is no intention | which it is attempting to communicate via words. There is no | creative conjecture about how to express the intention, and | no criticism of the result.
| gibsonf1 wrote: | Category theory seems to have no relation whatsoever to | operational human concepts, which may explain how little | practical use category theory actually has. | jjtheblunt wrote: | I sometimes think the main use of category theory is to look | like a wizard, and I say that as someone first exposed to it | in math grad school in the early '90s. | | Then I realize it's also an assertion that there are | recurring patterns in functions (in general), and that, as | such, results noticed in one domain can sometimes be expected | to have analogous results in another domain. | staunton wrote: | As you say, the main use of category theory is to organize | very different areas of math in one overarching framework | and generalize ideas from one area of math to others. It | might be used in the pursuit of developing AI, but it is | definitely not "an approach" to developing AI, just like | "taking ideas from books" isn't one. | jjtheblunt wrote: | well said ___________________________________________________________________ (page generated 2023-03-08 23:00 UTC)