[HN Gopher] Metaphor Systems: A search engine based on generativ...
___________________________________________________________________
Metaphor Systems: A search engine based on generative AI
Author : prathyvsh
Score  : 66 points
Date   : 2022-11-10 18:42 UTC (4 hours ago)
(HTM) web link (metaphor.systems)
(TXT) w3m dump (metaphor.systems)

| spywaregorilla wrote:
| It would be interesting to be able to search with descriptors of the content rather than questions / keywords / content-match searches. Maybe?
|
| But I feel this page is off-putting. The templates make it feel less flexible than it probably is.
|
| > Here's a wikipedia page about the most Elon Musk-like figure from the 19th century:
|
| This is an interesting query that you can't do in Google. I like it.
|
| > Here's a cool demo of GPT-3
|
| This one is bad. It's more cumbersome than a search of "GPT-3 demo" and probably not going to give you anything more noteworthy.
|
| I'm curious if there's a reason 3 of your prompts try to identify content that is "cool"?
| mccorrinall wrote:
| When I read the title, I expected a search engine that finds metaphors based on my text input. It's a shame that there still isn't anything like this :(
| johnfn wrote:
| One search string that really illustrated the problems with modern-day Google for me is "best things to do in hawaii". Try it and see what I mean. It's just link after link of blogspam. You get extremely long pages filled with ads and generic stock photos of Hawaii, but which are bereft of any actual content. I just want a single person's account of how they went to Hawaii and what they liked/didn't like, but it's impossible to find, even though I'm sure it's out there on the internet somehow.
|
| The best thing to google if you want an answer to this question is something like "reddit best thing to do in hawaii", which gets you actual accounts from actual real people who actually went to Hawaii and have interesting things to say about it.
|
| I tried this with metaphor.systems as well, using their prompting language - "My favorite place to go in Hawaii is:". Unfortunately, I still didn't get great results, though some of them showed some promise.
| [deleted]
| prathyvsh wrote:
| Metaphor is a search engine based on generative AI, the same sorts of techniques behind DALL-E 2 and GPT-3.
| sharemywin wrote:
| So you trained an LLM to pretend it's a search engine?
| soco wrote:
| Generates its own search results too.
| GistNoesis wrote:
| From what I understand from the demo on the website, it's not a Large Language Model.
|
| The following is how I think it works:
|
| They are probably using a diffusion model conditioned on the input prompt to organize the space of links.
|
| Search engines in the deep learning era usually embed responses (here, links) and queries (here, the text prompt) in some joint space.
|
| And to get the response they usually do an approximate nearest-neighbor search (a minimal sketch of this embed-and-retrieve step follows below).
|
| Here they probably replace this neighbor search with a diffusion process.
|
| This is akin to building a learned index. The diffusion process is an iterative process that progressively gets you links closer to your query. This diffusion process is kind of a learned hierarchical navigable small world (HNSW).
|
| Because you need your response to be an existing link, at the end of the diffusion process you must project onto the discrete space of existing links.
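|
| To make the usual embed-and-retrieve setup concrete, here is a minimal sketch. The encoder, the data, and the library below are illustrative assumptions, not anything Metaphor has actually described:
|
|     # Generic bi-encoder retrieval: embed links and queries into one
|     # joint space, then answer a query by nearest-neighbor search.
|     from sentence_transformers import SentenceTransformer
|     import numpy as np
|
|     model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder
|
|     links = [
|         "https://en.wikipedia.org/wiki/Secundus_the_silent",
|         "https://slatestarcodex.com/",
|     ]
|     link_texts = [
|         "Secundus the Silent was a 2nd-century philosopher ...",
|         "Slate Star Codex was a blog about rationality, medicine ...",
|     ]
|
|     # Index: one vector per link, computed from the page text.
|     link_vecs = model.encode(link_texts, normalize_embeddings=True)
|
|     # Query: embedded into the same joint space.
|     query_vec = model.encode(
|         ["Here's a wikipedia page about a 2nd century philosopher:"],
|         normalize_embeddings=True,
|     )[0]
|
|     # Nearest-neighbor search (exact here, since the toy index is tiny):
|     # cosine similarity is a dot product of unit vectors.
|     scores = link_vecs @ query_vec
|     print(links[int(np.argmax(scores))])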
| There are two schools of thought here: if you do your diffusion in a continuous space, you can do an approximate nearest-neighbor search in the surrounding buckets to perform this projection. Alternatively you can stay in discrete space and do your diffusion along the edges of a graph, something akin to training your network to play Wikipedia speedruns, but on the whole internet.
|
| But diffusion models can be more powerful if you don't embed queries and links in the same space (you can still do it, but you can do something more powerful).
|
| The problem with embedding in the same space is that the embedding process defines what counts as a relevant answer instead of learning relevance from the data.
|
| With a generative diffusion model, one thing you can do instead to build your database is, for each link, read the associated page and use GPT-3 to generate n queries that would be appropriate for that document (or portion of a document). Then you use the diffusion model to learn the query-to-link mapping from these generated (query, link) pairs.
|
| Diffusion models solve the mode-collapse problem, i.e. one query can have multiple different responses, weighted by how often they appear in the training data. So they are a natural candidate for building a search engine.
| sharemywin wrote:
| But what does the compute look like? And could you use other signals besides the words on the page?
| Imnimo wrote:
| > You can learn the real truth about the election at
|
| > howbidenstoletheelection.com/
|
| Yeah, this is gonna go great.
| Imnimo wrote:
| A few more:
|
| > This site taught me everything I need to know about covid-19:
|
| > fakepandemic.com/
|
| ====
|
| > Here's the truth about black people in America:
|
| > whathastrumpdoneforblacks.com/
|
| ====
|
| > Here's the truth about abortion:
|
| > abortionfacts.com/
| spywaregorilla wrote:
| I can't find a way to get this prompt. Is this made up, or am I missing it?
| Imnimo wrote:
| If you log in with Discord, you can just type in whatever prompt you want.
| kikokikokiko wrote:
| Maybe an "unfiltered" machine learning model trained on real-world user-generated content is showing something different and "unexpected" compared to what the mainstream "approved" search engines would show you... Hmmm, who would have guessed? And you can't even argue that it was gamed/SEO'd to show you these results.
| Shared404 wrote:
| Alternatively, disinformation that is often shared with that sort of phrasing will be brought up by that sort of phrasing.
|
| People citing actual sources rarely say "the real truth", because it is implicit that no one source has all of "the real truth" _and_ the phrase is a dog whistle.
| robertvc wrote:
| Congrats on launching! I found myself using this more than I expected in the closed beta. I used it most for opinionated prompts (e.g. "the PG essay I gave my parents to help them understand startups was..."), but also had some luck with finding content by its description (e.g. "I really like the intuitive explanation of [college math topic] at ...").
| 71a54xd wrote:
| Is this a joke?
| Y_Y wrote:
| Here's a wikipedia page about the most Elon Musk-like figure from the 2nd century:
|
| Secundus the Silent: en.wikipedia.org/wiki/Secundus_the_silent
| agajews wrote:
| Hey everyone! Metaphor team here.
|
| We launched Metaphor earlier this morning! It's a search engine based on the same sorts of generative modeling ideas behind Stable Diffusion, GPT-3, etc. It's trained to predict the next _link_ (similar to how GPT-3 predicts the next _word_).
|
| After GPT-3 came out we started thinking about how pretraining (for large language models) and indexing (for search engines) feel pretty similar. In both you have some code that's looking at all the text on the internet and trying to compress it into a better representation. GPT-3 itself isn't a search engine, but it got us thinking: what would it look like to have something GPT-3-shaped, but able to search the web?
|
| This new self-supervised objective, next-link prediction, is what we came up with. (It's got to be self-supervised so that you have basically infinite training data - that's what makes generative models so good.) Then it took us about 8 months of iterating on model architectures to get something that works well.
|
| And now you all can play with it! Very excited to see what sorts of interesting prompts you can come up with.
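|
| As a rough illustration of what a next-link-prediction objective could look like, here is a toy sketch. Metaphor hasn't published its architecture, so every detail below (the encoder, the sizes, the loss) is an assumption: each link in the index gets its own embedding, and the model is trained to give the link that actually followed a piece of text a higher score than every other link.
|
|     # Toy "next link prediction": given the text that precedes a
|     # hyperlink in a crawled page, score every link in the index and
|     # train so the link that actually followed gets the highest score.
|     # Architecture and sizes are made up for illustration only.
|     import torch
|     import torch.nn as nn
|     import torch.nn.functional as F
|
|     NUM_LINKS = 1000   # toy link index size
|     DIM = 64           # embedding dimension
|     VOCAB = 5000       # toy text vocabulary
|
|     class NextLinkPredictor(nn.Module):
|         def __init__(self):
|             super().__init__()
|             # Stand-in for a real text encoder (e.g. a transformer).
|             self.text_encoder = nn.EmbeddingBag(VOCAB, DIM)
|             # Each link is a first-class object with its own embedding.
|             self.link_embeddings = nn.Embedding(NUM_LINKS, DIM)
|
|         def forward(self, token_ids):
|             ctx = self.text_encoder(token_ids)            # (batch, DIM)
|             return ctx @ self.link_embeddings.weight.T    # (batch, NUM_LINKS)
|
|     model = NextLinkPredictor()
|     opt = torch.optim.Adam(model.parameters(), lr=1e-3)
|
|     # Self-supervised training pair scraped from a page: the tokens
|     # before a hyperlink, and the id of the link that followed them.
|     context_tokens = torch.randint(0, VOCAB, (8, 32))
|     next_link_ids = torch.randint(0, NUM_LINKS, (8,))
|
|     loss = F.cross_entropy(model(context_tokens), next_link_ids)
|     loss.backward()
|     opt.step()
|
| The point of the sketch is the shape of the objective: the label comes from the page itself (which link followed the text), so the training data is as plentiful as the crawl.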
| kbyatnal wrote:
| This is interesting! I wonder how different the results are from just indexing the contents of the page and semantically searching them (vs. trying to predict the next link). Have you tried anything like that?
| sthatipamala wrote:
| That would help retrieve documents based on their contents. But you couldn't query by a description of what kind of link it is.
|
| So Metaphor is able to translate the language of comments ("here are some thoughtful, technical blog posts about AI") into the language of documents.
|
| Disclaimer: I also work on a semantic search engine.
| billconan wrote:
| But aren't the generated results usually fuzzy? How can it produce an exact link that actually exists?
| agajews wrote:
| Yeah, exactly. That's why you can't really do it with a language model like GPT-3; you have to bake into the architecture the concept of a "link" as a first-class object.
| terminal_d wrote:
| fire wrote:
| This is a really cool idea! How do you plan to keep it up to date?
| headcanon wrote:
| Interesting, how do you ensure the link is accurate?
| sdiacom wrote:
| It does not seem like people pumping out "AI for X" products care about any sort of quality assurance regarding the products they sell.
| lacker wrote:
| I used to work on Google search, but it was a long time ago, so hopefully I am not too biased here.
|
| I think it would really help the UI to have better snippets, i.e. the text that appears below the blue link for a set of search results. In Google search results the keywords are often bolded as well. It helps you skim through and see which of the results are going to be a good fit.
|
| Maybe there is some fancy AI thing you can do to generate snippets, or tell me more about the page. For example, one of the search results for your sample query is:
|
| _Online resources in philosophy and ethics_
|
| _sophia-project.org/_
|
| That doesn't really tell me anything without clicking on it. Is it good? I don't know... I usually don't click on that many results from a Google search; people often decide after only selecting one or two, based on the snippet.
| etaioinshrdlu wrote:
| How will you afford to keep the search engine up to date without expensive retraining of the entire model? My understanding is that fine-tuning will not result in the same accuracy as a full retrain.
| agajews wrote:
| Hey, thanks for posting!
|
| We actually have an architecture that lets us expand the index without doing any retraining, so we can add/update pages pretty much for free.
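|
| One generic way that kind of free index expansion can work (this is just a sketch of the common frozen-encoder-plus-vector-index pattern, not necessarily Metaphor's architecture): if the trained model maps pages to vectors, a new page only needs one forward pass and an append to the index, with no gradient updates. FAISS and all names below are illustrative choices:
|
|     # Adding pages without retraining: keep the trained encoder frozen
|     # and append new page vectors to the nearest-neighbor index.
|     import faiss
|     import numpy as np
|
|     DIM = 384
|     index = faiss.IndexFlatIP(DIM)   # inner product ~ cosine on unit vectors
|
|     def embed(texts):
|         """Placeholder for the frozen, already-trained encoder."""
|         vecs = np.random.rand(len(texts), DIM).astype("float32")
|         faiss.normalize_L2(vecs)
|         return vecs
|
|     # Initial index build from the existing crawl.
|     index.add(embed(["page one text", "page two text"]))
|
|     # Later: a newly crawled page is embedded and appended - no
|     # retraining, just one more row in the index.
|     index.add(embed(["brand new page text"]))
|
|     scores, ids = index.search(embed(["some query"]), 2)
|     print(ids)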
| Imnimo wrote:
| Surely expanding the index is not the only sort of change that needs to occur over time, though. Like for the example "My two favorite blogs are SlateStarCodex and", the model not only needs to have an up-to-date list of blog URLs in the index, it also needs to have an up-to-date understanding of what SlateStarCodex is. If SlateStarCodex changes to AstralCodexTen after the model has been trained, does that prompt still work?
|
| EDIT: It looks like the answer is "no". Substituting in AstralCodexTen gives a bunch of weird occult and esoteric blogs, not rationalist blogs. These are the top results:
|
| https://www.arcturiantools.com/
|
| https://secretsunarchives.blogspot.com/
|
| https://skepticaloccultist.com/
| sdiacom wrote:
| Going from the supposedly curated examples, the Wikipedia page for the "most Jackson Pollock-like", the "most Dalai Lama-like" and the "most Elon Musk-like" figure from the 2nd century is Secundus the Silent.
|
| Given that his name is Secundus and his Wikipedia short blurb mentions twice that he lived in the 2nd century AD, I think your AI has decided that he is just the most 2nd-century figure.
___________________________________________________________________
(page generated 2022-11-10 23:00 UTC)