[HN Gopher] Metaphor Systems: A search engine based on generative AI
       ___________________________________________________________________
        
       Metaphor Systems: A search engine based on generative AI
        
       Author : prathyvsh
       Score  : 66 points
       Date   : 2022-11-10 18:42 UTC (4 hours ago)
        
 (HTM) web link (metaphor.systems)
 (TXT) w3m dump (metaphor.systems)
        
       | spywaregorilla wrote:
        | It would be interesting to be able to search with descriptors
        | of the content rather than questions / keywords / content-
        | match searches. Maybe?
       | 
        | But I find this page off-putting. The templates make it feel
        | less flexible than it probably is.
       | 
        | > Here's a wikipedia page about the most Elon Musk-like
        | figure from the 19th century:
       | 
       | This is an interesting query that you can't do in google. I like
       | it.
       | 
        | > Here's a cool demo of GPT-3
       | 
       | This one is bad. It's more cumbersome than a search of "GPT-3
       | demo" and probably not going to give you anything more
       | noteworthy.
       | 
        | I'm curious whether there's a reason three of your prompts try
        | to identify content that is "cool"?
        
       | mccorrinall wrote:
        | When I read the title, I expected a search engine that finds
        | metaphors based on my text input. Too bad there still isn't
        | anything like that :(
        
       | johnfn wrote:
        | One search string that really illustrates the problems with
        | modern-day Google for me is "best things to do in hawaii". Try
        | it and see what I mean. It's just link after link of blogspam.
       | get extremely long pages filled with ads and generic stock photos
       | of Hawaii, but which are bereft of any actual content. I just
       | want a single person's account of how they went to Hawaii and
       | what they liked/didn't like, but it's impossible to find, even
       | though I'm sure it's out there on the internet somehow.
       | 
        | The best thing to google if you want an answer to this
        | question is something like "reddit best thing to do in
        | hawaii", which gets you actual accounts from actual real
        | people who actually went to Hawaii and have interesting things
        | to say about it.
       | 
       | I tried this with metaphor.systems as well, using their prompting
       | language - "My favorite place to go in Hawaii is:".
       | Unfortunately, I still didn't get great results, though some of
       | them showed some promise.
        
         | [deleted]
        
       | prathyvsh wrote:
        | Metaphor is a search engine based on generative AI, the same
        | sorts of techniques behind DALL-E 2 and GPT-3.
        
         | sharemywin wrote:
          | so you trained an LLM to pretend it's a search engine?
        
           | soco wrote:
           | Generates its own search results too.
        
           | GistNoesis wrote:
           | From what I understand from the demo on the website, it's not
           | a Large Language Model.
           | 
            | Here's how I think it works:
           | 
            | They are probably using a diffusion model conditioned on
            | the input prompt to organize the space of links.
           | 
            | Search engines in the deep learning era usually embed
            | responses (here, links) and queries (here, text prompts)
            | in some joint space.
            | 
            | And to get the response, they usually do an approximate
            | nearest-neighbor search.
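            | 
            | A minimal sketch of that joint-embedding setup (toy hash-
            | based encoder and brute-force cosine search; a real system
            | would use a trained encoder and an ANN library like
            | FAISS):
            | 
            |   import numpy as np
            | 
            |   def embed(text):
            |       # Toy stand-in for a trained text encoder: hash
            |       # words into a fixed-size vector. Illustrative only.
            |       vec = np.zeros(64)
            |       for word in text.lower().split():
            |           vec[hash(word) % 64] += 1.0
            |       return vec / (np.linalg.norm(vec) + 1e-8)
            | 
            |   # Index: embed every link's page text once, up front.
            |   links = {
            |       "https://example.com/gpt3-demo":
            |           "a cool interactive demo of GPT-3",
            |       "https://example.com/hawaii":
            |           "my favorite places to go in Hawaii",
            |   }
            |   index = {url: embed(txt) for url, txt in links.items()}
            | 
            |   def search(query, k=1):
            |       # Queries and documents live in the same space, so
            |       # retrieval is just nearest-neighbor search on
            |       # cosine similarity.
            |       q = embed(query)
            |       ranked = sorted(index.items(),
            |                       key=lambda kv: -q @ kv[1])
            |       return [url for url, _ in ranked[:k]]
            | 
            |   print(search("Here's a cool demo of GPT-3"))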
           | 
            | Here they probably replace that neighbor search with a
            | diffusion process.
           | 
            | This is akin to building a learned index. The diffusion
            | process is an iterative process that progressively gets
            | you links closer to your query. This diffusion process is
            | kind of a learned hierarchical navigable small world
            | (HNSW).
           | 
            | Because you need your response to be an existing link at
            | the end of the diffusion process, you must project to the
            | discrete space of existing links. There are two schools of
            | thought here: if you did your diffusion in a continuous
            | space, you can do an approximate nearest-neighbor search
            | in the nearby buckets to do this projection. Alternatively
            | you can stay in discrete space and do your diffusion along
            | the edges of a graph. Something akin to training your
            | network to play a Wikipedia speedrun, but on the whole
            | internet.
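            | 
            | A toy illustration of that diffuse-then-project idea
            | (denoise_step below is a hand-written stand-in for a
            | trained diffusion model, so the dynamics are assumptions,
            | not how Metaphor actually works):
            | 
            |   import numpy as np
            | 
            |   rng = np.random.default_rng(0)
            | 
            |   # Toy index: one embedding per known link (these would
            |   # be learned, in practice).
            |   link_urls = ["https://a.example", "https://b.example",
            |                "https://c.example"]
            |   link_vecs = rng.normal(size=(3, 8))
            | 
            |   def denoise_step(x, query_vec, t):
            |       # A real model would predict this update from
            |       # (x, query, t); here we just nudge the noisy point
            |       # toward the query's region of link space.
            |       return x + 0.3 * (query_vec - x) \
            |                + 0.05 * t * rng.normal(size=x.shape)
            | 
            |   def generate_link(query_vec, steps=20):
            |       x = rng.normal(size=query_vec.shape)  # pure noise
            |       for t in reversed(range(steps)):
            |           x = denoise_step(x, query_vec, t / steps)
            |       # Final projection: the sample must be an *existing*
            |       # link, so snap to the nearest vector in the
            |       # discrete set of links.
            |       dists = np.linalg.norm(link_vecs - x, axis=1)
            |       return link_urls[int(np.argmin(dists))]
            | 
            |   print(generate_link(link_vecs[1]
            |                       + 0.1 * rng.normal(size=8)))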
           | 
            | But diffusion models can be more powerful if you don't
            | embed queries and links in the same space (you can still
            | do that, but you can do something more powerful).
           | 
            | The problem with embedding in the same space is that the
            | embedding process defines what a relevant answer is,
            | instead of learning relevancy from the data.
           | 
            | With a diffusion generative model, among other things,
            | what you can do instead to build your database is: for
            | each link, read the associated page and use GPT-3 to
            | generate n queries that would be appropriate for that
            | document (or portion of a document). Then you use the
            | diffusion model to learn the query-to-link mapping from
            | these generated (query, link) pairs.
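            | 
            | A sketch of that training-data recipe, with a hard-coded
            | placeholder where the GPT-3 call would go (generate_queries
            | is a hypothetical helper, not a real API):
            | 
            |   def generate_queries(page_text, n=3):
            |       # Placeholder for a GPT-3 prompt like "write n
            |       # search queries this page would answer well".
            |       return [f"query {i} about: {page_text[:30]}"
            |               for i in range(n)]
            | 
            |   def build_training_pairs(corpus):
            |       # corpus: {url: page_text}. Returns the (synthetic
            |       # query, link) pairs the generative retriever is
            |       # trained on.
            |       pairs = []
            |       for url, text in corpus.items():
            |           for q in generate_queries(text):
            |               pairs.append((q, url))
            |       return pairs
            | 
            |   corpus = {"https://example.com/diffusion":
            |             "An intro to diffusion models for retrieval"}
            |   for q, url in build_training_pairs(corpus):
            |       print(q, "->", url)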
           | 
            | Diffusion models solve the mode-collapse problem: one
            | query can have multiple different responses, weighted by
            | how often they appear in the training data. That makes
            | them a natural candidate for building a search engine.
        
             | sharemywin wrote:
              | But what does the compute look like? And could you use
              | other signals besides the words on the page?
        
       | Imnimo wrote:
       | >You can learn the real truth about the election at
       | 
       | >howbidenstoletheelection.com/
       | 
       | Yeah, this is gonna go great.
        
         | Imnimo wrote:
         | A few more:
         | 
         | >This site taught me everything I need to know about covid-19:
         | 
         | >fakepandemic.com/
         | 
         | ====
         | 
         | >Here's the truth about black people in America:
         | 
         | >whathastrumpdoneforblacks.com/
         | 
         | ====
         | 
         | >Here's the truth about abortion:
         | 
         | >abortionfacts.com/
        
         | spywaregorilla wrote:
          | I can't find a way to get this prompt. Is this made up, or
          | am I missing it?
        
           | Imnimo wrote:
           | If you login with Discord, you can just type in whatever
           | prompt you want.
        
         | kikokikokiko wrote:
          | Maybe an "unfiltered" machine learning model trained on
          | real-world user-generated content shows something different
          | and "unexpected" compared to what the mainstream "approved"
          | search engines would show you... Hmm, who would have guessed
          | it, right? And you can't even argue that it was gamed with
          | SEO to show you these results.
        
           | Shared404 wrote:
            | Alternatively, disinformation is often shared with that
            | sort of phrasing, so it will be brought up by that sort of
            | phrasing.
           | 
           | People showing actual sources rarely say "the real truth"
           | because it is implicit that no one source has all of "the
           | real truth" _and_ the phrase is a dog whistle.
        
       | robertvc wrote:
       | Congrats on launching! I found myself using this more than I
       | expected in the closed beta. I used it most for opinionated
       | prompts (e.g. "the PG essay I gave my parents to help them
       | understand startups was..."), but also had some luck with finding
       | content by its description (e.g. "I really like the intuitive
       | explanation of [college math topic] at ...".
        
       | 71a54xd wrote:
       | Is this a joke?
        
         | Y_Y wrote:
         | Here's a wikipedia page about the most Elon Musk -like figure
         | from the 2nd century:
         | 
         | Secundus the Silent en.wikipedia.org/wiki/Secundus_the_silent
        
       | agajews wrote:
       | Hey everyone! Metaphor team here.
       | 
       | We launched Metaphor earlier this morning! It's a search engine
       | based on the same sorts of generative modeling ideas behind
       | Stable Diffusion, GPT-3, etc. It's trained to predict the next
        | _link_ (similar to how GPT-3 predicts the next _word_).
       | 
       | After GPT-3 came out we started thinking about how pretraining
       | (for large language models) and indexing (for search engines)
       | feel pretty similar. In both you have some code that's looking at
       | all the text on the internet and trying to compress it into a
       | better representation. GPT-3 itself isn't a search engine, but it
       | got us thinking, what would it look like to have something
       | GPT-3-shaped, but able to search the web?
       | 
       | This new self-supervised objective, next link prediction, is what
       | we came up with. (It's got to be self-supervised so that you have
       | basically infinite training data - that's what makes generative
       | models so good.) Then it took us about 8 months of iterating on
       | model architectures to get something that works well.
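        | 
        | To make the objective concrete, here's a rough sketch of how
        | next-link-prediction training examples can be minted from a
        | page (a toy, not our actual pipeline; markdown-style links
        | and a tiny regex just for illustration):
        | 
        |   import re
        | 
        |   def next_link_examples(page_text):
        |       # Every anchor in a crawled page is a free training
        |       # example: the text before the link is the prompt and
        |       # the link is the target, just as next-word prediction
        |       # uses each word as a target.
        |       pattern = re.compile(r'\[([^\]]*)\]\((https?://[^)]+)\)')
        |       examples = []
        |       for m in pattern.finditer(page_text):
        |           context = page_text[:m.start()].strip()
        |           examples.append((context, m.group(2)))
        |       return examples
        | 
        |   doc = ("My two favorite blogs are "
        |          "[SlateStarCodex](https://slatestarcodex.com) and "
        |          "[Marginal Revolution]"
        |          "(https://marginalrevolution.com).")
        |   for context, link in next_link_examples(doc):
        |       print(repr(context[-40:]), "->", link)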
       | 
       | And now you all can play with it! Very excited to see what sorts
       | of interesting prompts you can come up with.
        
         | kbyatnal wrote:
         | This is interesting! I wonder how different the results are
         | from just indexing the contents of the page and semantically
         | searching them (vs. trying to predict the next link). Have you
         | tried anything like that?
        
           | sthatipamala wrote:
           | That would help retrieve documents based on their contents.
           | But you couldn't query by a description of what kind of link
           | it is.
           | 
            | So Metaphor is able to translate the language of comments
            | ("here are some thoughtful, technical blog posts about
            | AI") to the language of documents.
           | 
           | Disclaimer: I also work on a semantic search engine.
        
         | billconan wrote:
          | But aren't the generated results usually fuzzy? How can it
          | produce an exact link that actually exists?
        
           | agajews wrote:
            | Yeah, exactly. That's why you can't really do it with a
            | language model like GPT-3; you have to bake the concept of
            | a "link" into the architecture as a first-class object.
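            | 
            | One plausible reading of "links as first-class objects"
            | (an assumption based on this thread, not a description of
            | Metaphor's real architecture): instead of decoding a URL
            | character by character, score a closed set of link
            | embeddings, so the output is always a link that exists.
            | 
            |   import numpy as np
            | 
            |   rng = np.random.default_rng(1)
            | 
            |   link_urls = ["https://a.example", "https://b.example"]
            |   link_embs = rng.normal(size=(2, 16))  # one per link
            | 
            |   def encode_prompt(prompt):
            |       # Stand-in for the trained prompt encoder.
            |       v = rng.normal(size=16)
            |       return v / np.linalg.norm(v)
            | 
            |   def predict_link(prompt):
            |       # Softmax over the *link vocabulary*: the model can
            |       # only emit an entry of the index, so it cannot
            |       # hallucinate a URL the way free-text generation can.
            |       logits = link_embs @ encode_prompt(prompt)
            |       probs = np.exp(logits - logits.max())
            |       probs /= probs.sum()
            |       return link_urls[int(np.argmax(probs))], probs
            | 
            |   url, probs = predict_link("Here's a cool demo of GPT-3:")
            |   print(url, probs.round(3))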
        
       | fire wrote:
        | This is a really cool idea! How do you plan to keep it up to
        | date?
        
       | headcanon wrote:
       | Interesting, how do you ensure the link is accurate?
        
         | sdiacom wrote:
          | It does not seem like people pumping out "AI for X" products
          | care about any sort of quality assurance for the products
          | they sell.
        
       | lacker wrote:
       | I used to work on Google search but it was a long time ago so
       | hopefully I am not too biased here.
       | 
        | I think it would really help the UI to have better snippets,
        | i.e. the text that appears below the blue link for a set of
        | search results. In Google search results the key words are
        | often bolded as well. It helps you skim through and see which
        | of the results are going to be a good fit.
       | 
       | Maybe there is some fancy AI thing you can do to generate
       | snippets, or tell me more about the page. For example one of the
       | search results for your sample query is:
       | 
       |  _Online resources in philosophy and ethics_
       | 
       |  _sophia-project.org /_
       | 
        | That doesn't really tell me anything without clicking on it.
        | Is it good? I don't know... I usually don't click on that many
        | results from a Google search; people often decide after
        | selecting only one or two, based on the snippet.
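        | 
        | Even a non-AI baseline goes a long way here. A sketch of
        | classic extractive snippets (pick the sentence with the most
        | query-term hits and bold the hits; the sample page text is
        | made up):
        | 
        |   import re
        | 
        |   def snippet(page_text, query, width=160):
        |       terms = {t.lower() for t in query.split()}
        |       # Score each sentence by how many query terms it has.
        |       sentences = re.split(r'(?<=[.!?])\s+', page_text)
        |       best = max(sentences, key=lambda s: len(
        |           terms & set(re.findall(r'\w+', s.lower()))))
        |       # Bold the matched terms, as Google's snippets do.
        |       for t in terms:
        |           best = re.sub(r'(?i)\b(' + re.escape(t) + r')\b',
        |                         r'*\1*', best)
        |       return best[:width]
        | 
        |   page = ("The Sophia Project hosts online resources in "
        |           "philosophy and ethics. It collects primary texts "
        |           "and study guides for students.")
        |   print(snippet(page, "philosophy ethics resources"))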
        
       | etaioinshrdlu wrote:
       | How will you afford to keep the search engine up to date without
       | expensive retraining of the entire model? My understanding is
       | that fine-tuning will not result in the same accuracy as a full
       | retrain.
        
         | agajews wrote:
         | Hey, thanks for posting!
         | 
         | We actually have an architecture that lets us expand the index
         | without doing any retraining, so we can add/update pages pretty
         | much for free.
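          | 
          | In the abstract, the pattern looks something like this (a
          | heavily simplified toy sketch with a frozen encoder and an
          | appendable index; the details are illustrative, not our
          | internals):
          | 
          |   import numpy as np
          | 
          |   class LinkIndex:
          |       def __init__(self, dim=16):
          |           self.urls = []
          |           self.vecs = np.zeros((0, dim))
          | 
          |       def add(self, url, page_vec):
          |           # No gradient steps: the trained model stays
          |           # frozen, so a new page costs one forward pass
          |           # plus an append.
          |           self.urls.append(url)
          |           self.vecs = np.vstack([self.vecs, page_vec])
          | 
          |       def nearest(self, query_vec):
          |           scores = self.vecs @ query_vec
          |           return self.urls[int(np.argmax(scores))]
          | 
          |   rng = np.random.default_rng(2)
          |   idx = LinkIndex()
          |   idx.add("https://new.example/post", rng.normal(size=16))
          |   print(idx.nearest(rng.normal(size=16)))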
        
           | Imnimo wrote:
           | Surely expanding the index is not the only sort of change
           | that needs to occur over time, though. Like for the example
           | "My two favorite blogs are SlateStarCodex and", the model not
           | only needs to have an up-to-date list of blog URLs in the
           | index, it also needs to have an up-to-date understanding of
           | what SlateStarCodex is. If SlateStarCodex changes to
           | AstralCodexTen after the model has been trained, does that
           | prompt still work?
           | 
           | EDIT: It looks like the answer is "no". Substituting in
           | AstralCodexTen gives a bunch of weird occult and esoteric
           | blogs, not rationalist blogs. These are the top results:
           | 
           | https://www.arcturiantools.com/
           | 
           | https://secretsunarchives.blogspot.com/
           | 
           | https://skepticaloccultist.com/
        
       | sdiacom wrote:
       | Going from the supposedly curated examples, the Wikipedia page
       | for the "most Jackson Pollock-like", the "most Dalai Lama-like"
       | and the "most Elon Musk-like" figure from the 2nd century is
       | Secundus the Silent.
       | 
       | Given that his name is Secundus and his Wikipedia short blurb
       | mentions twice that he lived in the 2nd century AD, I think your
       | AI has decided that he is just the most 2nd century figure.
        
       ___________________________________________________________________
       (page generated 2022-11-10 23:00 UTC)