       Show HN: Phind.com - Generative AI search engine for developers
        
       Hi HN,  Today we're launching phind.com, a developer-focused search
       engine that uses generative AI to browse the web and answer
       technical questions, complete with code examples and detailed
       explanations. It's version 1.0 of what was previously known as
       Hello (beta.sayhello.so) and has been completely reworked to be
       more accurate and reliable.  Because it's connected to the
       internet, Phind is always up-to-date and has access to docs,
       issues, and bugs that ChatGPT hasn't seen. Like ChatGPT, you can
       ask followup questions. Phind is smart enough to perform a new
       search and join it with the existing conversation context. We're
       merging the best of ChatGPT with the best of Google.  You're
       probably wondering how it's different from the new Bing. For one,
       we don't dumb down a user's query the way that the new Bing does.
       We feed your question into the model exactly as it was asked, and
       are laser-focused on providing developers the most detailed and
       comprehensive explanations to code-related questions. Secondly,
       we've focused the model on providing answers instead of chatbot
       small talk. This is one of the major improvements we've made since
       exiting beta.  Phind has the creative abilities to generate code,
       write essays, and even compose some poems/raps but isn't interested
       in having a conversation for conversation's sake. It should refuse
       to state its own opinion and rather provide a comprehensive summary
       of what it found online. When it isn't sure, it's designed to say
       so. It's not perfect yet, and misinterprets answers ~5% of the
       time. An example of Phind's adversarial question answering ability
       is https://phind.com/search?q=why+is+replacing+NaCL+with+NaCN+i....
       ChatGPT became useful by learning to generate answers it thinks
       humans will find helpful, via a technique called Reinforcement
       Learning from Human Feedback (RLHF). In RLHF, a model generates
       multiple candidate answers for a given question and a human rates
       which one is better. The comparison data is then fed back into the
       model through an algorithm such as PPO. To improve answer quality,
       we're deploying RLAIF -- an improvement over RLHF where the AI
       itself generates comparison data instead of humans. Generative LLMs
       have already reached the point where they can review the quality of
       their own answers as well as or better than an average human rater
       tasked with annotating data for RLHF.  We still have a long way to
       go, but Phind is state-of-the-art at answering complex technical
       questions and writing intricate guides all while citing its
       sources. We'd love to hear your feedback.  Examples:
       https://phind.com/search?q=How+to+set+up+a+CI%2FCD+pipeline+...
       https://phind.com/search?q=how+to+debug+pthread+race+conditi...
       https://phind.com/search?q=example+of+a+c%2B%2B+semaphore
       https://phind.com/search?q=What+is+the+best+way+to+deploy+a+...
       https://phind.com/search?q=show+me+when+to+use+defaultdicts+...
       Discord: https://discord.gg/qHj8pwYCNg
        
       Author : rushingcreek
       Score  : 127 points
       Date   : 2023-02-21 17:56 UTC (5 hours ago)
        
 (HTM) web link (phind.com)
 (TXT) w3m dump (phind.com)
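
The comparison step the post describes (several candidate answers, pairwise preferences, AI instead of human raters) can be sketched roughly as below. This is only an illustration, not Phind's pipeline: `generate_candidates` and `ai_judge` are hypothetical stand-ins, the judge is a toy heuristic rather than an LLM, and the step that feeds the pairs into a reward model or PPO is left out.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Comparison:
    question: str
    chosen: str
    rejected: str


def generate_candidates(question: str) -> list[str]:
    # Hypothetical stand-in for sampling several answers from a model.
    return [
        "Use a mutex.",
        "Use a mutex around the shared counter [source: pthreads manual].",
        "It depends.",
    ]


def ai_judge(question: str, answer: str) -> float:
    # Toy heuristic standing in for an LLM grader (assumption):
    # reward cited sources first, then longer answers.
    return (10.0 if "[source:" in answer else 0.0) + len(answer) / 100.0


def build_comparisons(question: str) -> list[Comparison]:
    """Label every pair of candidates with the AI judge instead of a human."""
    pairs = []
    for a, b in combinations(generate_candidates(question), 2):
        score_a, score_b = ai_judge(question, a), ai_judge(question, b)
        if score_a == score_b:
            continue  # a tie carries no preference signal
        chosen, rejected = (a, b) if score_a > score_b else (b, a)
        pairs.append(Comparison(question, chosen, rejected))
    return pairs
```

Each `Comparison` plays the role of one human rating in RLHF; only the source of the label changes.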
        
       | joenot443 wrote:
       | Excellent results for the first two queries I tried, one about
       | HStack in SwiftUI, another about clamp in GLSL, and a bit of a
       | mixed bag for what I purposely worded as a more error-prone and
       | beginner minded query: "how do i create a second window in
       | openframeworks?"
       | 
       | Absolutely fantastic stuff, I'm excited to add this to my tool-
       | belt. There's a specific feeling of knowing that an answer to
       | your question is very simple and exists somewhere on SO, but the
       | mental effort of sifting pages of answers seems unappealing. It
       | seems like Phind is well suited to do this job for you!
        
         | rushingcreek wrote:
         | Thanks for the feedback! I'm happy it worked well for you.
         | We're working on improving consistency -- one thing to try is
         | simply refreshing the page to get a new answer.
        
       | raajg wrote:
       | The hints for followup questions are an interesting feature and
       | something that could become a USP for this search engine. The
       | followup questions were at times exactly what I wanted to ask
       | next and sometimes thought-provoking and I was compelled to click
       | and ask them.
       | 
       | The performance could be improved. I'm having to wait several
       | seconds before the summary is created.
        
         | raajg wrote:
         | Example of a search query with good followup questions:
         | 
         | https://phind.com/search?q=why+is+funcref+not+working+in+God...
        
         | rushingcreek wrote:
         | Thanks! We're pretty excited about that feature. As for
         | latency, it's normally better. We're feeling the HN traffic
         | crunch at the moment and are working on scaling.
        
       | whiplash451 wrote:
       | When asked << what is the best time to go skiing? >>, Phind
       | fixates on Colorado for some reason, then proceeds to deliver a
       | huge blurb about skiing in Colorado ending with << in the end,
       | the best time to go skiing in Colorado is a matter of personal
       | preference >>. Well, that was useful.
        
         | RileyJames wrote:
         | On the one hand, you're right. That's not a useful answer.
         | 
         | On the other hand, they did state it's a developer focused
         | search product for technical & factual questions. They're
         | aiming to make a bot that doesn't provide opinions, or long
         | convoluted conversations. On that basis your query isn't a
         | great representation of that.
         | 
         | But what should the answer be anyway? Winter? When there's
         | snow? When you have a break from work and enough money for a
         | lift pass? When you're feeling strong and healthy? Should the
         | bot ask you clarifying questions to determine that?
         | 
         | If you asked a person this question they'd either ask you
         | clarifying questions regarding what you actually want to know,
         | or give you a vague answer based on where and when they like to
         | go skiing.
        
         | raajg wrote:
         | Well, developers can ski of course, but I think the search
         | engine is focussed on software dev.
        
         | rushingcreek wrote:
         | Thanks for the feedback. We'll work on improving.
         | 
         | Running this question again, I got:
         | 
         | > For example, ski resorts at Lake Tahoe usually open after
         | Thanksgiving and close in late April, with February offering
         | the best skiing conditions. In Colorado, the ski season
         | generally runs from mid-November through mid-April, with
         | February being the best time to ski due to the deepest base
         | depth of snow on the mountains and plenty of powder still
         | pouring in. However, early snowfall can bring early openings,
         | and the weather can be unpredictable. In Park City, Utah, the
         | best time to ski is from December to March, with January being
         | the busiest month due to the Sundance Film Festival.
        
       | ninjaa wrote:
       | This is really great. What model are you guys using? Do share
       | your process if you're so inclined.
       | 
       | Great work
        
         | rushingcreek wrote:
         | We're using a combination of our own models and OpenAI models.
         | For our own models, we've found success with Flan-T5 and UL2
         | which we've further trained on our own data.
        
       | [deleted]
        
       | SillyUsername wrote:
       | Hmmm I'm sure the tech is modern but the name... evokes thoughts
       | of throwback to 2000s "ph"at beats, gr"ind"r, and trying to be
       | trendy like appending "ly" to a noun for the company name (
       | https://thenextweb.com/news/whats-startups-name-trend-misspe...
       | ).
       | 
       | Maybe it's just me, or maybe that's what was being aimed for?
        
         | rushingcreek wrote:
         | We liked the name "Phind" because it was playful and cheeky.
         | And it is a bit of a throwback style-wise.
        
         | [deleted]
        
       | hintymad wrote:
       | It looks like Phind queues a query for its machine learning models.
       | I submitted the following query twice. The first time, Phind gave
       | Google-like answers that talked about only Guava. The second
       | time, though, Phind gave me good answers on using popular Go
       | libraries with sample code.
       | 
       | https://phind.com/search?q=How+do+I+use+a+cache+that+is+like...
        
         | rushingcreek wrote:
         | Both answers were run through the LLM. Answer variability is
         | caused by different web links being returned and the way we
         | sample answers from the LLM. We're working on making it more
         | consistent.
        
           | dhc02 wrote:
           | Consistency has to be a tough problem to solve for a service
           | like this, since randomness in the choice of each token is
           | part of the magic sauce that makes LLMs work.
        
             | rushingcreek wrote:
             | It is definitely a hard problem. There are ways to ensure
             | consistency, such as using beam search decoding, which is
             | deterministic. But that comes with other tradeoffs
             | regarding answer quality.
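
The contrast between sampling and beam search can be illustrated with a toy example. This is a sketch under stated assumptions: `MODEL` is a made-up next-token table, not a real LLM, and nothing here reflects how Phind actually decodes.

```python
import math
import random

# Toy next-token distribution keyed by the previous token (made-up numbers).
MODEL = {
    "<s>": {"use": 0.6, "try": 0.4},
    "use": {"redis": 0.5, "guava": 0.5},
    "try": {"groupcache": 0.9, "guava": 0.1},
    "redis": {"</s>": 1.0},
    "guava": {"</s>": 1.0},
    "groupcache": {"</s>": 1.0},
}


def beam_search(width: int = 2, max_len: int = 5) -> list[str]:
    """Keep the `width` most probable partial sequences at every step."""
    beams = [(0.0, ["<s>"])]  # (log-probability, tokens)
    for _ in range(max_len):
        candidates = []
        for logp, tokens in beams:
            if tokens[-1] == "</s>":
                candidates.append((logp, tokens))  # finished beam carries over
                continue
            for token, p in MODEL[tokens[-1]].items():
                candidates.append((logp + math.log(p), tokens + [token]))
        # Sort by score; break ties on the tokens so the result is stable.
        candidates.sort(key=lambda c: (-c[0], c[1]))
        beams = candidates[:width]
    return beams[0][1]


def sample(rng: random.Random, max_len: int = 5) -> list[str]:
    """Draw each next token at random according to its probability."""
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) <= max_len:
        dist = MODEL[tokens[-1]]
        tokens.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return tokens
```

Repeated calls to `beam_search()` return the same sequence, while `sample` with different seeds can return different ones. The price of the deterministic route is the variety that sampling provides, which is one form of the quality tradeoff mentioned above.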
        
       | najarvg wrote:
       | Very well done. Did a couple of "How to" questions to generate R
       | scripts for basic purposes (e.g. scraping a webpage) and
       | responses were not only descriptive but also covered other
       | nuances like captcha, javascript execution blocking,
       | parallelizing slow responding sites etc. Have not tried with more
       | nuanced advanced questions but looks very promising so far!
        
         | djhn wrote:
         | Out of curiosity, what suggestions did you get for captchas and
         | slow sites?
        
       | nathias wrote:
       | > Your browser is out of date! Update your browser to
       | view this website correctly. More Information.
       | 
       | I hate this so much, I have firefox nightly 107, how is everyone
       | flagging me as out of date ... is it a bad library that you are
       | all using?
        
       | freediver wrote:
       | Congrats on the launch! Can you share more information about the
       | intended business model?
        
         | rushingcreek wrote:
         | Thanks, Vlad. There will always be a free version of Phind. We
         | are thinking about either ads, a ChatGPT-style subscription
         | model, or a combination of the two.
        
       | swyx wrote:
       | the latency is really killer for this kind of usecase.. any plans
       | to figure out how to cut it down?
        
         | rushingcreek wrote:
         | the answer should start generating almost immediately like with
         | ChatGPT. right now our infrastructure is being hugged a bit,
         | and we are scaling it up.
        
       | maxgomez7 wrote:
       | Great stuff! Very helpful
        
       | [deleted]
        
       | Szpadel wrote:
       | Nice, looks great. For the few questions I tried, it sometimes
       | drifted into unrelated details after answering the original
       | question. But at that point I'm already satisfied, and maybe giving
       | some additional info isn't bad either. Great product, for sure
       | better than digging through all the copycats of Stack Overflow and
       | GitHub in Google results.
        
         | rushingcreek wrote:
         | Yeah, we're trying to strike a balance between being concise
         | and completely answering complex questions. Answers are
         | deliberately long at the moment as we would rather completely
         | answer every question than optimize prematurely by leaving out
         | important details on occasion.
         | 
         | We're thinking about ways to enhance readability by breaking
         | the core of the answer into its own paragraph.
        
       | thefourthchime wrote:
       | My test for any search engine is:
       | 
       | "Best California style burrito in Austin".
       | 
       | Nearly every engine shows me burrito shops in California, some
       | shove Reddit links to the top. Google was the only decent
       | response. Phind's response is what I would expect from an assistant
       | who researched this for 5-10 minutes of searching the web. Great
       | work!
       | 
       | (now add maps to those results!)
        
         | rushingcreek wrote:
         | Location-aware search is one area where it's very tricky to
         | compete with Google. Google Maps/reviews is a phenomenal
         | product. Happy to hear that Phind worked for you here, but
         | we're more focused on the developer/technical search use case
         | for now.
        
           | anothernewdude wrote:
           | No, for that search what you need to understand is "in
           | <location>" vs "<location> style". Which you could get with
           | supporting n-grams, frankly.
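
The n-gram idea can be sketched as below, as a rough illustration only: the gazetteer `KNOWN_PLACES` and the two bigram rules are assumptions made for this example, not how any search engine is known to work.

```python
import re

# Tiny illustrative gazetteer (assumption); a real system would use a far
# larger place-name list or a trained tagger.
KNOWN_PLACES = {"california", "austin", "texas"}


def bigrams(text: str) -> list[tuple[str, str]]:
    """Return adjacent lowercase word pairs from the query."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))


def parse_query(text: str) -> dict[str, list[str]]:
    """Split place names into search locations and style modifiers."""
    roles: dict[str, list[str]] = {"location": [], "style": []}
    for first, second in bigrams(text):
        if first in {"in", "near"} and second in KNOWN_PLACES:
            roles["location"].append(second)  # "in <location>"
        elif first in KNOWN_PLACES and second == "style":
            roles["style"].append(first)  # "<location> style"
    return roles
```

For the burrito query, `parse_query` files "austin" under `location` and "california" under `style`, so only the former would restrict where to search.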
        
       | meghan_rain wrote:
       | Is it using the Bing Search API under the hood?
        
         | rushingcreek wrote:
         | Yep, we do use the Bing API.
        
           | throwaway280382 wrote:
           | Can someone here explain how a solo developer can use the Bing
           | API? It seems the cost is not cheap, even for the basic plan.
           | "1,000 transactions free per month for all markets"
           | 
           | https://www.microsoft.com/en-us/bing/apis/pricing
           | 
           | That's like 30 searches per day (including any tweaking). How
           | can a solo developer make an MVP based on this?
        
             | [deleted]
        
           | elashri wrote:
           | Then why is it saying "Was this answer better than Google?"
        
             | rushingcreek wrote:
             | Google is still what most people use and our biggest hurdle
             | is getting people to consider using something else. So
             | that's what we compare ourselves to.
             | 
             | Interestingly, the very existence of the new Bing does us a
             | favor in this regard -- it warms people up to the idea of
             | using something other than Google, even if it's just for a
             | subset of searches.
        
           | meghan_rain wrote:
           | So this project can be summarized in a single HTTP request
           | that takes a prompt, puts some bing results in it, and shows
           | it to the user?
        
             | rushingcreek wrote:
             | No, we read the websites returned by the web results, feed
             | that into our large language models, and generate an answer
             | based on that.
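
The flow described here (search, read the returned pages, pass their text to a language model) can be sketched roughly as follows. This is an assumption-laden outline: `search`, `fetch`, and `llm` are hypothetical callables supplied by the caller, and the prompt wording is invented for illustration; none of it is Phind's actual code.

```python
from html.parser import HTMLParser
from typing import Callable


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def page_text(html: str, limit: int = 500) -> str:
    """Reduce a page to plain text, truncated to fit a prompt budget."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:limit]


def build_prompt(question: str, pages: dict[str, str]) -> str:
    """Number each source so the model can cite it as [1], [2], ..."""
    sources = [
        f"[{i}] {url}\n{page_text(html)}"
        for i, (url, html) in enumerate(pages.items(), start=1)
    ]
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        + "\n\n".join(sources)
        + f"\n\nQuestion: {question}\nAnswer:"
    )


def answer(
    question: str,
    search: Callable[[str], list[str]],  # hypothetical: query -> result URLs
    fetch: Callable[[str], str],         # hypothetical: URL -> raw HTML
    llm: Callable[[str], str],           # hypothetical: prompt -> completion
) -> str:
    pages = {url: fetch(url) for url in search(question)}
    return llm(build_prompt(question, pages))
```

Passing the three steps in as callables keeps the sketch runnable offline; a real service would plug in a search API client, an HTTP fetcher, and a model call.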
        
               | meghan_rain wrote:
               | So this is not based on OpenAI? You have your own
               | language models?
        
               | rushingcreek wrote:
               | We use a combination of our own language models + OpenAI.
        
         | tuukkah wrote:
         | Or DDG?
        
           | [deleted]
        
           | elashri wrote:
           | DDG is using Bing API itself
        
             | tuukkah wrote:
             | Good point. With that in mind, e.g. WebChatGPT using DDG
             | may be a matter of working around the Bing API costs.
        
       | swyx wrote:
       | https://phind.com/search?q=what+is+the+architecture+of+react...
       | 
       | this currently returns a wall of text - 545 words for first
       | paragraph. any way to chunk it up? or get a bullet point version?
        
         | rushingcreek wrote:
         | You can say "give me bullet points" as a followup question. But
         | we are working on generating more readable "chunks" of text as
         | opposed to large paragraphs.
        
       | wenbin wrote:
       | it works well !
       | 
       | https://phind.com/search?q=show+ruby+code+snippet+to+search+...
        
         | rushingcreek wrote:
         | Thanks! Anything we can do better?
        
       | Szpadel wrote:
       | When I type in a follow-up question and then click on an
       | autocompleted question, it uses it as the main question instead of
       | a follow-up.
        
         | rushingcreek wrote:
         | ah yes that's a bug that will be fixed shortly.
        
           | Szpadel wrote:
           | Thanks. One thing I could suggest: on mobile, when links are
           | visible, submitting a follow-up question leaves the screen on
           | the links (the response is placed below them).
        
             | rushingcreek wrote:
             | yep, we're working on automatically scrolling you down to
             | the followup as well.
        
       | kxrm wrote:
       | For basics this works very well.
       | 
       | https://phind.com/search?q=How+should+I+filter+a+dictionary+...
       | 
       | https://phind.com/search?q=How+would+I+use+php+to+encode+wit...
       | 
       | https://phind.com/search?q=How+do+I+get+my+data+from+a+datab...
       | 
       | For more complicated prompts it misses the mark a bit but this
       | may be going outside of the intended use case.
       | 
       | https://phind.com/search?q=Create+a+python+class+that+can+pa...
       | 
       | It would be nice if it annotated the projects it suggested in the
       | response. The above query talks about a project called "mov" in
       | the "provided code" but I do not see any code provided.
        
         | rushingcreek wrote:
         | Thanks for the feedback. The "Create a python class that can
         | parse MP4 headers" question is something that Phind should be
         | able to answer well. If it doesn't give an example immediately,
         | following up with "give me an example" usually works well.
         | 
         | Running it again, I got a code example:
         | 
         | > Create a python class that can parse MP4 headers
         | 
         | > To create a Python class that can parse MP4 headers, one can
         | use the pymp4 library available on GitHub.
         | 
         | The Box class from this library can be used to build and parse
         | MP4 headers.
         | 
         | The following code shows an example of using the Box class to
         | build an MP4 header and then parse it:
         | 
         | from pymp4.parser import Box
         | from io import BytesIO
         | 
         | header = Box.build(dict(
         |     type=b"ftyp", major_brand="iso5",
         |     minor_version=1, compatible_brands=["iso5", "avc1"]))
         | print(header)
         | 
         | parsed_header = Box.parse(header)
         | print(parsed_header)
        
           | kxrm wrote:
           | Interesting, when I went back and did the prompt again it
           | worked, I wonder if this is a context problem, because I just
           | asked multiple questions in the same session on different
           | programming topics.
           | 
           | That question wasn't my first question.
           | 
           | For clarity I asked different questions in the top search box
           | assuming that would kick off a new session.
        
             | rushingcreek wrote:
             | Gotcha. Asking a question in the top box always clears the
             | context and starts a new session.
        
       | y-curious wrote:
       | Looks like it's hugged to death. It has good prompt injection
       | attack blocking :)
        
       | nomdep wrote:
       | This is fantastic! Thank you
        
       | whiplash451 wrote:
       | The first question I asked did not end well : << why is computer
       | vision so hard? >>
       | 
       | Phind gets the meaning of computer vision completely wrong.
       | 
       | Google got the meaning right, but answers with a bunch of links
       | of dubious quality.
        
         | rushingcreek wrote:
         | Would you mind sharing what exactly it got wrong and how we can
         | do better?
        
       | Szpadel wrote:
       | Is the "better than Google" voting used for further model
       | training? If so, is there any protection against manipulation? I
       | could imagine that someone could use that to convince the model to
       | promote one product over another.
        
         | rushingcreek wrote:
         | It is used for improvement, but there's a filtering step built
         | in to help prevent abuse.
        
       ___________________________________________________________________
       (page generated 2023-02-21 23:00 UTC)