[HN Gopher] Launch HN: Hello (YC S22) - A search engine for deve...
       ___________________________________________________________________
        
       Launch HN: Hello (YC S22) - A search engine for developers
        
       Hi HN, we're Michael and Justin from Hello Cognition
       (https://beta.sayhello.so). We're building a better search engine
       for software developers. Hello saves you time by synthesizing clear
       explanations to technical questions, along with code snippets from
       the web, and showing them right on the search page.

       We've found that most technical searches fall into a few
       categories: ad-hoc how-tos, understanding an API, recalling
       forgotten details, research, or troubleshooting. Google is too
       broad and shallow a search tool to be good at this. Even after
       sifting through the deluge of spammy, irrelevant sites pumped full
       of SEO, you still have to find your answer manually in discussion
       boards or documentation. Google's "featured snippet" approach works
       for simple factoid queries but quickly falls apart when a question
       requires reasoning about information across multiple webpages.

       Our approach is narrow and deep: retrieve detailed information on
       topics relevant to developers. When you submit a query, we pull raw
       page data from Bing, rerank the results, and extract understanding
       and code snippets with our proprietary large language models. We
       use sequence-to-sequence transformer models to generate a final
       explanation from all of this input.

       For our honors theses at UT Austin, we researched prototypes of
       large generative language models that can answer complex questions
       by combining information from multiple sources. We found that
       GPT-3, GPT-Neo/J/X, and similar autoregressive language models,
       which predict text from left to right, are prone to "hallucinating"
       and generating text inconsistent with the "ground truth" document.
       Training a sequence-to-sequence language model (a T5 derivative) on
       our custom dataset designed for factual generation yielded much
       better results with less hallucination.

       After creating this prototype, we started actively developing Hello
       with the idea that searching should be just like talking to a smart
       friend. We want to build an engine that explains complex topics
       clearly and concisely, and lets users ask follow-up questions using
       the context of their previous searches.

       For example, when asked "what type of semaphore can function as a
       mutex?", Hello pulls in the raw text from all five search results
       linked on the search page to generate: "A binary semaphore can be
       used as a mutex. Mutexes and semaphores are two different types of
       synchronization mechanisms. A mutex is a lock that prevents two
       threads from accessing the same resource at the same time. A
       semaphore is used to signal that a resource has become available."
       We're biased, of course, but we think that the ability to reason
       abstractly about information from multiple web pages is a cool
       thing in a search engine!

       We use BERT-based models to extract and rank code snippets when
       they are relevant to the query. Our search engine currently does
       well at answering applicable how-to questions such as "Sort a list
       of tuples by the second element", "Set a response cookie in
       FastAPI", "Get value of input in React", and "How to implement
       Dijkstra's algorithm." Using only our own models has also freed us
       from dependence on OpenAI.

       Hello is and will always be free for individual devs. We haven't
       rolled out any paid plans yet, but we're planning to charge teams
       per user/month to use on internal data scattered around in wikis,
       documentation, slack, and emails.

       We started Hello Cognition to scratch our own itch, but now we hope
       to improve the state of information retrieval for the greater
       developer community. If you'd like to be part of our product
       feedback and iteration process, we'd love to have you; please
       contact us at founders@sayhello.so. We're looking forward to
       hearing your ideas, feedback, comments, and what would be helpful
       for you when navigating technical problems!
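       The example answer in the post says a binary semaphore can be used
       as a mutex. As a minimal illustration of that claim (not code from
       Hello; the thread and iteration counts are arbitrary), Python's
       `threading.Semaphore` initialized to 1 lets only one thread at a
       time into the critical section, so concurrent increments are not
       lost:

```python
import threading

# A semaphore initialized to 1 is a binary semaphore: at most one thread
# can hold it at a time, so it acts like a mutex around the critical section.
lock = threading.Semaphore(1)
counter = 0


def increment(times: int) -> None:
    global counter
    for _ in range(times):
        with lock:  # acquire() on entry, release() on exit
            counter += 1


def run(num_threads: int = 4, times: int = 10_000) -> int:
    """Start num_threads workers that each increment the shared counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=increment, args=(times,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())  # 40000
```

       Unlike a true mutex, a semaphore has no owner: any thread may call
       `release()`, which is one reason the two are still described as
       different mechanisms. `threading.BoundedSemaphore` at least raises
       `ValueError` if it is released more times than it was acquired.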
        
       Author : wayy
       Score  : 132 points
       Date   : 2022-07-06 16:24 UTC (6 hours ago)
        
       | mrwnmonm wrote:
       | I love the idea <3
       | 
       | It may be a weird suggestion, but if queries about general topics
       | returned something like this https://unzip.dev/archive (check how
       | compact it is: it delivers almost all you need to know about a
       | subject to get you going), it would be perfect.
        
         | mrwnmonm wrote:
         | And what about books, just links to the most important books on
         | the subject, without getting too philosophical about how to
         | determine the most important ones.
        
         | wayy wrote:
         | Thanks for the feedback - I agree that explanations aren't the
         | most appropriate for every kind of search. We're definitely
         | considering different forms of search inputs and outputs (text,
         | code, lists, etc.)
        
       | _ank_it wrote:
       | No dark mode?
        
         | wayy wrote:
         | In the works :)
        
       | Minor49er wrote:
       | Pretty decent overall. Though some code that comes back is a
       | little dense and hard to read. This appears to be caused by how
       | the scraper/parser handles whitespace. For example, the query
       | "make a curl request in php" returns an example from the comments
       | section in the PHP docs where the entirety of the comment has
       | been returned as code, but the <br> tags have not been converted
       | to newlines.
       | 
       | It would be a good idea to preserve whitespace, or arguably
       | better, integrate optional syntax formatting.
       | 
       | Overall, this search engine looks promising
        
         | rushingcreek wrote:
         | Thanks :) we'll take a look at better syntax formatting for the
         | code snippets
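       One way to handle the whitespace problem Minor49er describes is to
       turn `<br>` tags into newlines before a scraped snippet is shown as
       code. A minimal standard-library sketch; the function name
       `br_to_newlines` is hypothetical, and this is not how Hello's
       scraper actually works:

```python
import html
import re

# Matches <br>, <br/>, <br />, <BR> and similar variants.
_BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_newlines(snippet: str) -> str:
    """Turn <br> tags into newlines, then decode HTML entities like &lt;."""
    # Replace real tags first; unescaping first would turn an escaped
    # literal "&lt;br&gt;" in the code into a tag and wrongly replace it.
    text = _BR_TAG.sub("\n", snippet)
    return html.unescape(text)


if __name__ == "__main__":
    raw = "$ch = curl_init();<br />curl_setopt($ch, CURLOPT_URL, $url);<br>$out = curl_exec($ch);"
    print(br_to_newlines(raw))
```

       Once line breaks survive, a syntax highlighter can be applied on
       top, which covers the "arguably better" option from the comment.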
        
       | golergka wrote:
       | > center a div
       | 
       | #myDiv{ margin:0px auto; }
        
       | yrgulation wrote:
       | Specialised search engines are the future in my view. Google is
       | an ocean of data great for generalised search. But as soon as you
       | want to narrow down to a specific domain it's mostly noise - in
       | my personal experience.
        
       | skrtskrt wrote:
       | > we're planning to charge teams per user/month to use on
       | internal data scattered around in wikis, documentation, slack,
       | and emails.
       | 
       | Do you have any idea of how you're going to go the "enterprise
       | integration" route without hiring an army of implementation
       | consultants?
       | 
       | best of luck, I'm sure there are many teams on Confluence that
       | wish they had a functioning search without moving everything off
       | Confluence at once
        
       | ezekiel11 wrote:
       | still not as good as stackoverflow
        
         | wayy wrote:
         | We're still very early - of course it can't be as good. Could
         | you tell me what you were trying to do and how it didn't work
         | for you?
        
           | jamesmcintyre wrote:
           | Not sure what the original commenter was looking for but I
           | can give my thoughts:
           | 
           | - stackoverflow's UI actually serves well to provide a sort
           | of "ambient" information that rapidly indicates not just the
           | best answers, but the best most-recent answers. Oftentimes,
           | and especially in rapidly-evolving dev languages/frameworks,
           | what was the best answer a few months ago may no longer be
           | the best answer, and the ability to rapidly scan the comments
           | that would indicate this is valuable.
           | 
           | - in addition, those stackoverflow comments and the links
           | within them can point to additional info that can save the
           | dev time (potentially pointing to the dev misidentifying the
           | problem: "don't do this, this is the real issue <link>").
           | 
           | I think with the traditional google->stackoverflow or
           | google->[some documentation site, forum, etc] user flow you
           | actually get layers of ambient cues as to relevance, recency
           | and quality that we've grown accustomed to. Even if your
           | product ultimately serves better answers, I'd worry that
           | lacking these cues would make a user like me feel as though
           | I'm blindly trusting an answer that seems to have come from
           | the ether (sort of like github copilot).
           | 
           | As low-hanging fruit maybe adding level-meters beside each
           | result that indicates these dimensions could help (like
           | npmjs.com does with npm pkg results in their ui).
           | 
           | I love the product idea and it looks like a strong start!
           | Good luck!
        
             | wayy wrote:
             | I agree that software documentation is constantly changing
             | and a mechanism for evaluating the "freshness" of an answer
             | could be very useful. Right now we provide an easy way to
             | find the source of a code snippet but source
             | attribution/evaluation is something we're actively looking
             | into, especially for the natural language answer. A
             | quantitative score could be interesting too. Thanks for the
             | feedback!
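       The "freshness" score discussed above could take many forms. One
       hypothetical sketch: exponential decay on a page's last-updated
       date, so the score halves every `half_life_days`, blended into a
       relevance score. The function names, the 365-day half-life, and
       the 0.3 weight are illustrative assumptions, not a description of
       how Hello ranks anything:

```python
from datetime import date


def freshness(last_updated: date, today: date, half_life_days: float = 365.0) -> float:
    """Return a score in (0, 1]: 1.0 if updated today, halving every half_life_days."""
    age_days = max((today - last_updated).days, 0)  # treat future dates as age 0
    return 0.5 ** (age_days / half_life_days)


def blended_score(relevance: float, last_updated: date, today: date, weight: float = 0.3) -> float:
    """Scale relevance so that up to `weight` of it depends on freshness."""
    return relevance * ((1 - weight) + weight * freshness(last_updated, today))


if __name__ == "__main__":
    today = date(2022, 7, 6)
    print(freshness(date(2021, 7, 6), today))  # 0.5
```

       Exposing the raw `freshness` value next to each result would also
       give users the kind of visible "level-meter" cue jamesmcintyre
       suggests, rather than hiding it inside the ranking.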
        
           | ezekiel11 wrote:
           | it just won't be as good as the refinement in searches i can
           | do by appending stackoverflow at the end of a google query,
           | and github copilot already does what you are trying to do
        
       | samelawrence wrote:
       | The answer never loads if using Brave with Shields up, even after
       | allowing JS.
        
         | wayy wrote:
         | Just installed Brave and it seems to work even with Shields up.
         | The site is under load so maybe try again. Are any of the other
         | things (answer/explanation, code snippet, links) loading for
         | you?
        
       | winddude wrote:
       | cool! some of the code samples shown aren't always the most
       | relevant, but looks promising.
        
       | hubraumhugo wrote:
       | Glad to see better search tooling for programmers since it's an
       | essential task we do every day. How do you compare yourself to
       | you.com's specialized search engine for developers?
       | https://you.com/code
        
         | chiken wrote:
         | you.com is a lot more informationally dense, feels quicker to
         | find the right answer, and I rarely have to open a new tab
         | because of the (open side panel) button. Beyond programming, it
         | doesn't seem like I can use hello.so on a regular basis for
         | normal searches compared to Google and you.com.
        
         | rushingcreek wrote:
         | You.com is too similar to Google imo; we go significantly
         | further than they do in terms of synthesizing explanations.
         | Same goes for snippets; on You.com, you usually need to click
         | on a button (e.g. "Open Side Panel") to get a code snippet
         | which adds friction. Furthermore, they seem to be simply
         | showing the full Stack Overflow page; our approach is to find
         | and rank the most relevant code snippet while offering a "See
         | Reference" button to make it easy to go to the original page.
        
       | selectnull wrote:
       | Using latest Chrome gives "Uncaught (in promise) TypeError:
       | Failed to fetch" console error.
       | 
       | The same is with any input, even with predefined ones; the
       | progress bar gets to the end (slowly) and nothing happens.
       | 
       | Firefox works.
        
         | wayy wrote:
         | Thanks for letting us know. We're under load too so I'm not
         | surprised there are some hiccups. I just tested with Chrome
         | Version 103.0.5060.66 and the fetch works - will have to
         | investigate the latest Chrome version later.
        
       | TekMol wrote:
       | I see whole solutions copied from other websites displayed on
       | your site.
       | 
       | Is that legal?
       | 
       | Isn't there copyright on those?
        
         | throwaway675309 wrote:
         | Ding ding ding. This is the exact issue that a vocal minority
         | are whingeing over github copilot for. It's automatically
         | pasting results from websites without embedding the necessary
         | attribution - so if you copy entire functions from this search
         | engine (which may be coming from stack overflow for which
         | attribution is required), then you're guilty of the same thing.
         | 
         | So I only see one of two outcomes:
         | 
         | 1. Courts rule copilot is fair use in which case your search
         | engine becomes largely superfluous
         | 
         | 2. Courts rule copilot is infringement in which case all of
         | these types of applications cannot be used commercially
        
         | lancesells wrote:
         | I would hope it's not legal.
         | 
         | > Hello pulls in the raw text from all five search results
         | linked on the search page to generate...
         | 
         | Not to be negative but I think I'll stick to the sites and
         | people that made the results and not a middleman that intends
         | to charge for other people's work.
        
           | GrinningFool wrote:
           | I think that framing is missing some nuance. Seems more like
           | they would be charging for the process that goes into sifting
           | through those results and pulling out other people's work on
           | the user's behalf.
        
             | lancesells wrote:
             | Yeah, I can see there's a lot of work going into the
             | process but it's still using other people's efforts without
             | permission or compensation.
        
               | visarga wrote:
               | > using other people's efforts without permission or
               | compensation
               | 
               | Like every web search done on Google? That said, I think
               | attribution links should be displayed, license too if
               | available. Copilot should be doing the same to defuse
               | this discussion.
        
       | ianbutler wrote:
       | Hey so I built a search engine doing largely the same thing (and
       | also interviewed with YC for the W20 batch) and ultimately we
       | pivoted away due to lack of interest from the developer teams we
       | were pitching. Often the startups we were pitching didn't have
       | enough accumulated internal knowledge for the paid plan to be
       | useful. For the ones who did (around the 200+ person, 5+ years in
       | business mark), we still weren't seeing the problem being painful
       | enough that companies wanted to pay to solve it.
       | 
       | How do you see navigating this space when this can be considered
       | a nice to have versus a strict need?
        
         | wayy wrote:
         | Right now we're primarily focused on building a search tool
         | that developers love. Would love to chat more about your
         | experience - shoot us an email at founders@sayhello.so
        
       | lawl wrote:
       | Maybe I'm misunderstanding the intended scope of this engine, or
       | I just ran into a bad result page, but:
       | 
       | https://beta.sayhello.so/search?q=Java+aot+compile
       | 
       | Does not seem to mention graal anywhere. (It's just a random test
       | query that popped into my mind)
       | 
       | Asking a full question for a code snippet seems to work:
       | https://beta.sayhello.so/search?q=How+do+I+sort+a+map+in+Jav...
       | 
       | How do you deal with licensing for these snippets, though? Is
       | that up to the user to verify?
        
         | rushingcreek wrote:
         | Because we're currently built on top of Bing's index, we're
         | somewhat dependent on the raw pages they provide. If those
         | pages don't mention graal, neither will our AI. Building out
         | our own index is something we're working on.
         | 
         | It is currently up to the user to verify licensing for the
         | snippets, but we try to make it easy (using the See Reference
         | button) to go to the original source.
        
       | langitbiru wrote:
       | Congrats on the launch.
       | 
       | Anyway, I just want to comment on the product niche. A vertical
       | search engine? Interesting. Will we see other verticals for
       | search engine products?
        
       | danenania wrote:
       | Congrats on the launch! I love this idea. I've thought for a long
       | time that something like it should exist. Google results are
       | often lacking in this realm.
       | 
       | I've played around just a bit and clicked some of the preset
       | examples and like what I'm seeing so far. I bookmarked it and
       | will try it out more as I code over the next few days.
       | 
       | Main initial feedback: I'd _really_ like to see
       | version/last-updated-at info accompanying all results. One of the
       | biggest problems with Google for code stuff is finding outdated
       | examples and docs. Even better would be a dropdown that lets me
       | see results depending on the version of the
       | language/framework/tools I'm using.
        
         | wayy wrote:
         | Thanks for trying it out and good point - we'll look into
         | adding version info
        
       | lysecret wrote:
       | Hey so, I have been working on a specialized search engine (at
       | least you can think of it in this way). And for me a lot of the
       | gains came because we could structure and restrain the search
       | space in a meaningful way, such that we could build better
       | distance metrics than a pure text search could do.
       | 
       | I wonder what you think about that. Maybe one could submit a code
       | snippet, or mark something as an error, or ask for a refactor of
       | some code. But then again, this gets close to what copilot is
       | doing.
        
         | wayy wrote:
         | We are absolutely experimenting with non-text inputs (and
         | outputs as you can see). A big problem we see in mainstream
         | search engines is that syntax is not parsed well. Understanding
         | code as context for the query could be huge for developer
         | search.
        
       | hill613 wrote:
       | Good idea, although it seems its language querying is pretty
       | poor.
       | 
       | If I specifically list Python/Javascript, the first couple of
       | results are not even in that language; the 3rd/4th are. And you
       | have to click the link/see reference to even see the language.
       | 
       | You would think that if your language is included in the query,
       | it should be heavily prioritised.
        
         | rushingcreek wrote:
         | Thanks for the feedback. Could you post what queries you tried?
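       hill613's suggestion, that a language named in the query should
       heavily boost results in that language, can be sketched as a small
       reranking step. This is a hypothetical illustration: the `Result`
       type, the alias table, and the boost factor of 2.0 are all
       assumptions, not Hello's actual ranking, and it assumes each
       snippet's language is already known:

```python
import re
from dataclasses import dataclass

# Hypothetical alias table mapping query words to canonical language names.
LANGUAGE_ALIASES = {
    "python": "python", "py": "python",
    "javascript": "javascript", "js": "javascript",
    "ruby": "ruby", "java": "java",
}


@dataclass
class Result:
    title: str
    language: str  # language of the extracted snippet (assumed known)
    score: float   # base relevance from an upstream ranker


def languages_in_query(query: str) -> set[str]:
    """Collect canonical language names mentioned anywhere in the query."""
    words = re.findall(r"[a-z]+", query.lower())  # splits on "/", "?", spaces
    return {LANGUAGE_ALIASES[w] for w in words if w in LANGUAGE_ALIASES}


def rerank(query: str, results: list[Result], boost: float = 2.0) -> list[Result]:
    """Multiply the score of results whose language the query names, then sort."""
    wanted = languages_in_query(query)

    def adjusted(r: Result) -> float:
        return r.score * boost if r.language in wanted else r.score

    return sorted(results, key=adjusted, reverse=True)
```

       A multiplicative boost keeps relevance ordering within a language
       intact; a hard filter would be simpler but would hide good answers
       when few results match the named language.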
        
       | silentsea90 wrote:
       | Would this be useful as a vscode plugin perhaps? I would much
       | rather search this in an IDE imo, but I am just one random dude.
        
         | wayy wrote:
         | We've actually had a bit of feedback related to IDE
         | integration. We're focusing on developing the search technology
         | itself right now but vscode plugin could be interesting to look
         | at in the future.
        
       | skilled wrote:
       | Not to be too critical but the results I got so far have been
       | subpar. Seeing a lot of hyperbole/clickbait articles.
       | 
       | Let's say I'm searching for front-end frameworks. Each article
       | has the word "best" in the title, yet doesn't link to resources
       | like State of JS, Stack Overflow Survey or other similar sites.
       | So, in this context "best" is subjective. I can't be bothered
       | with subjective results when I'm trying to find out what is
       | actually considered "best" or in this case popular.
        
         | rushingcreek wrote:
         | Those articles are coming from Bing as of right now. Our
         | offering is based on analyzing those articles and summarizing
         | them/picking out the most relevant parts. We definitely plan to
         | augment (and eventually replace) Bing with our own index.
        
           | closedloop129 wrote:
           | Have you considered using blacklists? You could cooperate
           | with Brave and their Goggles:
           | https://news.ycombinator.com/item?id=31837986
        
             | wayy wrote:
             | We have considered blacklists but haven't looked into
             | Goggles yet. Thanks for mentioning this
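       A domain blocklist of the kind closedloop129 suggests could be
       applied to retrieved results before they are summarized. A minimal
       sketch; the blocked domains are placeholders, the function names
       are hypothetical, and this does not use Brave Goggles' own rule
       syntax:

```python
from urllib.parse import urlparse

# Placeholder entries; a real list would be curated or user-supplied.
BLOCKED_DOMAINS = {"spammy-tutorials.example", "seo-farm.example"}


def is_blocked(url: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Compare whole labels so "notseo-farm.example" is not caught by "seo-farm.example".
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_results(urls: list[str]) -> list[str]:
    """Drop results from blocked domains, preserving the original order."""
    return [u for u in urls if not is_blocked(u)]
```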
        
       | arkanane wrote:
       | I press search, get a 10sec progress bar loading and nothing
       | happens.
        
       | ForrestN wrote:
       | FYI: I clicked on get lucky, and went here:
       | https://beta.sayhello.so/search?q=Check+if+string+is+a+palin...
       | which for me in Safari is just an empty white page.
        
         | rushingcreek wrote:
         | Do you have Javascript enabled?
        
           | ForrestN wrote:
           | Yes
        
             | wayy wrote:
             | We've tested it on the latest version of Safari. Maybe try
             | on an updated browser?
        
               | ForrestN wrote:
               | It's totally up to date. From the console:
               | 
               | "TypeError: N.at is not a function. (In 'N.at(-1)',
               | 'N.at' is undefined)"
        
               | duderific wrote:
               | I get the same error in latest Safari.
        
           | mdaniel wrote:
           | While I'm not directly affected by this, a blank white page
           | is _always_ a symptom of careless error handling, even in the
           | case where the user has JS turned off. The <noscript> tag
           | exists expressly to present information about your site's
           | need to have JS enabled.
        
       | treis wrote:
       | https://beta.sayhello.so/search?q=how+to+base64+encode+a+str...
       | 
       | Query: how to base64 encode a string in ruby
       | 
       | Response: I'm not sure what you mean by "base64 encode a string
       | in ruby" - that's a bit of a misnomer. Base64 encoding is a way
       | of storing data in a form that can be decoded by a human. It's
       | not a secure way to store data, but it's useful if you want to
       | send a message to someone who doesn't understand the language
       | you're using.
       | 
       | The right answer is in the third link provided but it's not
       | exactly correct.
       | 
       | Google gives back the Ruby Module Base64 docs as the first hit.
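       For reference, the generated answer above mischaracterizes Base64:
       it is a reversible binary-to-text encoding that represents bytes
       with 64 printable ASCII characters. It is neither meant to be read
       by humans nor a form of encryption. The query is about Ruby, whose
       standard library provides a `Base64` module; the sketch below shows
       the same round trip with Python's standard `base64` module instead:

```python
import base64


def encode(text: str) -> str:
    # Base64 operates on bytes, so encode the string as UTF-8 first.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode(encoded: str) -> str:
    # Reverse the steps: Base64 text -> bytes -> UTF-8 string.
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    s = encode("hello world")
    print(s)          # aGVsbG8gd29ybGQ=
    print(decode(s))  # hello world
```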
        
         | rushingcreek wrote:
         | Our index is based on Bing as of right now -- if they give us
         | low-quality results, our generated answers will be low quality
         | as well. We're definitely aware of this and are working on
         | developing our own index to augment Bing's in cases like this.
        
           | masukomi wrote:
           | while i assume there are good business reasons you're basing
           | your stuff on Bing, it's notable that as a general rule
           | developers don't use bing. In my experience the google
           | results are radically better.
        
             | mdaniel wrote:
             | > while i assume there are good business reasons you're
             | basing your stuff on Bing
             | 
             | I'd strongly suspect it's the same reason DDG does so: Bing
             | offers a search API, and Google has no incentive to offer
             | access to its index and results
             | 
             | There are plenty of companies that make a living scraping
             | results out of Google, so I don't mean to say it's
             | impossible, it's just a _monster_ amount of energy playing
             | cat and mouse with what is effectively an unlimited budget
             | to stop one from accomplishing that goal
        
       | snowstormsun wrote:
       | Nice!
        
       | bearjaws wrote:
       | Really curious how this worked: the query 'FHIR appointment spec'
       | produces the following. It actually did get the right result as
       | the first result.
       | 
       | "fhir appointment spec
       | 
       | I'm not sure what you're asking about, but I'll try to answer it
       | as best I can. Appointment is a FHIR data type. It's a way to
       | describe a time slot for a patient to be seen by a healthcare
       | provider. Appointments can be booked, cancelled, rescheduled, or
       | canceled and rebooked. It can also be used to describe the
       | location of the appointment. "
       | 
       | Pretty impressive summary given that it doesn't exist in any one
       | specific page.
        
         | rushingcreek wrote:
         | Thanks :) We've gone all-in on using our AI to answer questions
         | based on information from multiple sources.
        
       | vyrotek wrote:
       | Very interesting.
       | 
       | Some of my results with code examples looked awfully similar to
       | GitHub CoPilot output.
       | 
       | Is that being used to generate results sometimes?
        
         | rushingcreek wrote:
         | Thanks :)
         | 
         | We actually do have a code generation model similar to CoPilot
         | but it's not active yet on the backend. All of the code
         | snippets you see are pulled from other websites.
        
       | joshstrange wrote:
       | First Impressions:
       | 
       | * I won't use a different search engine for programming stuff vs
       | everything else. So while this might be targeted toward software
       | developers, I can't see myself using it unless it can handle
       | normal searches.
       | 
       | * UI/UX - I hate the progress bar; I'm not sure at all what it's
       | telling me, as there are results shown while it's still
       | completing. The results are way too spaced out. On my 27" 2K
       | screen I can only see 3 results, the search bar takes up way too
       | much space and there is way too much padding on the results.
       | Don't move the DOM on me, removing the progress bar is jarring as
       | is "Was this answer helpful?" popping in, I'm here for results,
       | not to train your ML.
       | 
       | * Trackers - Using the default installs of Privacy Badger and
       | uBlock Origin meant no results ever loaded. I'm not sure what was
       | being blocked that caused the issue but cookies from bing [0] and
       | a request to cloudflareinsights.com [1] should not hamper showing
       | results.
       | 
       | Search is a tool and one that I need to be quick, simple, and
       | informationally dense. This checks almost none of those boxes.
       | I'm even open to using a different search engine (I semi-recently
       | switched from Google to Ecosia and it's been near-seamless), but
       | I don't see any "pro" to using this engine and I see a ton of
       | "cons".
       | 
       | [0] https://cs.joshstrange.com/V5uiyM
       | 
       | [1] https://cs.joshstrange.com/GkPuap
       | 
       | EDIT: I did a few more searches because I realized I wasn't
       | getting the "info box"/ML results on my first few searches and I
       | wanted to be fair. Sorry but that made me dislike this even more.
       | I really, really hate content moving out from under me. My eyes
       | start reading one of the 3 results that were shown, then they get
       | pushed down for another overly-padded box that tries to "answer"
       | my question. The results were worse than "grab the selected
       | answer from the first SO that matches this query". Maybe it would
       | be better if that info was shown off to the side and didn't move
       | the results when it loaded in but again, I didn't find it useful
       | in the queries where it showed up.
       | 
       | EDIT2: I posted a follow up comment about what, specifically, I
       | think should be changed:
       | https://news.ycombinator.com/item?id=32005841
        
         | _tom_ wrote:
         | I definitely see the benefit of a separate code search engine
         | and would use it. Google has, as you have noticed, gotten way
         | less useful. A big part of that is trying to target the wider
         | market to the exclusion of less popular searches, like
         | developers'. A dedicated engine would help with that.
         | 
         | I'm more than willing to open another tab to not have a search
         | result page full of YouTube videos.
        
           | joshstrange wrote:
           | To each their own. I have 1 flow for search which I don't
           | plan on changing and I personally have no issues with google
           | search (for code, technical, or otherwise). I still consider
           | it to be one of the best and I don't agree with the "google
           | search is getting worse" crowd. Maybe at some point I'll use
           | some kind of "search hydra" that hits multiple engines and
           | either combines the results or shows me the results based on
           | the type of search it thinks I'm doing but I don't imagine
           | I'll ever want to consciously switch engines based on task.
           | 
           | Also I've never seen a "search result page full of YouTube
           | videos" no matter what the query was. Sometimes there will be
           | 3 or a carousel of them near the top but I can easily ignore
           | that (assuming they aren't relevant or useful to me). I can't
           | remember the last time I got a video for a code/technical
           | query on google, I just did some testing and only a few
           | queries showed videos, always 3, always partway down the page
           | so that the search results at the top answered what I needed
           | before I even got to the videos.
        
             | danenania wrote:
             | In your first comment, you say you want high information
             | density. In this one, you say if there are irrelevant
             | results on Google, you'll just ignore them and scroll past.
             | 
             | I get that there's always a high bar to switch to a new
             | tool, and Google obviously has certain advantages that are
             | tough to replicate, but it seems like you're applying a bit
             | of a double standard. If Google shows me a bunch of ads and
             | irrelevant results that I have to parse through to find
             | what I'm looking for, that's not high information density;
             | it's quite the opposite.
        
               | ziddoap wrote:
               | Not the parent poster, but I think there is an important
               | distinction between information density and information
               | relevancy. Google is information dense (compared to
               | this), but not all of it is relevant. This search engine
               | aims for higher relevancy, but the density suffers from
               | the stylistic choices of the creators.
        
               | rushingcreek wrote:
               | This is exactly what we think. And yes, we definitely
                | plan to do better with space efficiency.
        
               | mbreese wrote:
               | With one dominant search provider, we are constantly
               | conditioned to parse the results from Google. It is very
               | easy to ignore certain parts of the page and mentally
               | process which parts of the page your brain is interested
               | in.
               | 
               | So, while it's not necessarily "information dense",
               | finding the information you need is comparatively
               | "cognitively light". Or at least predictable...
               | 
               | The devil you know...
        
               | rushingcreek wrote:
               | You're right -- we are all conditioned by Google. Yet,
               | using Hello for my own technical searches these past few
               | weeks, higher signal and lower noise is much better for
               | me personally. At the end of the day, Justin and I are
               | making the search engine that we want to use ourselves as
               | developers.
        
               | danenania wrote:
               | Fair point, but I think the technical term for that is
               | "Stockholm Syndrome". Or perhaps in this case, we should
               | call it "Mountain View Syndrome".
        
               | joshstrange wrote:
               | Both things can be true and you are omitting/ignoring
               | part of my comment about irrelevant results. Higher
               | density of information means I can scan faster and see
                | more with less/no scrolling. As for irrelevant results, I
                | pointed out that the relevant results were above the
                | videos (content that is irrelevant to me in this context).
                | Lastly, I very rarely get ads when doing technical
                | searches, so that doesn't really factor in here. I'd wager
                | I reliably get what I want within the first 3-4 results on
                | Google (at least for technical/programming-related
                | searches), and when I don't it's normally something super
                | niche (those queries also rarely have videos or other
                | stuff in the results; think error messages).
        
               | rushingcreek wrote:
               | Right, but you still have to click on those links
               | manually and potentially scan lots of text yourself for
               | relevancy. Our goal is to automate that.
        
               | joshstrange wrote:
               | I understand that, unfortunately your goal is still in
               | the future (be it a week, month, year, or decade I can't
               | tell you but it's not there today). In the interim (and
               | if you want users while you refine) your search results
               | should at least be on par with Google/Bing/etc. That way
               | the "happy path" is your ML spits out the right answer
               | and no links need to be clicked but if your logic can't
               | come up with an answer or if it comes up with the wrong
               | one you need the results to be a viable fallback.
               | 
               | EDIT: Building on what I said:
               | 
               | I use Github CoPilot and have been very happy with it.
               | It's far from perfect and even when it spits out good
               | code I have to do minor cleanup but it does save me time
               | and "sparks joy" when it works. When it doesn't work it
                | doesn't really get in my way. If Copilot required me to
                | change my entire method of programming or my IDE, or if I
                | had to go into a "special mode" to use it, it would be
                | next to worthless to me.
               | 
               | As it stands you don't have a good fallback (regular
               | results). Your product should be additive to what
               | currently exists in the space. Not "a step forward if we
               | guess the correct answer and a massive step backwards if
               | we don't". I 100% believe you can make changes such that
               | the results function as a perfect fallback (I've outlined
               | them in various places of this thread).
        
         | richardsocher wrote:
         | We learned many of these lessons at https://you.com/code:
         | 
         | * we also needed to build a strong "everything else" search
         | engine and then
         | 
         | * have great results for coding with specific search apps like
          | StackOverflow, AI code complete, ++
         | 
         | * be very fast (we messed that up when we first launched)
         | 
         | * have great scores on Privacy Badger, be compatible with
         | uBlock, etc.
         | 
          | Last week we started opening up our platform to collaborate
         | on results with outside developers and have gotten a lot of
         | interest: https://about.you.com/developers/
         | 
          | Maybe we can also collaborate with you guys at sayhello. Ping
         | me at hey@you.com if you want to compare notes.
        
         | visarga wrote:
         | I think you missed the forest for the trees... it is a Q&A
         | system not just a search engine. You can talk to it, you can
         | refine your queries.
         | 
         | To the SayHello team: kudos for being faster than Google to
          | release a Q&A+search system. I have been expecting something
          | like this for a couple of years, wondering why Google was
          | sleeping on its mountain of papers and not doing it.
         | 
         | Search was the first step in finding information, Q&A is the
         | next logical step. Language models+search such as DeepMind
         | RETRO have shown this approach to be very efficient: 25x
          | reduction in model size for the same perplexity, and verifiably
          | correct answers with source document references.
         | 
         | In the future I expect search to become more like an assistant
         | with context and language abilities. Retrieving a bunch of web
         | pages is so 2000's. Q&A is especially relevant for mobile use
         | with speech interface (hello Siri and Google Assistant).
        
           | joshstrange wrote:
           | > I think you missed the forest for the trees... it is a Q&A
           | system not just a search engine. You can talk to it, you can
           | refine your queries.
           | 
           | From the creators:
           | 
           | > We're building a better search engine for software
           | developers.
           | 
           | Also no you can't "talk to it", I'm not sure where you got
            | that idea. It has an "Ask a follow up" button, but that performs a new
           | search with none of the context of your previous search (also
           | this UI of sliding a modal up from the bottom and layering
           | the results is terrible).
           | 
           | > Search was the first step in finding information, Q&A is
           | the next logical step.
           | 
           | And we are clearly not there. Not only does this not allow
           | you to ask follow-ups to refine but it doesn't give good
           | results in my testing.
        
             | rushingcreek wrote:
             | Could you post which examples you tried? It's not perfect,
             | but the "Ask a followup" feature is usually smart enough to
             | use existing context to refine.
        
               | joshstrange wrote:
               | I asked a question with "php" in the search then asked a
               | follow up without including "php" in the search and it
               | gave me python results. Also, and I'm sure this is a code
               | formatting issue, the PHP code is invalid (newlines
               | appear to be missing, the important one being after the
               | opening tag) [0]
               | 
               | [0] https://cs.joshstrange.com/RP4B59
        
               | rushingcreek wrote:
               | I don't see a followup in the screenshot you provided.
               | Again, we're definitely not perfect with followups, but
               | we generally do capture context. It would be very helpful
               | if you could tell us your original search query, your
               | followup, and your intent with this question.
        
               | joshstrange wrote:
                | That wasn't the search I followed up on. Here is a repeat
                | of the original query and the follow-up I tried (or as
                | close as I can remember; I know it actually showed me
                | Python code last time, and now it's showing SQL and Python
                | results).
               | 
               | Original (php get end of day timestamp):
               | https://cs.joshstrange.com/50y9Pf
               | 
               | Follow up (end of month timestamp):
               | https://cs.joshstrange.com/hStUEm
               | 
               | Sidenote: This slide over modal (that hides the results
               | of the first query, you cannot get to them once you do a
               | follow up) has got to be the worst of all worlds. It's
               | very unintuitive, doesn't provide any value, makes your
               | initial query only as useful as what you can see "before
               | the fold". Just redirect to the new results or append
               | them under. This modal is frustrating to work with, it's
               | a weird parallax-type thing:
               | https://cs.joshstrange.com/d6rhPZ
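The failure described above (a PHP question followed by "end of month timestamp" returning Python and SQL results) is the kind of case where a follow-up needs context from the previous query. A minimal sketch, assuming a hypothetical hard-coded list of language names, is to carry the earlier query's language forward when the follow-up names none. A real system would more likely condition its model on the whole conversation.

```python
# Hypothetical, deliberately tiny vocabulary; a real system would need far more.
KNOWN_LANGUAGES = {"php", "python", "javascript", "typescript", "rust", "go", "sql"}


def contextualize(previous_query, follow_up):
    """Rewrite a follow-up so it keeps the language of the previous query.

    If the follow-up already names a known language, it is returned unchanged;
    otherwise any languages from the previous query are appended to it.
    """
    if set(follow_up.lower().split()) & KNOWN_LANGUAGES:
        return follow_up
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    carried = list(dict.fromkeys(
        word for word in previous_query.lower().split() if word in KNOWN_LANGUAGES
    ))
    return " ".join([follow_up, *carried]) if carried else follow_up


print(contextualize("php get end of day timestamp", "end of month timestamp"))
# → end of month timestamp php
```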
        
         | rushingcreek wrote:
         | Thanks for the feedback. We agree that speed is incredibly
         | important, and we're working on making searches much faster.
         | We'll be iterating on the UI/UX as well, as we think that we
         | can definitely do better and be more efficient with space.
         | 
         | I'd love to hear more about what you mean by "informationally
         | dense" -- some search engines simply show more information on
         | the results page, but that doesn't make results inherently
         | better in my opinion because it frequently simply increases
         | noise relative to signal.
         | 
         | Our current approach is to provide only the most relevant
         | answers/code snippets and nothing else (high signal with low
         | noise) as opposed to cramming in every Stack Overflow answer we
         | can find. We realize we still have a long way to go to make it
         | magical for every search, but we're working on it.
        
           | detaro wrote:
           | > _some search engines simply show more information on the
            | results page, but that doesn't make results inherently
           | better in my opinion because it frequently simply increases
           | noise relative to signal._
           | 
           | But if the "automatic" answer fails and I need to skim
           | results, as I'll often need to do, you put 3 result previews
           | in a space DDG and Google fit 5. They also apply reasonable
           | defaults for the max line length - a basic typography thing
           | that improves quick readability a lot.
        
           | joshstrange wrote:
           | Consistent, non-moving (as things load) UI is super
           | important. Also something that bugged me but I didn't realize
           | why until now: don't use the full width for the description
           | under links (or the titles for that matter). We've known for
           | some time now that if you make text too wide it becomes
           | harder to read.
           | 
           | My suggestions:
           | 
           | * Kill the padding/margins, it's pretty for demos or certain
           | cases but I want to be able to see more information, heavy
           | padding/margins have no place in search results.
           | 
           | * Shrink the search bar to the upper left like every other
           | search engine. Keeping it centered with tons of padding
           | wastes space. Take your logo and put it to the left of the
           | search field, take the buttons and put them to the right. On
           | my screen you are burning a little over 500px of vertical
           | space with things that don't matter, the results matter.
           | 
           | * Shrink your "regular" search results to be half the width
           | of the screen (on desktop, something like a max of 700,
           | Google uses ~640 as does Ecosia). Use the space to the right
           | to show your AI/ML results. This means no content will jump
           | around and people can more easily read the results, full-
           | width is very hard to read. Also shorten the "description"
           | under the links. 2 lines max (at 640px width).
           | 
           | * Either don't ask "Was this answer helpful?" (use hints
           | like: Did they click the link? Did they leave the site after
           | seeing the results?) OR don't make it move the content (hold
           | the space empty if you must animate it in, just don't let the
           | content shift multiple times after doing a search).
           | 
           | Here is your default result for "this is a test" search
           | query: https://cs.joshstrange.com/oKbz6G
           | 
           | Here it is with a bunch of padding/margins removed:
           | https://cs.joshstrange.com/VEVXGh
           | 
           | Yes, I removed the logo/buttons because that was faster than
           | moving them to the left/right of the search but the end
           | result is the same. In my tightened up version you can fit 8+
           | result links where the initial version could only show 3,
           | also all the results are easier to read.
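The suggestion to infer helpfulness from behaviour, instead of asking "Was this answer helpful?", could look something like the sketch below. Everything in it is hypothetical: the session fields, the 10-second dwell threshold and the labels are assumptions for illustration, not signals Hello is known to collect.

```python
from dataclasses import dataclass


@dataclass
class SearchSession:
    """Hypothetical per-search signals, recorded without prompting the user."""
    clicked_result: bool     # did the user open any result link?
    seconds_on_page: float   # time spent on the results page
    refined_query: bool      # did the user immediately rephrase the query?


def implicit_feedback(session, min_dwell=10.0):
    """Guess how useful the generated answer was, without asking."""
    if session.refined_query:
        return "unhelpful"      # the answer presumably missed the question
    if session.clicked_result:
        return "fell back"      # the user needed the regular results
    if session.seconds_on_page >= min_dwell:
        return "helpful"        # presumably read the answer and left satisfied
    return "unknown"            # too little signal to say


print(implicit_feedback(SearchSession(clicked_result=False, seconds_on_page=25.0, refined_query=False)))
# → helpful
```

Because nothing is asked, the answer area never has to grow a feedback widget, which also avoids the content shift complained about above.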
        
             | rushingcreek wrote:
             | Thank you, the cleaned up version is helpful. We definitely
             | have a lot of UX work to do :)
        
         | hbn wrote:
         | Also the scraped snippet appears at the top of the results a
         | couple seconds after the results load and it causes all the
         | results below it to suddenly jerk lower on the page
        
         | discreteevent wrote:
         | Lowest common denominator and "one click" is killing the
         | internet for me. A lot of times I am the lowest common
         | denominator and so that's fine. But when I am a specialist I
         | want something that lets me be specific and gives me specific
         | results.
        
           | visarga wrote:
           | How much more specific do you need than having the ability to
           | refine your question iteratively? I think regular search
            | engines only allow for a few special keywords. Here you can
           | use natural language to refine.
        
             | discreteevent wrote:
             | Sorry, I meant this in response to the part of the parent's
             | comment about not wanting a specific search engine for
             | code. To make it clear - I could see myself using this
             | engine.
        
         | jerrysievert wrote:
         | a search for v8 gave me:
         | 
         | * juice
         | 
         | * v8 engine
         | 
         | * juice
         | 
         | * v8 engine
         | 
         | * juice
         | 
         | so definitely some non-programming searches showing up,
         | unfortunately none of the documentation sources for v8.
        
           | joshstrange wrote:
           | Yep, I saw non-programming stuff but my worry is, if they are
           | branding themselves as "better search engine for software
           | developers", that programming-stuff will be weighted higher
           | than non-programming stuff even if the search term has
           | nothing to do with programming (or a tenuous link). Though
           | your search examples seem to prove the exact opposite.
           | 
            | All that said, the UI/UX is too frustrating to use (as-is)
            | even if they don't promote programming content over
            | non-programming content.
        
         | NegativeLatency wrote:
         | UI: Feels overpadded to me, I'd like to be able to see more
         | stuff without scrolling so far
        
         | wayy wrote:
         | To comment on the dynamic DOM - we're displaying up to three
         | answer types (text, code, links) for each question. We're
         | loading them independently to get information to the user as
         | fast as possible. The alternative (in this state) is to have
         | all of them wait until the slowest component finishes. We're
         | still in the early stages of development, so either way it's
         | not going to be perfect. I can see how this can be a poor
         | experience for some - we're working on it.
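The trade-off described above, rendering each answer type as it arrives versus waiting for the slowest one, can be illustrated with asyncio. The three answer kinds come from the comment; the delays are made-up stand-ins for model latency, and appending to a list stands in for inserting content into the page.

```python
import asyncio


async def fetch_answer(kind, delay):
    """Stand-in for one backend call; `delay` simulates model latency."""
    await asyncio.sleep(delay)
    return kind


async def render_as_ready(delays):
    """Show each answer type as soon as it finishes (order = finish order)."""
    shown = []
    tasks = [asyncio.create_task(fetch_answer(k, d)) for k, d in delays.items()]
    for next_done in asyncio.as_completed(tasks):
        shown.append(await next_done)  # a real UI would update the page here
    return shown


async def render_all_at_once(delays):
    """Wait for every answer type, then show them together (order = request order)."""
    coros = [fetch_answer(k, d) for k, d in delays.items()]
    return list(await asyncio.gather(*coros))


# Made-up latencies in seconds for the three answer types.
delays = {"text": 0.3, "code": 0.1, "links": 0.2}
print(asyncio.run(render_as_ready(delays)))     # → ['code', 'links', 'text']
print(asyncio.run(render_all_at_once(delays)))  # → ['text', 'code', 'links']
```

The first version gets something on screen after the fastest call but changes the page three times; the second changes it once but only after the slowest call finishes.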
        
           | visarga wrote:
           | Maybe not everyone understands that the Q&A responses have to
           | go through a large language model before they are displayed.
           | This takes time, showing something while the LM is churning
           | away is a good idea.
        
             | wayy wrote:
             | Yep, the current loading bar attempts to do that (replaced
             | by the slowest generated answer when available). Definitely
             | looking into new ways to improve speed + loading experience
             | if necessary.
        
           | joshstrange wrote:
           | Side-by-side is the best way to handle this. Show results on
           | the left and load in the ML stuff on the right after it
           | loads. This prevents content-jump and makes the results less
           | wide (you want to aim for <700px to be more readable).
        
           | unsafecast wrote:
           | A placeholder empty box that gets populated when the content
           | arrives would be an improvement.
        
         | 8organicbits wrote:
         | > * I won't use a different search engine for programmers stuff
         | vs everything else.
         | 
         | I'd use this as a ddg bang[1]. I don't use them often, as ddg
         | is a great search engine, but some search engines handle
         | certain queries better and ddg lets you route queries
         | efficiently.
         | 
          | [1] https://duckduckgo.com/bang
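Bangs work by sending a query to another site's search URL when the query contains a `!shortcut`. The sketch below mimics that routing locally. The `!hello` shortcut is hypothetical and is not claimed to exist on DuckDuckGo. The two URL patterns follow the `?q=` form used by DuckDuckGo and by the beta.sayhello.so links in this thread.

```python
from urllib.parse import quote_plus

# Hypothetical bang table; "!hello" is an illustrative name, not a known DDG bang.
BANGS = {"hello": "https://beta.sayhello.so/search?q={}"}
DEFAULT_ENGINE = "https://duckduckgo.com/?q={}"


def route(query):
    """Send a query to the engine named by its first recognised !bang, if any."""
    words = query.split()
    for i, word in enumerate(words):
        name = word[1:].lower()
        if word.startswith("!") and name in BANGS:
            rest = " ".join(words[:i] + words[i + 1:])  # drop the bang itself
            return BANGS[name].format(quote_plus(rest))
    return DEFAULT_ENGINE.format(quote_plus(query))


print(route("!hello hello world in brainfuck"))
# → https://beta.sayhello.so/search?q=hello+world+in+brainfuck
print(route("rust lifetimes"))
# → https://duckduckgo.com/?q=rust+lifetimes
```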
        
       | candiddevmike wrote:
       | How do you think your product will fare in the wake of the
       | backlash and legal saber rattling against GitHub Copilot?
        
       | mikkergp wrote:
        | One of your examples at the bottom of the page is
        | "IsPalindrome". I guess the assumption is that I would click
        | "See Reference", since the snippet doesn't provide any context
        | for the code.
       | This context of a real person explaining it is one of the
       | benefits of sites like Stack Overflow, so I would think about
       | this element of UX.
       | 
       | Also, I noticed in your palindrome reference example, it didn't
       | choose the accepted answer from Stack Overflow. How did it choose
        | the example? Also, I can't tell what value the second and third
        | reference panes are adding. They look like a list of random
        | outputs of the IsPalindrome script.
        
         | rushingcreek wrote:
         | Our code ranking algorithm uses an NLP model (trained on a code
         | dataset) to pick the most relevant snippets. You're right that
         | the accepted answer on Stack Overflow is a good heuristic, and
         | it's something we'll add to our ranking algorithm in the near
         | future.
         | 
         | Showing an answer written by a human as a part of the code
         | snippet is also a good idea.
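Folding the accepted-answer heuristic into snippet ranking could be as simple as adding a bonus to the model's relevance score. This is a hypothetical sketch: the `relevance` values stand in for whatever the NLP ranking model outputs, and the 0.15 bonus is an arbitrary weight that would need tuning on real queries.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    label: str
    relevance: float        # stand-in for the ranking model's score in [0, 1]
    accepted: bool = False  # taken from the accepted Stack Overflow answer?


ACCEPTED_BONUS = 0.15  # hypothetical weight, not a tuned value


def rank_snippets(snippets, top_k=3):
    """Order snippets by model relevance plus a bonus for accepted answers."""
    def score(s):
        return s.relevance + (ACCEPTED_BONUS if s.accepted else 0.0)
    return sorted(snippets, key=score, reverse=True)[:top_k]


candidates = [
    Snippet("reversed-string check", relevance=0.80),
    Snippet("accepted two-pointer loop", relevance=0.70, accepted=True),
    Snippet("usage printout", relevance=0.50),
]
print([s.label for s in rank_snippets(candidates)])
# → ['accepted two-pointer loop', 'reversed-string check', 'usage printout']
```

An additive bonus lets a clearly better model score still beat an accepted answer, rather than always pinning the accepted answer first.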
        
       | servercobra wrote:
       | This suffers from one of the things I've been hating about Google
       | lately: it doesn't use exactly what I typed in. Case in point:
       | "react-native-navigation" is an entirely different package than
        | "react-navigation". I query about react-native-navigation and get
        | results about react-navigation. I get that this is due to Bing,
        | but it could be a fundamental flaw with the approach (for me at
        | least).
       | 
       | The animations and page jumpiness are a bit off-putting and slow,
       | but it is a beta!
        
         | rushingcreek wrote:
         | Yep, we feel that frustration -- it's our intention to use
         | exactly what you typed in, but here Bing is messing up. Using
         | results from our own index should make this less of an issue in
         | the future.
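One way to honour exactly what was typed, regardless of how the upstream engine rewrites the query, is to post-filter results so that identifier-like terms must appear verbatim. The rule that any term containing `-`, `_` or `.` counts as an identifier is an assumption made for this sketch.

```python
import re


def exact_terms(query):
    """Treat identifier-like terms (package names, dotted paths) as must-match."""
    return [t.lower() for t in query.split() if re.search(r"[-_.]", t)]


def filter_exact(query, results):
    """Keep only results that mention every identifier-like term as a whole token."""
    patterns = [
        # The lookarounds stop "react-navigation" from matching inside a longer
        # identifier such as "my-react-navigation-fork", and vice versa.
        re.compile(rf"(?<![\w-]){re.escape(term)}(?![\w-])")
        for term in exact_terms(query)
    ]
    return [r for r in results if all(p.search(r.lower()) for p in patterns)]


results = [
    "Setting up react-native-navigation v7 with Expo",
    "react-navigation: stack navigator guide",
]
print(filter_exact("react-native-navigation push screen", results))
# → ['Setting up react-native-navigation v7 with Expo']
```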
        
       | cpcat wrote:
        | It says "start typing to search". So I started typing and it didn't
        | search. I really expected it to be some sort of typeahead search
        | without requiring focus on the search field :)
        
         | rushingcreek wrote:
         | Query autocomplete is on the roadmap :)
        
           | allanrbo wrote:
           | The odd thing to me was not missing autocomplete, it was that
           | it says "start typing to search", but when you type, nothing
           | happens. This is of course because the search field is not
           | focused when you load the page, until you click it. It should
           | maybe be "type here to search".
        
             | wayy wrote:
             | Ah, I see what you mean. We'll look into focusing the
             | search field or changing the text to be less ambiguous.
        
             | mdaniel wrote:
             | That shit drives me absolutely crazy when the _one thing_ a
             | user wants to do is interact with a search field, but the
              | page doesn't choose to put focus on the input
        
         | [deleted]
        
       | sailorganymede wrote:
       | Personally I've never really had an issue with Google - I think
        | my mental model of how it works is good enough that it makes sense.
       | 
       | It would be amazing if this could be used for internal
       | documentation however. Like we have so much documentation on our
       | wiki which is just disorganised.
        
         | 8n4vidtmkvmk wrote:
          | Stack Overflow offers a version for companies. I've never used
          | it, but it sounds like what you might want.
        
       | Fede_V wrote:
       | You.com has a similar code search specific product. What do you
       | plan to offer that they don't?
        
         | rushingcreek wrote:
         | There are a few key differences. The main one is that our AI
         | generates an explanation to answer your question directly; we
         | go significantly further than they do in terms of synthesizing
         | explanations. Same goes for snippets; on You.com, you usually
          | need to click on a button (e.g. "Open Side Panel") to get a
          | code snippet, which adds friction. Furthermore, they seem to be
         | simply showing the full Stack Overflow page; our approach is to
         | find and rank the most relevant code snippet while offering a
         | "See Reference" button to make it easy to go to the original
         | page.
         | 
         | Overall, our goal is to have the highest signal-to-noise ratio
         | of any search engine when it comes to developer searches.
        
       | 8organicbits wrote:
        | On a mobile browser I noticed that autocorrect was enabled, which
       | isn't great for technical inputs. I want "hcl", not "hello".
        
         | wayy wrote:
         | Good catch, we'll disable it
        
       | laumars wrote:
        | I'm seeing the same page as results 1, 2 and 3. Interestingly, only
        | 1 out of those 3 results was scraped from that page. Even more
        | curiously, only 1 out of those 3 results was even valid code.
       | 
       | https://beta.sayhello.so/search?q=hello+world+in+brainfuck
       | 
       | Nice idea for the project though. Good luck with it
        
         | rushingcreek wrote:
         | Our code extraction/ranking model hasn't been trained on that
         | language yet, so it's definitely an out-of-domain example.
         | We'll keep working on expanding our repertoire!
        
           | laumars wrote:
           | Ahh that's fair enough. I think it's fair to say "brainfuck"
            | is outside most people's domain. I was just curious how your
           | search engine performed on less common search queries (the
           | kind that are trying to debug a rarely hit problem with an
           | otherwise popular framework or language and thus you often
           | spend hours digging through irrelevant answers before you
           | find that one blog post that solves it) but couldn't think of
           | a more realistic example off the top of my head.
        
       | abalaji wrote:
       | Interesting--seems you have to retrain your "google-fu"
       | 
       | "meta programming python" does not give as good results as
       | 
       | https://beta.sayhello.so/search?q=meta+programming+python
       | 
       | "how to implement a meta class in python"
       | 
       | https://beta.sayhello.so/search?q=how+to+implement+a+meta+cl...
        
         | rushingcreek wrote:
         | Yep, the AI definitely prefers fully formed sentences as it is
         | right now. We know it's important not to force users to change
          | how they word their queries, so making it less sensitive to
          | phrasing is a priority for us.
        
         | Invictus0 wrote:
         | Searching "meta class python" gives better results, which seems
         | reasonable to me.
        
       | aquajet wrote:
       | > Training a sequence-to-sequence language model (T5 derivative)
       | on our custom dataset designed for factual generation yielded
       | much better results with less hallucination.
       | 
       | Could you elaborate more on this or point to a paper/benchmark
       | results?
        
         | rushingcreek wrote:
         | I'd be happy to talk a bit about how we evaluated the model.
         | The task we're performing is fundamentally long-form question
         | answering (LFQA), and recent papers
         | (https://arxiv.org/pdf/2103.06332.pdf) have shown that metrics
          | such as ROUGE (used for the KILT benchmark) aren't great at
         | evaluating the quality & truthfulness of generated answers. On
         | our dataset, our approach is to use a combination of human
         | evaluation (which is still arguably the most reliable metric
         | used by the NLP research community to evaluate generated answer
         | quality) and an entailment score (checking if a generated
         | answer is consistent with a "ground truth" context document).
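The entailment score described above can be framed as the fraction of generated sentences that the context document supports. In this sketch, `entails(premise, hypothesis)` is an assumed callable that would normally wrap an NLI model returning an entailment probability. The word-overlap function is only a placeholder so the example runs; it is not a real entailment test, and none of this is claimed to be the Hello team's actual metric.

```python
import re
from typing import Callable


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def entailment_score(answer, context, entails: Callable[[str, str], float], threshold=0.5):
    """Fraction of answer sentences the context entails (0.0 for an empty answer)."""
    sentences = split_sentences(answer)
    if not sentences:
        return 0.0
    supported = sum(entails(context, s) >= threshold for s in sentences)
    return supported / len(sentences)


def overlap_placeholder(premise, hypothesis):
    """Placeholder ONLY: share of hypothesis words that also occur in the premise."""
    premise_words = set(re.findall(r"\w+", premise.lower()))
    words = re.findall(r"\w+", hypothesis.lower())
    return sum(w in premise_words for w in words) / len(words) if words else 0.0


context = "Paul Graham founded Y Combinator with Jessica Livingston and Trevor Blackwell."
# The second sentence is deliberately unsupported by the context.
answer = "Paul Graham founded Y Combinator. It is based in Tokyo."
print(entailment_score(answer, context, overlap_placeholder))  # → 0.5
```

Scoring per sentence, rather than the whole answer at once, keeps one hallucinated sentence from being hidden by several supported ones.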
        
       | izolate wrote:
       | Congrats on the launch! Looks promising, so I'll try it out for a
       | couple of days.
       | 
       | One feature request at first glance: please default to the system
       | font stack for code snippets. I see you're currently using
       | Consolas, a Microsoft typeface, which is not pleasant to see as a
       | mac user.
       | 
       | You can use this to default to the system font on every platform:
       | font-family: "SF Mono", "Monaco", "Inconsolata", "Fira Mono",
       | "Droid Sans Mono", "Source Code Pro", monospace;
        
         | FractalHQ wrote:
         | Why do you consider it unpleasant? I'm a mac user and I really
         | like Consolas. I like to use it in VSCode or when building
         | websites that display code blocks.
        
         | rushingcreek wrote:
         | Thanks for the feedback, we'll take a look at that :)
        
       | ziddoap wrote:
       | Maybe someone from the team can help me with a question regarding
       | the privacy page/policy.
       | 
       | > _The searches we anonymously log be used to improve our
       | product._
       | 
       | > _Your data will not be shared with any third party unless we
       | are required to respond to subpoenas, court orders, or legal
       | process, to establish or exercise our legal rights or defend
       | against legal claims._
       | 
       | > _We will never sell your data to any third party._
       | 
        | The first sentence is at odds with the other two.
       | 
       | If you say you are only collecting query and query response data,
       | but then assure me you aren't selling my data, I can't help but
       | wonder which is true:
       | 
       | 1) The site is actually collecting data, but is not selling it.
       | 2) The site is not collecting data, and the privacy page is
       | outdated/incorrect/inaccurate.
       | 
       | The same question obviously applies to court orders and the like.
       | Which is it? Do you store data that would be material to me if
       | the site was presented by court order, and that is why you have
       | given me a disclaimer? Or do you not store data, and so the
       | disclaimer is meaningless?
        
         | wayy wrote:
         | We only collect anonymous queries + explicit feedback provided.
         | No data is material to you. The privacy page was made pretty
         | early on - we'll update it to be more precise. Thanks for
         | pointing this out.
        
       | radiojasper wrote:
       | I searched for 'iife javascript' and although it understood what
        | I was searching for, only the third of its 3 code examples
        | actually showed an IIFE. The second one didn't make sense at
        | all [0]
       | 
       | [0] https://beta.sayhello.so/search?q=iife+javascript
        
         | wayy wrote:
         | Thanks for trying it out - we're still pretty early so the code
         | search feature won't be perfect. The model works best with
         | natural language queries, however, so you might find better
         | results by giving the search engine more to work with.
         | 
         | https://beta.sayhello.so/search?q=Immediately+Invoked+Functi...
        
         | [deleted]
        
       | gbro3n wrote:
       | Thought I'd try this on a problem I've been researching today
       | (which I resolved) where my service worker for offline PWA usage
       | was working for everything except audio files.
       | 
        | I searched the following on sayhello.so:
       | 
       | "Service worker fails on request for audio file"
       | 
        | I got back a couple of results related to general service worker
        | use, but none that came close to discussing the core problem that
        | led to the solution.
       | 
       | The same query in Google returns several results that together
       | pointed me to the solution (it was around range headers in
       | requests for media data types).
       | 
       | This is just one example though. I think the problem you are
       | trying to fix is worth the effort. I just wonder if this is where
       | humans are still stronger than computers - gathering unstructured
       | data to use in problem solving.
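The fix mentioned above involved Range headers. Browsers often fetch audio and video with a `Range: bytes=...` request and expect a 206 Partial Content reply, so a service worker that answers from cache with the full body can break playback. A real service worker would handle this in JavaScript inside its fetch handler; the Python sketch below shows only the byte-range arithmetic. It assumes the whole file is already cached as bytes and that only single ranges are requested.

```python
import re


def slice_for_range(range_header, body):
    """Build a (status, headers, payload) reply for a single 'bytes=' range.

    Supports 'bytes=start-end', 'bytes=start-' and the suffix form 'bytes=-N'.
    Multi-range requests and malformed headers get a 416 in this sketch.
    """
    size = len(body)
    unsatisfiable = (416, {"Content-Range": f"bytes */{size}"}, b"")
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", range_header.strip())
    if not match or match.groups() == ("", ""):
        return unsatisfiable
    start_text, end_text = match.groups()
    if start_text == "":
        # Suffix range: the last N bytes of the file.
        start, end = max(size - int(end_text), 0), size - 1
    else:
        start = int(start_text)
        end = min(int(end_text), size - 1) if end_text else size - 1
    if start >= size or start > end:
        return unsatisfiable
    headers = {
        "Content-Range": f"bytes {start}-{end}/{size}",
        "Content-Length": str(end - start + 1),
        "Accept-Ranges": "bytes",
    }
    return 206, headers, body[start:end + 1]


# Example: the last three bytes of a ten-byte "file".
status, headers, payload = slice_for_range("bytes=-3", bytes(range(10)))
print(status, headers["Content-Range"], list(payload))  # → 206 bytes 7-9/10 [7, 8, 9]
```

In an actual service worker, the same arithmetic would be applied to the cached response body before building a new 206 response.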
        
         | wayy wrote:
         | The description of the steps you took is super helpful feedback
         | - thanks! Hello performs best on "how-to" questions at the
         | moment. We're still working to improve troubleshooting type
         | queries.
        
           | gbro3n wrote:
           | No problem. Good luck with the project.
        
           | CodeSgt wrote:
           | That'll be a difficult adaptation for potential users to
           | make. I think most of us have been conditioned to phrase our
           | queries a certain way to achieve the best results from
           | Google.
           | 
           | Then again maybe that's just me.
        
             | wayy wrote:
             | You're right in that moving away from Google is a huge
             | behavioral change. While our model does prefer natural
             | language queries, what I was really referring to are the
             | different categories of developer searches. Ad-hoc how-tos
             | are a type of search where the developer knows what to do
             | but not exactly how, e.g., "how to set a cookie in fastapi."
             | Troubleshooting searches are things like copy-pasting a
             | compile error, or "why is X behavior not working."
        
       | llaolleh wrote:
       | Can you give an example of using the context of a previous query?
       | 
       | I applaud you for trying to make a new search engine - it's not
       | something sane people would try to do because of a certain
       | behemoth eating everyone's lunch. It's going to take
       | extraordinary insight and out of the box thinking to get
       | something really good.
        
         | wayy wrote:
         | Our initial prototype focused on conversational search, with
         | the ability to ask follow-up questions.
         | 
         | Here's a rather trivial example:
         | 
         | Q: "Who founded Y Combinator?"
         | A: "Paul Graham founded Y Combinator with Jessica Livingston
         | and Trevor Blackwell."
         | 
         | If you scroll to the bottom of the answer page, you can ask a
         | follow-up question:
         | 
         | Q: "How old is he?"
         | A: "Paul Graham is 57 years old. He founded Y Combinator in 2005."
        
       | mmmuhd wrote:
       | Nice project. FYI, search returns an empty page in UC Browser on mobile.
        
       ___________________________________________________________________
       (page generated 2022-07-06 23:00 UTC)