[HN Gopher] Compare Google, Bing, Marginalia, Kagi, Mwmbl, and C...
       ___________________________________________________________________
        
       Compare Google, Bing, Marginalia, Kagi, Mwmbl, and ChatGPT
        
       Author : 882542F3884314B
       Score  : 788 points
       Date   : 2023-12-31 02:32 UTC (20 hours ago)
        
 (HTM) web link (danluu.com)
 (TXT) w3m dump (danluu.com)
        
       | cratermoon wrote:
       | "Going back to the debate between folks like Xe, who believe that
       | straightforward search queries are inundated with crap, and our
       | thought leader, who believes that "the rending of garments about
       | how even google search is terrible now is pretty overblown", it
       | appears that Xe is correct."
       | 
       | Also, the article tested Mwmbl as well, not mentioned in the
       | title here.
        
       | coldcode wrote:
       | Search was the biggest feature of the web in the early '00s. Now
       | it's such a mess. I can't imagine Search will ever be amazing
       | again, given all the complexity of providing quality while still
       | avoiding all the crap.
        
         | packetlost wrote:
         | Search probably hasn't changed much, but the internet is very
         | different.
        
           | jader201 wrote:
           | Yeah, the problem is that there is so much low-quality
           | content that search doesn't (or can't) do a good job of
           | surfacing the good content above the noise. There is still
           | some signal left, but it's such a small fraction that it's
           | much more difficult to pick it out.
           | 
           | Having said that, I'm usually still able to find what I'm
           | looking for, if I know that it likely exists, and know the
           | keywords to use to find it. But it's much harder nowadays for
           | sure.
        
             | amlib wrote:
             | I wonder how much influence google had in lowering the
             | content quality over the years? After all, most SEO spam
             | was a direct response to all the ludicrous requirements
             | they've forced the whole web into, which eventually only
             | SEO spammers were willing to commit to.
             | 
             | I also wonder if google just stopped existing, would the
             | web heal over time?
        
             | genewitch wrote:
             | i have a radio that can "hear" down to -130dBm, i've proven
             | this empirically. Cellular signals work at -12dB or more
             | below the noise floor, wspr works even lower than that.
             | Lightning is broadband noise, and yet i can still use
             | digital stuff when there's lightning storms.
             | 
             | I don't buy the signal to noise argument. For example,
             | whenever i get on youtube and get fed some content, i can
             | immediately tell if it's had AI involved anywhere, and
             | thumbs it down. I won't recommend it, i've called people
             | out for linking such tripe to me (or others).
             | 
             | Hear me out - google got bad about 11 years ago when the
             | dorking stopped being effective, right around the time of
             | the spotlight search results and the sponsored junk taking
             | the top results. Around this time, various agencies (news,
             | etc) started gaming the SEO to respond to any remotely
             | related search with whatever the news was currently. Google
             | chose not to "fix" this, because we're not the customer.
             | DDG was better for a few years for real results, too, but
             | that has gone downhill as well.
             | 
             | The current zeitgeist uses stuff like tiktok and facebook
             | for "web searches" - "food trucks near Austin, TX" or so.
             | No one really uses web search like people on this site do,
             | and google couldn't care less if we don't like the search
             | results.
        
         | Libcat99 wrote:
         | Is it actually more complex to provide good results, or is it
         | just more profitable not to?
         | 
         | I have a hard time believing an organization like Google
         | doesn't have the resources to provide a search engine that's
         | just as usable as what they had 6 years ago (around the time I
         | feel like the decay really set in). Seems a lot more likely
         | that it's just more profitable to serve up garbage sponsored
         | content.
        
           | jmye wrote:
           | Definitely more profitable not to. Especially as Google is an
           | ad company, not a search company.
           | 
           | I'd rather see a world with numerous paid/subscription search
           | engines, that are motivated to do nothing but return search
           | results well. I expect you would see some of the SEO crap
           | getting solved.
        
             | w-ll wrote:
             | i can't remember where i read this, but something about how
             | google ranks sites that have google ads higher than sites
             | that don't. makes sense, it's evil, but makes sense, that's
             | why we get all this scraped spam. is there any more info
             | on this?
        
               | mixdup wrote:
               | this is like focusing on one single problem as being the
               | cause of the decline of the United States. It's actually
               | a lot of things combining and there's not going to be one
               | fix
        
               | w-ll wrote:
               | wtf decline of the United States are you talking about
        
               | Nextgrid wrote:
               | Intentionally ranking sites with Google ads higher would
               | be a huge antitrust liability, so no way they're doing
               | that.
               | 
               | On the other hand, they can achieve virtually the same
               | outcome while keeping plausible deniability by just not
               | doing anything that would downrank sites with ads (of
               | which a significant chunk is likely to be Google's).
               | 
               | Spam sites often include ads.
        
               | w-ll wrote:
               | i don't think they publicly disclose that fact
        
               | Nextgrid wrote:
               | It doesn't need to be public to become an antitrust
               | liability. Internal written material can still come up
               | during discovery, potentially even in unrelated cases.
               | 
               | Therefore the safest option is to never openly discuss it
               | or intentionally do it and instead use other means to
               | achieve the objective (don't intentionally rank spam
               | higher, just defund/cancel any projects that would make
               | it rank lower).
        
           | zihotki wrote:
           | Google, or Alphabet, is not a search engine company. It's an
           | ads company and that's what they are optimizing for.
        
         | Night_Thastus wrote:
         | The problem is that even if providers of the service are 100%
         | trying to provide a great service, everyone on the web will
         | always be min-maxing to appear on top.
         | 
         | So it's inevitably going to become crap.
        
         | jmclnx wrote:
         | To me it is only due to the ads, google and bing return nothing
         | but ads on the first page. Plus for me to have the joy of
         | seeing these ads, I need to go through a CAPTCHA that I need to
         | try multiple times.
         | 
         | But all in all, a very good article
        
         | 1970-01-01 wrote:
         | The golden era of search results is very much over. Welcome to
         | the pot-metal era.
        
       | endisneigh wrote:
       | I reckon these days search is pretty difficult and everyone knows
       | how to game it. I recommend using a search engine that lets you
       | effectively change which sites are shown. You can do this with
       | Kagi, or with Google's Programmable Search Engines - I'm sure
       | there are more too.
       | 
       | In particular I block Youtube, not because they aren't sometimes
       | correct, but because I don't want videos polluting the regular
       | results - it just takes too long to get info from videos.
       | 
       | An ability to upvote results for a given query seems tantalizing
       | but I bet it would be gamed too. The DIY approach seems to be the
       | only tractable one.
       | 
       | In my case I only allow results from domains I believe are
       | correct. The whitelist approach does have downsides. Usually I'll
       | vet new potential domains through social means like Reddit and
       | this site, rather than identifying them through the search
       | results. I believe there's an inherent tradeoff between
       | discoverability and the gameability of the results.
       | 
       | Though I do sympathize with folks who reminisce about 2008 Google
       | Search results, there were probably orders of magnitude less
       | content out there and a complete ignorance to how valuable your
       | place is on your business and thus no SEO.
       | 
       | I also personally disagree that yt-dlp is the "correct" result
       | for the average user when they search Youtube Download. I highly
       | doubt the average user would know or care to use the command
       | line. A website front end would be more actionable for them.
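
The allowlist approach described in the comment above can be sketched in a few lines. This is a minimal illustration, not how Kagi or Google's Programmable Search Engine implement it; the domain names and the function names `is_allowed` and `filter_results` are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; these domains are placeholders, not recommendations.
ALLOWED_DOMAINS = {"example.org", "docs.example.com"}


def is_allowed(url: str, allowed: set[str] = ALLOWED_DOMAINS) -> bool:
    """Return True if the URL's host is an allowlisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


def filter_results(urls: list[str], allowed: set[str] = ALLOWED_DOMAINS) -> list[str]:
    """Keep only results from allowlisted domains, preserving the engine's order."""
    return [u for u in urls if is_allowed(u, allowed)]
```

Matching on the parsed hostname rather than on a substring of the URL means a lookalike such as notexample.org does not slip through. The tradeoff named in the comment still holds: nothing outside the list is ever discovered.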
        
         | kristopolous wrote:
         | I'm a big fan of the non commercial site search engines because
         | of the gaming aspect. If you're not generating revenue from the
         | clicks the game mostly goes away.
         | 
         | I'm not saying people aren't entitled to make some money, but
         | it clearly incentivizes user hostile behavior.
         | 
         | Maybe make it an option because legitimate sites like
         | journalism also use this model.
        
           | Renaud wrote:
           | Subscription model like Kagi seems to work pretty well
           | against gaming the results.
           | 
           | Their only remaining incentive is to be good enough that
           | people keep paying for the service.
        
             | Nextgrid wrote:
             | It works not because they're somehow smarter or have more
             | resources than Google at detecting spam/SEO, it's because
             | unlike Google (and other ad-supported search engines), they
             | make money from result quality and have an interest in
             | blocking spam.
             | 
             | Google on the other hand makes money off ads (whether on
             | the search results page itself or on the spam sites), so
             | spam sites are at best considered neutral and at worst
             | considered beneficial (since they can embed Google
             | ads/analytics, and make the ads on the search results page
             | look relatively good compared to the spam).
             | 
             | Black-hat SEO has been around since the early days of
             | search engines and they managed to keep it at bay just
             | fine. What changed isn't that there was some sudden
             | breakthrough in malicious SEO, it's that it was more
             | profitable to keep the spammers around than to fight them,
             | and with the entire tech industry settling on
             | advertising/"engagement" as its business model, the risk of
             | competition was nil because competitors with the same
             | business model would end up making the same decision.
             | 
             | The same reason is behind the neutering of advanced search
             | features. These have nothing to do with the supposed war on
             | spam/SEO, so why were they removed? Oh yeah because you'd
             | spend less time on the search results page and are less
             | likely to click on an ad/sponsored result, so it's against
             | Google's interests and was removed too.
        
               | ec109685 wrote:
               | Kagi works because there is no incentive for SEO
               | manipulators to target it since their market share is so
               | small.
               | 
               | Super tinfoil hat to believe Google wants to send users
               | to blog spam websites (e.g. beneficial to Google).
               | 
               | Anytime there is money to be made, there is an
               | effectively infinite amount of people trying to game the
               | system.
        
               | lanstin wrote:
               | Google is a complex system, so "want" can just mean: we
               | are making money from the blog spam, and while we don't
               | like it, other things take priority over fighting it as
               | effectively as we could.
        
               | ZeePelli wrote:
               | It's never tinfoil-hat to assume that a corporation is,
               | at very least, making sure not to fight too hard against
               | any activity that brings it more revenue.
        
             | whakim wrote:
             | But the author tried Kagi and the results don't appear to
             | be noticeably different, filled with scammy adspam just
             | like Google and Bing. Kagi's results seem to mostly
             | aggregate existing search engines [1], so this isn't much
             | of a surprise. Perhaps a subscription-based service that
             | operates an index at Google's scale might help, but no such
             | thing exists to my knowledge.
             | 
             | [1] https://help.kagi.com/kagi/search-details/search-
             | sources.htm...
        
               | greggh wrote:
               | Right, but Kagi has built in tools to make it easy to fix
               | that. Blocking those spammy sites from ever showing up
               | again. Moving certain sites up the ranking, and so on.
               | These features mean that over time my Kagi results have
               | become nearly perfect for myself.
        
               | whakim wrote:
               | This is addressed in the article. As Hacker News readers
               | and expert computer users, we have a bag of tricks that
               | we can reach into in order to make our searches perform
               | better. With a similar level of effort and an expert
               | user's intuition you can get good results out of any
               | search engine. Not so for the average user. In fact,
               | again paraphrasing the article, Google's original claim
               | to fame was that you _didn't_ have to spend a lot of
               | time doing exact keyword matching and fancy tricks in
               | order to get good results.
        
         | ysavir wrote:
         | > In particular I block Youtube, not because they aren't
         | sometimes correct, but because I don't want videos polluting
         | the regular results - it just takes too long to get info from
         | videos.
         | 
         | Funnily enough, lately I've been prioritizing YT videos more
         | when searching. So many sites now are just regurgitated SEO
         | farms with minimal quality, and easy to see why: it's minimal
         | effort to produce and cheap to host. But making a video takes
         | time and effort, so has a much higher barrier to use as a click
         | farm.
         | 
         | More than once when traditional search failed me, I went to YT
         | and found some video from 2009 clearly and eloquently
         | explaining what I'm looking for in detail, and without any
         | distractions because the person authoring the video clearly
         | didn't specialize in the media format or show interest in
         | experimenting.
         | 
         | I've found it to also be a better source when looking for a
         | product to buy. Want to know which fan to get? Turns out there's a
         | channel from a dedicated guy who keeps finding ways to test
         | different fans and their utility and with multiple videos
         | demonstrating his approach and findings. The mainstream
         | channels aren't all that useful, but there's a ton of "old web"
         | style videos (some even recent) passionately providing details
         | for almost anything you'd think to search. And they're a gold
         | mine.
        
           | robrenaud wrote:
           | Would a browser feature that skipped to the relevant parts of
           | the video based on closed captioning and understanding search
           | intent be useful? It seems like this would be a good way for
           | Google to fight to stay relevant in UX vs having the chat
           | bots just quickly spitting out a readable answer. Hunting
           | through ad laden webpages is annoying. Seeking to the
           | relevant section of the video is a solvable problem,
           | especially for videos above some viewership threshold.
        
             | nulld3v wrote:
             | I've definitely seen Google do this already:
             | https://searchengineland.com/google-tests-suggested-clip-
             | sea...
        
               | tentacleuno wrote:
               | Google seems to be taking much more advantage of
               | YouTube's transcription feature lately. The first
               | addition was the (ok, gimmicky) animation on the
               | Subscribe button when someone says the dreaded like.
               | Hopefully a sign of things to come.
               | 
               | Overall AI summaries are very welcome for a certain
               | subset of YouTube which is sadly dominated by sponsored,
               | clickbait, and ad-driven content.
        
             | dcow wrote:
             | Didn't Google try this already? It seems useful to me, at
             | least. IMO the next frontier of search is not better
             | hypertext, it's podcasts, audio, and video.
        
             | tentacleuno wrote:
             | > Seeking to the relevant section of the video is a
             | solvable problem
             | 
             | ...and it has already _been_ solved, though partially:
             | SponsorBlock allows people to add a "Highlight" section to
             | a video, which denotes the part of the video which the user
             | most likely wanted to see (sans the "what's up guys", "like
             | and subscribe", etc.)
             | 
             | Of course, it's not perfect: it relies upon humans doing
             | the work, though some may see that as a positive over
             | something more computerized.
        
           | plagiarist wrote:
           | Do you have some tips for finding concise videos that answer
           | the question you are asking? I am finding more and more
           | obvious LLM bullshit in results, so I am willing to try some
           | other tactics. But I am not ready to spend the minutes
           | watching videos to see if it is actually relevant or a waste
           | of time, always artificially long to increase ad revenue.
        
             | crznp wrote:
             | For me, it really depends on the type of video. For fixing
             | cars, I'm usually looking for something specific enough
             | that there isn't a lot of chaff. It was probably recorded
             | and edited on a phone just to splice the clips together.
             | Probably the default thumbnail that youtube extracted from
             | the video.
             | 
             | For product videos, if Project Farm did it, look there
             | first. Otherwise, I look for someone who has a lot of videos
             | for competing products with basically the same format, not
             | over 10 minutes.
             | 
             | Tech videos are the hardest, I often still prefer text.
             | Maybe look for links to the docs in the description? I
             | still get duds though.
        
               | williamcotton wrote:
               | I don't know much about fixing cars, but yeah, YouTube is
               | a treasure trove for tacit knowledge.
        
             | ysavir wrote:
             | Wish I did, but here you're at the algorithm's mercy,
             | unfortunately. One possibility is subbing/accruing watch
             | time on channels that you find provide you the right value,
             | so that the algorithm might recommend similar channels on
             | other subject matters.
        
           | imiric wrote:
           | > But making a video takes time and effort, so has a much
           | higher barrier to use as a click farm.
           | 
           | > The mainstream channels aren't all that useful, but there's
           | a ton of "old web" style videos (some even recent)
           | passionately providing details for almost anything you'd
           | think to search. And they're a gold mine.
           | 
           | This won't be the case for long. YT is already starting to be
           | polluted with spam and AI generated content, which will get
           | more and more common. The same thing that happened to the web
           | in text form, will happen to videos.
           | 
           | I think the only solutions are using allowlists for specific
           | domains, and ironically enough more AI to filter specific
           | results. Or just straight up LLMs instead of web search,
           | assuming they're not trained on spam data themselves.
        
             | danieldk wrote:
             | Yeah. I was recently looking for videos comparing two
             | smartphones, and among the top-ranked results were videos
             | that consist of nothing but the phones' specs shown side by
             | side, and videos that are just LLM-generated text read out
             | with TTS.
        
             | ysavir wrote:
             | One critical difference is the date attached to youtube
             | videos. It's easy to verify that a video was made before
             | this tech was available, but you can't do that with
             | websites, or search engine result pages.
             | 
             | It does limit utility for more modern needs, unfortunately.
        
             | lrem wrote:
             | Note that the problem of filtering bad data out of learning
             | material isn't inherently easier than filtering the same
             | data out of search results.
        
           | necovek wrote:
           | That's curious, I generally hate video due to inability to
           | glance over content, and the few attempts I made to actually
           | find useful information I searched for resulted in... spammy
           | extra low effort video content that did not answer my
           | questions.
        
             | williamcotton wrote:
             | Depends on what you're looking for. A blog post about how
             | to play Search and Destroy by The Stooges is not as useful
             | as a video of James Williamson himself showing you the
             | riffs!
        
         | teeray wrote:
         | > it just takes too long to get info from videos.
         | 
         | I can't wait until video transcripts get fed into LLMs just to
         | eliminate the whole "This video is sponsored by something-
         | completely-unrelated, more about them later. What's up Youtube,
         | remember to like, share, subscribe... _5 entire minutes pass on
         | similar drivel_ ... _the actual thing you want, but stretched
         | out to an agonizing length_ "
        
           | execat wrote:
           | You need SponsorBlock.
           | 
           | Usually people leave a "highlight" marker which tells you
           | where you're supposed to jump to. Along with the regular
           | "This video was brought to you by <insert>VPN".
        
         | lamontcg wrote:
         | > Though I do sympathize with folks who reminisce about 2008
         | Google Search results, there were probably orders of magnitude
         | less content out there and a complete ignorance to how valuable
         | your place is on your business and thus no SEO.
         | 
         | That was a decade after Google was created and people certainly
         | understood SEO and Google was constantly updating its algorithm
         | to punish people who were trying to game the algorithm.
         | 
         | The wikipedia page on "link farming" for example references it
         | happening as early as 1999 and targeting SEO on inktomi:
         | 
         | https://en.wikipedia.org/wiki/Link_farm
         | 
         | I remember some internal presentations at Amazon around ~2004
         | about how boosting Google SEO on Amazon web pages increased
         | traffic and revenue (and Amazon was honestly a bit behind-the-
         | curve due to a kind of NIH syndrome).
        
           | bee_rider wrote:
           | At the time it seemed like Google was winning, though. SEO
           | seems to have gotten really good, or maybe Google just gave
           | up.
        
         | stevage wrote:
         | I have a hard time believing it's so difficult for a search
         | engine to distinguish between a credible, respected website
         | that has been around a while and some generated garbage that
         | exists to be a search result. We humans can tell them apart, so
         | in principle, computers can too.
        
           | Nextgrid wrote:
           | Yes, this should be table stakes for a classifier - a company
           | with the resources of Google could definitely solve that
           | problem if they weren't themselves in the business of spam
           | (advertising) and didn't benefit from spam sites (as they
           | often include Google ads/analytics).
        
             | navigate8310 wrote:
             | Google is quite quick in plugging holes in AdSense but
             | not AdWords.
        
           | pixl97 wrote:
           | I guess this brings up the question of how good are humans at
           | doing this across a wide number of domains on average?
           | 
           | The other question I have is how long do these garbage
           | results stay up for a particular query on average?
        
         | dandrew5 wrote:
         | Google's PSE is neat but there isn't a good way to manage
         | switching between them. They could easily add a little dropdown
         | to let you select which one to use as part of the public link
         | UI they provide for each one individually. Giggle[1] gives me
         | this ability and I run it locally (alongside Kagi) for more
         | specific things to target domain lists I've been building over
         | the years.
         | 
         | 1. https://github.com/dan-lovelace/giggle
        
       | readthenotes1 wrote:
       | I got different results for Google on "ad block".
       | 
       | And changing the query to "ad blocker" like Google suggested
       | raised ublock origin way up in the results
        
       | jeffreyw128 wrote:
       | The issue with traditional search engines is that keyword-first
       | algorithms are extremely gameable.
       | 
       | Try https://search.metaphor.systems - it's fully neural
       | embeddings-based search. No keywords, only an embedding of what
       | the actual content of a webpage is.
       | 
       | So in the mentioned example of searching for Youtube downloaders,
       | with Metaphor you'll get only Youtube downloaders
       | (https://search.metaphor.systems/search?q=This%20is%20the%20b...)
       | 
       | Full disclosure - I work there :p
        
         | charcircuit wrote:
         | >it's fully neural embeddings-based search. No keywords, only
         | an embedding of what the actual content of a webpage is.
         | 
         | What prevents websites from gaming their embedding? Switching
         | to a similarity search doesn't prevent the results from being
         | gamed.
        
         | marcinzm wrote:
         | How is that different from keywords? Embeddings aren't magic,
         | they're just page content. Content is trivial to game since
         | it's controlled by the website owner.
         | 
         | edit: From my quick QA, the results are also not that great.
         | Searching for "what is the best mouse to buy" leads to links to
         | buy random mice versus review summaries or online discussions
         | on mice. One of the recommended queries of "Here is a great fun
         | concert in San Francisco" leads to some really bizarre results
         | in non-English languages that have nothing to do with either SF
         | or concerts.
         | 
         | edit2: Also, Google has been using LLMs as part of their search
         | since at least 2018 so definitely not just keyword matching
         | there.
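
The point that an embedding is computed from owner-controlled page content can be illustrated with a toy example. The sketch below uses a bag-of-words count vector as a stand-in for a neural embedding, which is an assumption made purely for illustration; real learned embeddings capture meaning rather than raw word counts and are harder to game than this, and nothing here reflects Metaphor's or Google's actual models. The page texts are invented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a neural model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


query = embed("best mouse to buy")
# An honest page that happens to use different words from the query.
honest = embed("a long term review of three mice we tested for a month")
# A page whose owner wrote the text to sit close to the expected query.
stuffed = embed("best mouse to buy best mouse to buy click here")

honest_score = cosine(query, honest)
stuffed_score = cosine(query, stuffed)
```

In this toy setup the stuffed page scores higher than the honest one, because the only input to the similarity is text the site owner chose. Whether a trained model resists this better is exactly what the replies below debate.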
        
           | jeffreyw128 wrote:
           | Yup, definitely still gameable but if the model learns what
           | high quality content is like and what high quality webpages
           | there are (which it does), then the only way to game would be
           | to be great :)
           | 
           | For your search - I would recommend turning autoprompt off
           | and searching something like "Here is a great summary of the
           | best computer mice to use:".
           | 
           | Our embeddings model is trained on how links are talked about
           | on the Internet, if that helps with querying. So you have to
           | query like how someone would refer to a link before sharing
           | it
        
             | marcinzm wrote:
             | > Our embeddings model is trained on how links are talked
             | about on the Internet, if that helps with querying. So you
             | have to query like how someone would refer to a link before
             | sharing it
             | 
             | So it's not high quality web pages but web pages that
             | people talk about a lot which is expected since no one has
             | an oracle that says what high quality is. The embeddings
             | are merely a proxy and generalization for "how links are
             | talked about on the Internet." That can be gamed at scale
             | just like every other signal any popular search engine has
             | been based off of.
        
               | jeffreyw128 wrote:
               | That's true. Although it should be much harder.
        
         | optshun wrote:
         | This is excellent!
         | 
         | Definitely excited to see how it holds up to daily use.
         | 
         | So far it gave me exactly what I wanted at the top for all of
         | my test queries that were well formed.
         | 
         | As for asking "ignorant" questions both your service and the
         | goog failed where phind gave me an actionable starting point
         | (after a prodding follow up question:
         | https://www.phind.com/search?cache=hmul4znpn7y4ei6qa64fosmc )
         | 
         | "max-height like css property for top and left"
         | 
         | Unsure if this sort of thing is even a goal of your project,
         | but you won over a new user.
         | 
         | Wish you and your team all the best.
        
         | croes wrote:
         | Just wait until the content farms adapt
        
         | standardUser wrote:
         | How do you deal with dynamically/contextually generated
         | content? And how about paywalls and login-required content?
        
           | jeffreyw128 wrote:
           | Do our best at getting the right content.
           | 
           | For paywalls/login - we play pretty straight, always obey
           | robots.txt, etc.
        
         | ShadowBanThis01 wrote:
         | So far so good. I'll try using this first from now on, and see
         | how it does. Good luck!
        
         | ec109685 wrote:
         | https://getthatvideo.com/ Is the first result for downloading
         | YouTube videos. Seems super sus (especially since the site
         | doesn't load).
         | 
         | Auto-prompted to: "Here's a helpful website for downloading
         | YouTube videos:"
         | 
         | Also, this result is horrible:
         | 
         | "What does it mean if someone is not covered in nfl football?"
        
         | anonymoushn wrote:
         | The first result vtubego.com is a 144MB downloader app. The
         | page contains "Pricing Plans Lorem ipsum dolor sit amet,
         | placerat verterem luptatum phaedrum vis, impetus mandamus id
         | vix fabulas vim." above its 3 paid plans (there is no free
         | plan).
         | 
         | I haven't installed the downloader app, so I'm not sure if it
         | lets me download youtube videos for free.
         | 
         | The second result "ytder.com" is a redirect to
         | "https://poperblocker.com/edge/" which seems to be a browser
         | extension for Microsoft Edge that protects the user from the
         | Holy See. I'm not using Edge and I'm trying to download a
         | Youtube video.
         | 
         | The third result download-video.net says that it can download
         | videos from a list of sites. Youtube is not in the list, but
         | let's try anyway. If you put
         | "https://www.youtube.com/watch?v=IkYVmtgxebU" into the text box
         | and click "download" you get "500 SyntaxError: Unexpected token
         | '<', ""
         | 
         | At this point I gave up, but please let me know if any of the
         | results work.
        
       | marginalia_nu wrote:
       | While I've made huge improvements to the algo recently, I do
       | think Marginalia Search got a bit lucky with the sample queries,
       | as it is still IMO far more hit and miss than many alternatives,
       | but that also speaks for how hard evaluating search quality is.
       | 
       | Its efficacy is also strongly dependent on understanding that
       | it's a keyword search engine with no semantic understanding.
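
A minimal sketch of what "keyword search with no semantic understanding" implies, assuming a toy inverted index with boolean AND matching. This is not Marginalia's actual implementation; the function names and sample documents are hypothetical and only illustrate why the exact words in the query matter.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def keyword_search(index, query):
    """Return ids of documents that contain every query token (boolean AND)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical sample documents, for illustration only.
docs = {
    1: "yt-dlp is a command line video downloader",
    2: "how to save a clip for offline viewing",
}
index = build_index(docs)
print(keyword_search(index, "video downloader"))  # literal keyword overlap with doc 1
print(keyword_search(index, "download youtube"))  # no shared tokens, so nothing matches
```

In this sketch a query only finds pages that literally contain its words, which is why knowing the "magic words" helps so much with this kind of engine.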
        
         | bombcar wrote:
         | I notice you completely avoid the question on how a single
         | developer can do so well ;)
         | 
         | I do think that search has gotten much worse but my ability to
         | know the magic words like "ublock origin" instead of "Adblock"
         | and "yt-dlp" instead of "download YouTube" and phrase my search
         | has gotten better.
         | 
         | We've all been doing prompt engineering against the Internet-
         | wide LLM that is the spam houses.
        
           | marginalia_nu wrote:
           | > I notice you completely avoid the question on how a single
           | developer can do so well ;)
           | 
           | As much as I enjoy the notion of somehow being a 10,000X
           | developer, it's probably mostly that modern search is a
           | filtering problem, and MS does filtering fairly well.
        
         | tentacleuno wrote:
         | > [...] but that also speaks for how hard evaluating search
         | quality is.
         | 
         | Would you be able to share some of your personal highlights
         | regarding this?
         | 
         | I've partially kept up-to-date with the DIY, non-corporate
         | search space (YaCY and friends). I'd love to understand a bit
         | more behind the engineering decisions made when creating a
         | search engine; it seems like a very hard problem to solve.
         | 
         | P.S. Marginalia is a very impressive piece of work, overall --
         | I've heard nothing but positive remarks from users on here.
         | I've been meaning to try it for a while, but time constraints
         | have... well, constrained, thus far.
        
           | marginalia_nu wrote:
           | Honestly I understand it well enough that I see it is
           | surprisingly hard, but not enough to have good solutions...
        
           | golol wrote:
           | I just tested Marginalia and it was completely unable to
           | lead me to a Wikipedia or imdb page when searching for
           | "driver ryan gosling" and variations. It just listed lots of
           | random articles.
        
             | wisemang wrote:
             | That.. is kind of the point of this particular search
             | engine.
             | 
             | > This is an independent DIY search engine that focuses on
             | non-commercial content, and attempts to show you sites you
             | perhaps weren't aware of in favor of the sort of sites you
             | probably already knew existed.
        
               | golol wrote:
               | Well that makes sense, but I wanted to push against the
               | result that the OP seems to take away from their test,
               | which was that Marginalia seems to work well for the
               | common user.
        
               | marginalia_nu wrote:
               | There's also a known bug with Wikipedia in particular, I
               | do index it but the results are never ranked particularly
               | high. I haven't fixed it because I don't want Wikipedia
               | to be the #1 result for every search. Feels like most
               | people are aware of Wikipedia and don't need help finding
               | it.
        
               | lbalazscs wrote:
               | I often do a Google search, and then go directly to the
               | Wikipedia result. My reasoning is that during the initial
               | search, I don't know if there's a Wikipedia page about
               | that topic, and I might need a fallback option.
        
               | treetalker wrote:
               | Thanks for your work!
               | 
               | I have a suggestion for the "About" section at the top of
               | Marginalia's landing page. I think it would read better
               | like this:
               | 
               | > This is an independent DIY search engine that focuses
               | on non-commercial content, and attempts to show you sites
               | you perhaps weren't aware of [instead] of the sort of
               | sites you probably already knew existed.
               | 
               | Showing one thing "in favor of" another seems
               | contradictory in this case.
        
         | ta988 wrote:
         | Just my feedback after trying to finally get to what it is
         | exactly.
         | 
         | I tried to find marginalia on DDG, not on the first page.
         | Google has it after some garbage. If I go to marginalia.nu I
         | get an SSL error. search.marginalia.nu works
         | 
         | If I search on marginalia for duckduckgo the first link is
         | somewhat relevant but is about the app, all the other links are
         | related to DDG but of curious relevance.
         | 
         | If I search for ublacklist mentioned above, I do not see
         | anything directly relevant.
        
           | marginalia_nu wrote:
           | Hmm, what's your browser? I renewed the cert today... Only
           | thing I can think of is that it might not like a wildcard
           | cert for the bare marginalia.nu domain.
        
             | ta988 wrote:
             | Firefox android
        
               | marginalia_nu wrote:
               | Hmm, can't reproduce it myself, but firefox has a nasty
               | habit of quietly "repairing" these types of
               | misconfigurations by redirecting from one subdomain to
               | another. I've added marginalia.nu as a SAN, should
               | hopefully work now.
        
             | jldugger wrote:
             | Safari doesn't like https://marginalia.nu. Probably because
             | *.marginalia.nu is not valid for the base domain. Add it as
             | a Subject Alt Name
        
               | marginalia_nu wrote:
               | Try now?
        
               | jldugger wrote:
               | Looks like you've fixed your bug.
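
The wildcard issue discussed above can be sketched as follows. This is a simplified model of certificate name matching, assuming only whole-label wildcards in the left-most position; real TLS libraries apply more rules, and the helper names here are hypothetical.

```python
def cert_name_matches(pattern: str, hostname: str) -> bool:
    """Simplified check: a left-most '*' label matches exactly one label."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    # A wildcard never absorbs zero labels or several labels,
    # so the label counts must be equal.
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels

def cert_covers(san_entries, hostname):
    """True if any Subject Alt Name entry matches the hostname."""
    return any(cert_name_matches(entry, hostname) for entry in san_entries)

wildcard_only = ["*.marginalia.nu"]
with_bare_domain = ["*.marginalia.nu", "marginalia.nu"]

print(cert_covers(wildcard_only, "search.marginalia.nu"))  # subdomain is covered
print(cert_covers(wildcard_only, "marginalia.nu"))         # bare domain is not
print(cert_covers(with_bare_domain, "marginalia.nu"))      # covered once added as a SAN
```

Under these assumptions the wildcard entry alone does not cover the bare domain, which is consistent with adding marginalia.nu as a separate SAN.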
        
       | happytiger wrote:
       | I feel like you could reboot yahoo directory and have more
       | utility than most searches.
        
         | kristofferR wrote:
         | The !bang directory for Kagi is honestly pretty good, found
         | some cool sites there: https://duckduckgo.com/bangs
        
           | louthy wrote:
           | Did you mean to say Kagi or Bing?
           | 
           | Anyway, here's Kagi's bangs:
           | 
           | https://help.kagi.com/kagi/features/bangs.html
        
             | kristofferR wrote:
             | > Note that Kagi supports all DuckDuckGo-style bangs.
             | 
             | You can also make your own bangs.
             | 
             | That said, my point was that the bang directory has a bunch
             | of the most useful sites in each category.
        
         | flenserboy wrote:
         | The return of something like Yahoo Directory would be most
         | welcome. There is great utility in having more than one
         | approach into a data space. That we have been stuck with
         | essentially one way in for over a decade means that there is a
         | great deal out there which would be great to access but which
         | has been rendered invisible.
        
           | marginalia_nu wrote:
           | https://ooh.directory/
        
             | flenserboy wrote:
             | Nice. Thanks!
        
             | FergusArgyll wrote:
             | this is awesome! thanks
        
         | infamia wrote:
         | Categorization sounds like a good job for AI. Yahoo execs, are
         | you paying attention? :)
        
       | arthurcolle wrote:
       | I use serpapi for my hot RAG and the results are fine.
       | 
       | Brave search API is obscenely overpriced. I hope someone is
       | working on Search because Google has become a singularly garbage
       | company. Propping up DEI is sinful enough but just failing to
       | compete is lame. /shrug
        
       | toomim wrote:
       | Do search engines censor political topics these days? If you
       | search "truthsocial" on ddg, the truthsocial.com website is the
       | first hit. But if you search "trump truthsocial", it doesn't give
       | you trump's truthsocial page, and doesn't even give you
       | truthsocial.com within the first few pages of search results.
       | 
       | Since ddg uses bing, does anyone know what is happening here at
       | bing? It looks like google results are similar.
        
         | dpkirchner wrote:
         | I doubt you're seeing censorship. If you search for
         | "truthsocial trump" on ddg, you'll see his profile, for better
         | or worse.
        
           | toomim wrote:
           | Oh, interesting. So it depends on the order of the terms:
           | 
           | - "truthsocial trump" works
           | 
           | - "trump truthsocial" doesn't work
        
         | Springtime wrote:
         | DuckDuckGo (and by extension perhaps Bing, assuming identical
         | upstream results) has some terrible results when trying to
         | filter by all kinds of domains.
         | 
         | There's a power tools review/news site that returns zero hits
         | for the actual domain when searching its name (which is the
         | same as its .com address). While for some domains even
         | searching using the `site:` parameter will give far fewer
         | results when paired with a query than just searching the domain
         | name + query sans the TLD (the router firmware site openwrt.org
         | is among such).
         | 
         | It's a mess and reporting it hasn't made any difference ime in the
         | past 3 years. So I'd be reluctant to say irrelevant results are
         | due to censorship unless there was more evidence.
        
         | senderista wrote:
         | I have concluded that Google definitely censors search results
         | relating to the Ukraine war, after vainly searching for
         | articles about documented Ukrainian war crimes (reported in
         | mainstream Western media like NYT/WaPo).
        
           | ARandomerDude wrote:
           | I'm not seeing this. I Googled "war crimes by ukrainian
           | soldiers" and the top link was an Amnesty International
           | Article, "Ukraine: Ukrainian fighting tactics endanger
           | civilians".
           | 
           | https://www.amnesty.org/en/latest/news/2022/08/ukraine-
           | ukrai...
           | 
           | I use Google as little as possible because I don't like
           | surveillance advertising but fair is fair.
        
             | senderista wrote:
             | You're right: I just checked and there are several hits for
             | events that happened over a year ago that I couldn't find
             | at all with Google back then. Shame on me for not checking
             | before I posted. I have no idea what happened but
             | apparently it's now fixed.
        
       | johnfn wrote:
       | I noticed that the author uses ChatGPT3.5 rather than 4, which is
       | a rather large difference. I don't have the knowledge to rerank
       | all questions the author asked, but I will say that a test of
       | ChatGPT 4 leads me directly to youtube-dl, which is better than
       | every other search engine listed.
        
         | taberiand wrote:
         | That was the first thing I checked reading the article.
         | Although the argument would be 3.5 is free - any comparison of
         | systems against ChatGPT that isn't using ChatGPT 4 can be
         | dismissed almost out of hand; there is not much point talking
         | about ChatGPT if it's not using ChatGPT 4 and making proper use
         | of its capabilities.
         | 
         | That is not to say that there aren't valid criticisms of and
         | shortcomings in ChatGPT 4 - just that it's not useful to say
         | ChatGPT when it's referring to 3.5
        
           | bombcar wrote:
           | He gives the full queries - do you have chat 4.0 that you can
           | run it against?
        
           | Dah00n wrote:
           | >any comparison of systems against ChatGPT that isn't using
           | ChatGPT 4 can be dismissed almost out of hand
           | 
           | Does everyone or even most use ChatGPT 4? The most used
           | version is -of course- by far the most relevant.
        
           | vitaflo wrote:
           | This is silly, most people aren't going to pay for ChatGPT,
           | just like they won't pay for Google or DDG. So using 3.5 in
           | this case is perfectly acceptable when we're talking about
           | free software.
        
             | taberiand wrote:
             | Kagi isn't free, that's on the list
        
         | huytersd wrote:
         | I've come to recognize that any article that uses 3.5 has an
         | agenda.
        
           | airstrike wrote:
           | I also suspect as much, but obviously can't know for sure.
           | IMHO it's intellectually lazy if not dishonest to benchmark
           | against 3.5 and not make that fact clearly known upfront
           | 
           | A better benchmark would have had two entries for ChatGPT,
           | showing both 3.5 and 4 results
        
           | xigoi wrote:
           | The agenda of not wanting to pay for something just to test
           | it out when there is a free version?
        
         | latexr wrote:
         | > I will say that a test of ChatGPT 4 leads me directly to
         | youtube-dl
         | 
         | And yet to other people it starts rambling about how that's
         | wrong and you shouldn't do it and doesn't give a usable answer.
         | 
         | https://news.ycombinator.com/item?id=38822040
         | 
         | It boggles the mind the extent to which people salivate over a
         | system that cannot decide between a correct straight answer,
         | something wrong but plausible, something wrong and impossible,
         | or outright refusing to answer.
        
           | johnfn wrote:
           | That's GPT 3.5. It sounds like you have a bit of an axe to
           | grind with ChatGPT, but if you're going to do so, do try to
           | grind it on the correct version.
        
             | latexr wrote:
             | The comment says it's v4. Since there's no information on
             | the page either way (funny, considering the original
             | complaint), I took them at their word. If you don't believe
             | them, that's up to you.
             | 
             | For what it's worth, I do have access to v4 and it did give
             | me an answer right now. But since I also know even v4 can
             | give you wildly different answers to the same question even
             | if you ask them one right after another, that doesn't prove
             | it either way.
        
       | Osiris wrote:
       | I have recently started using kagi after seeing a recommendation
       | here.
       | 
       | From what I understand, it aggregates results from multiple
       | sources rather than having their own indexer.
       | 
       | The results aren't really any better, but the lack of ads and
       | videos in the results makes for a cleaner experience.
       | 
       | I also haven't yet taken advantage of the extra features to block
       | certain websites from results.
       | 
       | Personally, I pay the $5 mostly in an attempt to support another
       | competitor in the space.
        
         | kristofferR wrote:
         | Kagi is awesome, so much better experience than Google!
         | 
         | Start using bangs, lenses and customized results ASAP, that
         | makes a big difference.
        
           | Zambyte wrote:
           | I actually find myself using bangs way more since I switched
           | to Kagi from DDG. I think it's the AI bangs like !chat and
           | !expert that got me in the habit of using bangs besides !g
           | (which I never actually use anymore).
        
         | Nextgrid wrote:
         | Pretty sure the reason Kagi is better isn't because they use
         | multiple sources, it's just because they can use the presence
         | of ads as a negative ranking signal, something that none of the
         | major public search engines will ever do as it goes against
         | their own business model.
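
A rough sketch of the idea of using ads as a negative ranking signal. This is not Kagi's actual algorithm; the `Result` fields, the penalty weight, and the example URLs are all hypothetical, chosen only to show how a penalty could reorder results.

```python
from dataclasses import dataclass

@dataclass
class Result:
    url: str
    relevance: float  # hypothetical base relevance score, higher is better
    ad_count: int     # hypothetical number of ad/tracker scripts seen on the page

def rerank(results, ad_penalty=0.1):
    """Sort by relevance minus a penalty proportional to the ad count."""
    def score(result):
        return result.relevance - ad_penalty * result.ad_count
    return sorted(results, key=score, reverse=True)

# Hypothetical results, for illustration only.
results = [
    Result("https://seo-farm.example/best-mice", relevance=0.9, ad_count=12),
    Result("https://hobbyist.example/mouse-review", relevance=0.8, ad_count=0),
]
ranked = rerank(results)
print([r.url for r in ranked])  # the ad-free page ends up first
```

The point of the sketch is only that an ad-funded engine has little incentive to include a term like `ad_penalty` at all.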
        
       | elcook4000 wrote:
       | I have found appending site:edu remarkably improves google
       | results.
       | 
       | For both the tire question and with respect to a youtube
       | downloader, the first results were on the nose with the addition
       | of site:edu on Google.
       | 
       | Why this is needed and whether a noncommercial, information rich
       | web portal should exist are questions for another thread.
        
       | fantasybroker wrote:
       | I am not sure what the intention of this post is. In _my_
       | handpicked results Kagi far outperforms Marginalia.
       | 
       | #1 "Gordon ramsey" (misspelled "Gordon Ramsay"). Marginalia shows
       | "The Life I Imagine: are my cheeks red?". Kagi corrects to Gordon
       | Ramsay and shows relevant results.
       | 
       | #2 "Ukraine war". Marginalia shows an article about the Russian
       | Orthodox church and a Substack post about the war. Kagi shows
       | Wikipedia, Al Jazeera, etc up-to-date summaries about the war.
       | 
       | #3 "Dildo". Top post on Marginalia is "Students for Concealed
       | Carry Embraces UT Dildos | Students for Concealed Carry". Top
       | posts on Kagi are Wikipedia (read) and Amazon (buy).
       | 
       | > How is Marginalia, a search engine built by a single person, so
       | good?
       | 
       | Because it's not good?
        
         | BytesAndGears wrote:
         | I had a similar experience when testing Kagi after reading
         | this. The top result for the "wider car tires" query on Kagi
         | was a link to Physics StackExchange with some marginally
         | informative answers [0], which would be easy to expand on in
         | future searches. The second result was Reddit. Then a couple of
         | incorrect/irrelevant pages but they don't look like scams
         | 
         | [0]: https://physics.stackexchange.com/questions/29903/why-do-
         | peo...
         | 
         | Edit: I did just realize that I have StackExchange customized
         | to be up-ranked. So that probably helps. But yeah, I guess this
         | is why I usually get good results, which is something that
         | generally still fails with Google for me.
        
         | hattmall wrote:
         | I don't disagree with your assessment in full, but I don't
         | exactly consider wikipedia and Amazon good results. Like they
         | are big enough that if that's the result I want I can go to
         | them directly. So like they aren't bad or wrong, but I can see
         | the case for excluding them. Should something like Webster's
         | dictionary be a top result?
        
           | fantasybroker wrote:
           | I think for single word queries like that Wikipedia covers
           | more ground than a dictionary. Personal preference, perhaps.
           | If I need a definition I search for "define dildo" (Kagi
           | shows Merriam-Webster, Oxford, etc dictionary entries).
        
             | marginalia_nu wrote:
             | Marginalia supports the old Google syntax, e.g.
             | "define:dildo"
        
               | fantasybroker wrote:
               | Thanks! If you are that "single person" who built
               | Marginalia... hope you are not taking my criticism
               | personally. I am more annoyed by this blog post that uses
               | a few handpicked queries to present generalized long
               | winded conclusions that are completely disproven when
               | using a different set of queries.
        
               | marginalia_nu wrote:
               | Yeah, it's me, and to be fair I made a comment to a
               | similar effect myself. Assessing search result quality is
               | very hard, and this is definitely a pretty flattering
               | selection of queries.
        
               | fantasybroker wrote:
               | On the plus side - in addition to Marginalia's own
               | success, you can take partial credit for how good Kagi
               | search results are (IIRC Marginalia's index is one of the
               | sources for Kagi search results). So... thank you for
               | that!
        
           | marginalia_nu wrote:
           | Marginalia Search isn't trying to be a universal knowledge
           | engine, it's just a website finder.
           | 
           | That's bad if you're looking for a simple answer or basic
           | fact, and good if you're looking for a few hours of reading.
        
         | Brian_K_White wrote:
         | It seems to me that the name "marginalia" is not just a random
         | set of syllables. It sounds like it's doing what it says on the
         | tin, which is gooder than not doing what it says on the tin.
         | (distinct from whether what it says on the tin is something you
         | want)
        
       | aworks wrote:
       | The appendix describing the individual search results is both
       | entertaining and scary e.g.
       | 
       | "Two of the top three hits are how to install the extension and
       | the rest of the top hits are how to remove this badware. Many of
       | the removal links are themselves scams that install other
       | badware."
        
       | fgblanch wrote:
       | I would love to see Perplexity.ai in the benchmark. It has
       | completely replaced Google/DDG for information questions for me.
       | I still use DDG when I want to do a navigational query (e.g. find
       | the URL for a blog i partially recall the name).
        
         | rr808 wrote:
         | Me too. I only heard about it this morning and it looks kinda
         | perfect so far.
        
         | larve wrote:
         | While kagi was the product that most brought me joy in 2022,
         | perplexity.ai has been the one for 2023, even though i only
         | recently started using it. It's just been a joy to be able to
         | iteratively discuss most of my searches.
         | 
         | EDIT: here's a search for tire (I don't know anything about
         | tire, so maybe there's much better links out there, but this is
         | pretty much what I was expecting. Not an ad or SEO in sight.)
         | https://www.perplexity.ai/search/tire-3iuI9T6BQUSvu2tAhgsRmA...
        
           | freediver wrote:
           | I am wondering if you can use AI chat exclusively for your
           | search needs? If not, what does the perfect integration look
           | like?
        
         | lhl wrote:
         | I've been really enjoying Perplexity as well. It's a _much_
         | better Internet/search focused experience than ChatGPT, Bing,
         | or Bard. For anyone interested, until the new year (~20 more
         | hours?) there's a code for 2mo free Pro:
         | https://twitter.com/perplexity_ai/status/1738255102191022359
         | (more file uploads, choose your model including GPT4)
        
       | infamia wrote:
       | Try uBlacklist, it's like uBlock, but for search results.
       | 
       | https://addons.mozilla.org/en-US/firefox/addon/ublacklist/
       | 
       | https://chromewebstore.google.com/detail/ublacklist/pncfbmia...
       | 
       | You can sync the settings and your personal blocklist to either
       | Dropbox or Google Drive. It also has the ability to subscribe to
       | blocklists. Mind, you need to manually turn on search engines and
       | subscribe to lists. The uBlacklist subscriptions setting doesn't
       | have any built-in feeds yet though. :(
       | 
       | edit: There are some feeds on the uBlacklist site though.
       | https://iorate.github.io/ublacklist/subscriptions
       | 
       | edit edit: Found an even better list of feeds.
       | https://github.com/quenhus/uBlock-Origin-dev-filter#other-fi...
        
         | tentacleuno wrote:
         | uBlacklist is absolutely excellent: I've been using it for a
         | few years now, with absolutely no problems.
         | 
         | Quick tip: turn on the 'Skip the "Block this site" dialog', and
         | disable 'Hide the "Block this site" links' settings -- they
         | make it much quicker to block spam websites (of which there are
         | many on regular search engines).
        
           | skygazer wrote:
           | Just today I was looking for an extension just to block Quora
           | from search results. (Talk about a useless site that seems to
           | uselessly outrank Wikipedia on google lately -- what on earth
           | is Google up to?) I'm thankful I saw your and your parent's
           | post.
        
             | carlhjerpe wrote:
             | When Quora was new I followed some topics, got to read
             | interesting answers to interesting questions, but then some
             | kind of enshittification happened. I've blocked it in Kagi
             | now.
        
         | ic_fly2 wrote:
         | This is amazing, I was maintaining my own custom solution that
         | did this.
        
         | bayindirh wrote:
         | This is a feature of Kagi already. You can promote or blacklist
         | domains in your search results.
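
Promoting or blocking domains in a result list can be sketched like this. It is an illustration under assumptions, not how Kagi or uBlacklist are implemented; the function name, rule format, and example URLs are hypothetical.

```python
from urllib.parse import urlparse

def apply_domain_rules(urls, blocked=(), promoted=()):
    """Drop results from blocked domains and move promoted domains to the top."""
    def host(url):
        return urlparse(url).hostname or ""

    def matches(hostname, domains):
        # Match the domain itself or any of its subdomains.
        return any(hostname == d or hostname.endswith("." + d) for d in domains)

    kept = [u for u in urls if not matches(host(u), blocked)]
    # sorted() is stable, so the original order is preserved within each group.
    return sorted(kept, key=lambda u: not matches(host(u), promoted))

# Hypothetical result list, for illustration only.
urls = [
    "https://www.quora.com/q",
    "https://example.org/a",
    "https://physics.stackexchange.com/questions/1",
]
print(apply_domain_rules(urls, blocked=["quora.com"], promoted=["stackexchange.com"]))
```

Because the rules act on an already ranked list, this kind of personalization works the same whether it lives in the search engine or in a browser extension.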
        
           | EA-3167 wrote:
           | Kagi is just the best, it feels like Google did before a
           | decade+ of enshittification and ad tech.
        
             | cratermoon wrote:
             | Did anyone notice that Kagi showed as barely better than
             | Google in the article?
        
               | _benj wrote:
               | Yeah, for me the results of kagi are so much better than
               | anything else, that it makes me wonder how objective can
               | one be measuring search results.
               | 
               | I use google on a client's computer and it's just
               | horrible.
               | 
               | But it could also be a factor of the customizations I've
               | made for my kagi. Ban quite a few paywalled sites, always
               | put Wikipedia articles on top, prefer blogs over
               | stackoverflow stuff...
        
             | l8_to_catch_up wrote:
             | I just tried it (free account), and it felt underwhelming,
             | not many search results, or particularly interesting ones,
             | for the image and video stuff I searched.
             | 
             | There was little to no spam, though, but not much to look
             | at either. Maybe it might be useful when searching for
             | stuff that usually has high amount of confusing spam, but
             | otherwise not really useful for me...
        
               | RoyalHenOil wrote:
               | Kagi is still very weak for searching for videos and
               | images. For those, I still use Google.
               | 
               | Kagi really shines when you are doing a standard search,
               | though, which is what most people do most of the time.
        
           | KomoD wrote:
           | But I can't do regexes, wildcards or anything like that as
           | far as I can see, like I can in uBlacklist
           | 
           | And it seems like they also have a 1000 domain limit?
        
         | brobinson wrote:
         | Does this exist for DDG?
        
           | infamia wrote:
           | Yes, it works for most search engines.
        
             | brobinson wrote:
             | The addon you linked (on the Firefox version) only requests
             | permissions on google.* sites so I don't think it will work
             | for DDG. Is there a separate extension, or am I
             | misunderstanding something?
        
         | gzer0 wrote:
         | Appreciate you sharing this; I've been searching for something
         | similar for quite some time.
        
         | KomoD wrote:
         | I use uBlacklist with my own blacklists and Google has been
         | pretty usable, it's great.
        
       | thsksbd wrote:
       | Honestly, if you have to search something remotely technical, try
       | HN's search function with comments enabled.
       | 
       | If the topic has ever come up the discussion and links are likely
       | to be more relevant and better than your avg. wiki article
        
       | ShadowBanThis01 wrote:
       | More incorrect usage of "hallucinated" for simply made-up or
       | inaccurate results.
        
       | ChrisArchitect wrote:
       | Blah blah blah. Could you lay this article out any worse? What
       | are the queries you used to test? I want to try them too. Buried
       | in here somewhere.
       | 
       | Using an adblocker is not expert anything.
       | 
       | That you've defined your own opinion for what some of the results
       | _should_ be blows the thing up.
       | 
       | Searching youtube downloader, many people would be _fine_ with
       | some of the ad covered but totally functional sites that pop up
       | on Google. I use some of them every day for quick conversion
       | tasks. I don't want any youtube-dl result. The average user
       | doesn't either.
       | 
       | Download firefox? What's that? All the top links are fine? No
       | one's looking at the 7th listing for a simple query to download a
       | program.
       | 
       | Why do wider tires have better grip? .. what, sites like
       | roadandtrack, prioritytire, reddit, some physics and
       | stackexchange sites aren't good enough? they are.
       | 
       | The Vancouver snow report one also. Lots of major news sites.
       | Some weathernetwork and almanacs. All totally acceptable results
       | for a sort of variable question.
       | 
       | blah blah this is just a hate on for Google and a HN/nerd view of
       | the world that the average user is nowhere near living in.
        
         | SnazzyJeff wrote:
         | > Download firefox? What's that? All the top links are fine? No
         | one's looking at the 7th listing for a simple query to download
         | a program.
         | 
         | They are if the first six results are SEO bullshit. Which is
         | the de-facto state of affairs for Google today: advertising
         | traipsing around as search.
        
           | ChrisArchitect wrote:
           | heh, they're not. They're all variations of mozilla download
           | pages and site posts.
        
         | navjack27 wrote:
         | Completely agree. I personally thought searching "Vancouver
         | snow report" to be extremely strange. Just search zip code or
         | city name and weather. Two words. That's all you need to get
         | results. What the hell is a snow report? Do you even think you
         | can trust weather reports 10+ days out?
         | 
         | Whole article is rambling and silly and assuming.
        
         | shaldengeki wrote:
         | For whatever it's worth, I think your comment would be a whole
         | lot more convincing without its first and last lines, which had
         | the effect of making you sound (at least to me) like you're
         | shallowly dismissing the article.
        
         | anonymoushn wrote:
         | Which web site did you use to successfully download a youtube
         | video, and which youtube video did you download?
        
       | boomboomsubban wrote:
       | Can someone tell me why Bing, and thus DDG, has switched to
       | prioritizing local results? I'll search the most inane things,
       | like lyrics to a song, and get results for local businesses
       | containing maybe one word in common.
       | 
       | It's most frustrating with phone numbers. I picked up the habit
       | of searching the random numbers that called me, to try and find
       | out if they were possibly important. I used to get a bunch of
       | spam sites that clearly existed to profit off me making those
       | searches.
       | 
       | Both Google and DDG have removed those spam sites, even though
       | they were useful at times. Google will tell me the number is in
       | some random PDF that contains a few of the digits, then no other
       | results. DDG will say the top result is my local police
       | department, something that freaked me out the first few times.
        
         | teeray wrote:
         | > I'll search the most inane things, like lyrics to a song, and
         | get results for local businesses
         | 
         | Query: "I'm coming out of my cage..."
         | 
         | Result (Ad): "You'll be doing just fine with these amazing
         | year-end closeout prices at Al's Discount Car Barn. Gotta come
         | down--you'll want it all!"
        
           | boomboomsubban wrote:
           | Ads would make sense, but there's no way my local city
           | council is paying Bing and they are the most frequently
           | listed result.
        
           | moffkalast wrote:
           | It was only a list, how did it end up like this?
        
         | Gualdrapo wrote:
         | Maybe it was an attempt to improve their results for local
         | searches?
         | 
         | Searching for results from my country in DDG (picking the
         | country in the drop-down below the search box) still returned
         | results from the USA or other countries, even when searching
         | for stuff in the local language. Maybe they tried to fix that
         | because it really sucked, so much that I never used it again
         | for searching local websites.
        
           | boomboomsubban wrote:
           | This is the one area it still ignores my location. I live in
           | a town named after a UK city, and there are several bigger
           | towns in the US with the same name. I just searched
           | "McDonalds _city name_." I got results for locations at
           | least half the US away from me, as well as Uber Eats GB.
        
         | skygazer wrote:
         | If you're going to search for phone numbers you'll want to
         | ensure you enable verbatim searching under tools on Google, and
         | put the number in quotes, perhaps in "xxx-xxx-xxxx" OR "(xxx)
         | xxx-xxxx" forms. Many of the sites you mention are fake sites
         | with fake contacts just for ad serving, and I've read that in a
         | few cases the scammers seeded the spoofed numbers they appear
         | to call from onto the sites they control to see who googles
         | their phone numbers.
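
The two quoted forms suggested above can be generated mechanically. The following Python sketch builds that OR query from a number typed in any common style. The `phone_query` name is hypothetical, it assumes 10-digit US-style numbers, and it only builds the query string: verbatim mode is a setting in the search UI and is not handled here.

```python
import re


def phone_query(number: str) -> str:
    """Build a quoted OR query covering two common US formattings."""
    digits = re.sub(r"\D", "", number)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading US country code
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number")
    area, prefix, line = digits[:3], digits[3:6], digits[6:]
    return f'"{area}-{prefix}-{line}" OR "({area}) {prefix}-{line}"'


print(phone_query("+1 (555) 010-0123"))
# prints: "555-010-0123" OR "(555) 010-0123"
```

Quoting each form asks the engine for the exact string, which is the point of the advice: an unquoted number lets the engine match pages that share only a few of the digits.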
        
           | pixl97 wrote:
           | Reverse spoof the numbers of FTC investigators and Google
           | employees?
        
         | berkut wrote:
         | Yeah, I've noticed this as well with DDG recently: even with
         | the localised checkbox disabled it still prioritises them,
         | which often is very frustrating as the results are then almost
         | totally useless.
         | 
         | However, more generally, I've personally found that DDG's (and
         | maybe Bing's, then?) localised results are just _really_ bad,
         | and have been for the multiple years I've been using DDG and
         | it's had this feature: I'm in New Zealand, and enabling
         | localised / region-based search still often returns results
         | from pages with TLDs like "co.uk", ".ca" and ".pl" (the latter
         | are really common for content-generated spam in my experience),
         | which I just can't understand...
         | 
         | Unfortunately, I have found that Google's results are usually a
         | lot better in terms of being "location-aware" than DDG, at
         | least when that's what you want...
        
           | n_plus_1_acc wrote:
           | I have the same experience from Germany. There's the slider
           | but it's not doing much.
        
           | notnullorvoid wrote:
           | That's a bit surprising that you're seeing spam sites with
           | .ca, those are illegal here and all .ca domains must be
           | registered by someone in Canada.
           | 
           | You can report them: https://ised-isde.canada.ca/site/canada-
           | anti-spam-legislatio...
        
         | alexforster wrote:
         | DDG is just repackaged Bing. Always has been. I remember
         | looking into them when I was ready to job-hop many years ago,
         | and they asked for dedication to their search engine as their
         | foremost requirement for employment. It's the "drop-shipping"
         | equivalent of search engines.
        
           | behnamoh wrote:
           | hope kagi takes ddg's place in terms of adoption. never
           | really liked ddg even though i always care about privacy.
        
             | mrweasel wrote:
             | I really don't get that sentiment. Currently Kagi is just
             | as dependent on Google as DuckDuckGo is on Bing. That might
             | only be temporary of course and Kagi does seem to be
             | working on a search engine of their own.
             | 
             | Rather than wanting Kagi to take the place of DuckDuckGo,
             | it would be better if Kagi could take users from Google,
             | and then, when ready, drop Google as a search provider.
        
               | Semaphor wrote:
               | Kagi mixes google, bing, some non-profit small-web SE,
               | and their own index.
        
               | mrweasel wrote:
               | I don't think they use Bing, but yes, Google, Marginalia,
               | Yandex, Brave and others. I still fail to see how that's
               | different to DuckDuckGo, who also run their own crawler.
               | It's really weird that people are almost hating on
               | DuckDuckGo for how they run their search engine, while
               | applauding Kagi, for doing the same, but with a different
               | business model.
        
               | speedgoose wrote:
               | I also assume that Kagi uses some shady residential IP
               | proxies and similar tricks to scrape Google while DDG has
               | access to the Bing API.
        
               | mrweasel wrote:
               | You can buy access to the Google Search API, which is
               | what I assume Kagi does. Building your product on being
               | able to circumvent some Google restrictions seems like a
               | bad business move, if you can buy the same service for a
               | reasonable price.
        
               | speedgoose wrote:
               | Where can I buy it?
        
               | bigtunacan wrote:
               | https://developers.google.com/custom-search
               | 
               | It's been available for ages. We used it to power the
               | company internal search for a large enterprise I worked
               | at 17 or 18 years ago.
        
               | speedgoose wrote:
               | Yes, this isn't an API for making a generic search engine.
        
               | Semaphor wrote:
               | Only if they changed that (which they might have as part
               | of their cost-optimization). They said they mixed bing
               | and google results back then.
        
               | feanaro wrote:
               | Kagi should hire the Marginalia author.
        
               | freediver wrote:
               | We already include Marginalia results in Kagi [1]
               | 
               | https://help.kagi.com/kagi/search-details/search-
               | sources.htm...
        
               | Kiro wrote:
               | DDG used to be the HN darling and you would get downvoted
               | for saying anything negative or even insinuating that
               | they are relying on Bing. Now the spot has been overtaken
               | by Kagi but it looks like it suffers from the same
               | problems. The counterargument that they have their own
               | index as well is the same that was used for DDG, when the
               | reality was that it was only used for widgets and other
               | fluff. Let's see how it plays out for Kagi.
        
         | bpodgursky wrote:
         | I suspect it's a failure to distinguish mobile searches (where
         | people are legitimately looking for a business) from desktop
         | searches.
        
         | dvngnt_ wrote:
         | you can use true person search for numbers
        
         | callalex wrote:
         | I'm confused, you are searching for, specifically, a local
         | phone number and you are upset that the machine interprets that
         | as you looking for a local result? That's what most people
         | expect from a local number search.
         | 
         | Perhaps the incorrect thing is not your internet search
         | results, but actually your phone carrier for lying to you and
         | telling you that a caller has a local number?
        
           | tedunangst wrote:
           | If I search for a ten digit number, it is not helpful to
           | return a local business that shares the last four digits.
        
           | boomboomsubban wrote:
           | The number is local, and occasionally I've searched and found
           | the number was a local clinic or business that had legitimate
           | reason to call me but not leave a message. In those
           | scenarios, close to all ten of the numbers are found on the
           | page.
           | 
           | The top result being my local police department because it
           | shares the same area code and has maybe one other number in
           | common is clearly a bad result. It does this even if the
           | phone carrier isn't lying to me and the caller does have a
           | local number, like the increasingly common political spam
           | calls.
        
         | csnover wrote:
         | Man, thank you for saying this. Stuffing results with
         | geolocated local junk despite explicitly opting out by choosing
         | "All regions" is so frustrating. This wasn't happening a year
         | or two ago. I submit negative feedback about it constantly, but
         | I guess not enough people are doing that for anyone to notice
         | or care.
         | 
         | I've also noticed a significant increase in attempts to stuff
         | news into regular search results. I _really_ do not appreciate
         | being force-fed mental health poison. I don't need it ever, but
         | I _especially_ don't need it when I'm searching for some
         | specific technical thing and then get emotionally sabotaged by
         | some clickbait headline because ... why? Some bullshit KPI? Why
         | are tech companies so obsessed with pushing news into every
         | orifice?
        
           | stavros wrote:
           | Hah, calling the news "mental health poison" is the most
           | accurate thing I've read all day.
        
         | mattigames wrote:
         | In my country (Colombia) Google still has not removed those
         | spam sites that just generate all possible numbers.
        
         | michaelbuckbee wrote:
         | Nearly every local search is a leading indicator of buying
         | intent and, therefore, is worth more money when answered with
         | a local result instead of an authoritative response.
        
         | bscphil wrote:
         | > Bing, and thus DDG, has switched to prioritizing local
         | results
         | 
         | From what I can tell this is an issue with the Bing API that
         | DDG uses that the DDG folks have been unable to resolve. I've
         | tried many identical queries between DDG and Bing and while
         | Bing does occasionally return incorrect local results, the
         | _completely irrelevant_ local results that appear on almost
         | every DDG search do not seem to happen with Bing itself.
         | 
         | From what I understand, DDG is aware of the issue. I don't know
         | why it isn't more of a priority.
        
           | binarymax wrote:
           | Long time DDG user (>10 years) here, and it's astounding to
           | me that they haven't prioritized making their own independent
           | index to switch off Bing. I would have expected them to do it
           | like 5 years ago, but there's afaik no initiative to do so.
           | It's unfortunate, and I am now trying other engines like
           | Brave search.
        
             | nsagent wrote:
             | I also occasionally try Brave search when a DDG search
             | fails. Sometimes Brave finds what I want, but I frequently
             | get Captcha (and now proof of work) challenges that are
             | quite annoying. I don't get this with any other search
             | provider (though StartPage would frequently do this a while
             | back). I hope this is just a phase, because I would likely
             | use Brave Search more if not for this issue.
        
       | sagarpatil wrote:
       | Have you tried perplexity.ai? It's like ChatGPT and Google had a
       | baby. Looks very promising and I'm seeing a lot of tech leaders
       | (for example, Tobi of Shopify) moving to it.
        
         | dartharva wrote:
         | Aren't Bing Chat and Kagi FastGPT the same in effect?
        
           | littlecranky67 wrote:
           | No, FastGPT is GPT-2 based. I actually prefer FastGPT because
           | it's fast (duh!), it gives very concise answers, and every
           | generated response carries footnotes with links to the
           | sources.
        
             | freediver wrote:
             | Just to correct, FastGPT uses claude-instant.
        
       | emmanueloga_ wrote:
       | I will admit that I can't read between the lines here, so I'll
       | just go ahead and ask: What is "bluesky thought leader" supposed
       | to mean? (1) Any guesses who this may be? Why is he not quoted
       | directly? (btw, the term is used 3 times, presumably to refer to
       | the same person).
       | 
       | 1: my reading is that this is a sarcastic denomination for
       | someone that is supposed to be an innovation thought leader but
       | actually is just defending the broken search landscape status
       | quo.
        
       | 0x38B wrote:
       | Re: Kagi, I heard about it on HN, tried it for 100 searches, then
       | subscribed. When I search for random JS and CSS things, MDN is
       | the first result, and if it isn't, I can downrank whatever spammy
       | site(s) are on top.
       | 
       | ---
       | 
       | I wish I had a local LLM trained to detect clickbait and/or
       | low-effort content. I imagine searching YouTube and having all
       | the clickbait collapsed together (just like Kagi condenses
       | listicles), with the remainder being potentially high-quality
       | content. Don't know how feasible this is right now.
        
         | shados wrote:
         | I became a huge fan of Kagi after seeing it on hacker news too.
         | It's amazing how good a search engine can be when it's not full
         | of ads.
        
           | D13Fd wrote:
           | Yeah. At first I primarily used Kagi to move away from Google
           | as a company, hoping for results that were equally good. But
           | Google search actually feels crappy now in comparison.
        
         | freeAgent wrote:
         | Just use the Kagi Summarizer on YouTube videos and you don't
         | have to waste time watching trash. It's a great life hack.
        
           | xigoi wrote:
           | How does that work? Does it scrape the auto-generated
           | captions?
        
         | qudat wrote:
         | Been paying for Kagi for 6+ months and very happy with it. I'm
         | pretty anti subscriptions so that's saying a lot for a service
         | that is otherwise free.
         | 
         | I do have to dump into google for local searches every once in
         | a while, but otherwise happy with it.
        
       | wolverine876 wrote:
       | Look at the source for that page. Is it hand-coded? (I think it's
       | great.)
        
       | zzleeper wrote:
       | For me the problem is not just that searching on Google is bad,
       | but that sometimes it COMPLETELY hides exactly what I'm looking
       | for, for no good reason.
       | 
       | For instance, I wrote an R ggplot2 package called "fedplot"
       | (following the convention of naming the package after the figure
       | style it replicates, as in "bbplot" for BBC-style charts).
       | 
       | Try searching for it on Google: "github" "fedplot" doesn't get
       | you anywhere. Meanwhile, every other search engine gives you
       | exactly what you want if you just type "fedplot". I even tried to
       | add the relevant websites through google's suggested tools, and
       | nothing happened :|
        
         | Brian_K_White wrote:
         | Their black box semantic guesser has been told not to feed the
         | radicalizing conspiracy theorist fires about federal plots.
         | 
         | Who needs to know anything about government owned land anyway?
        
         | Dah00n wrote:
         | Searching for "fedplot" looking for
         | https://github.com/sergiocorreia in the results:
         | 
         | Qwant: Result 1
         | 
         | Bing: Result 1
         | 
         | Google: Result 2
         | 
         | Marginalia: Zero results
         | 
         | ChatGPT 3.5: Some Federal Reserve dot plot nonsense and no
         | useful results.
        
           | marginalia_nu wrote:
           | You're never going to find github results on Marginalia as
           | long as they block 3rd party crawlers :-/
        
             | Dah00n wrote:
             | Well, zero results are better than spam ;-)
        
           | zzleeper wrote:
           | I would say Google has zero results, as it does not find
           | https://github.com/sergiocorreia/fedplot nor
           | https://sergiocorreia.github.io/fedplot/ ; even with the
           | advantage of the latter being manually added to the Google
           | Admin console.
           | 
           | Meanwhile, both Bing and Qwant give me exactly what I want
        
       | viraptor wrote:
       | I really don't agree with some of the expectations around
       | results.
       | 
       | > Download youtube videos
       | 
       | > Ideally, the top hit would be yt-dlp or a thin, graphical,
       | wrapper around yt-dlp. Links to youtube-dl or other less
       | frequently updated projects would also be ok.
       | 
       | That's not what a random person expects. yt-dlp or youtube-dl
       | have no meaning to a normie. The first result is an online
       | downloader and that's what an average person is after. I checked
       | the first result in Kagi and it's a valid youtube downloader.
       | 
       | If you're after a commandline tool, ask for it: "commandline tool
       | download youtube videos" gives youtube-dl as the top result with
       | valid options afterwards:
       | https://kagi.com/search?q=commandline+tool+Download+youtube+...
       | 
       | "Ad blocker" seems to ignore other options exist. Yes, ublock
       | would be preferable for most, but ABP is not "very bad". Kagi
       | mentions ABP at position 1 and ublock at position 8:
       | https://kagi.com/search?q=Ad+blocker&r=au&sh=4VHApDrTEfuxMOt...
       | (But for a query like that, I'd be happy with a wikipedia article
       | about adblockers, because why not?)
       | 
       | I'm not disagreeing that results have been getting worse for
       | years, but... this is a really bad scoring system. It feels like
       | that one very new person jumping on SO posting something like
       | "syntax error: if 1 {" - what are you even asking for? (To be
       | honest, the search engines could also give you the equivalent of
       | "this is very vague, would you like to specify what you're
       | actually after? here are some suggestions: ...", but that's
       | beyond the scope here.) A search that returns a valid answer to
       | a super generic query, just not the exact thing you wanted to
       | see, is not "very bad".
        
         | linusg789 wrote:
         | My thoughts exactly.
        
         | anonymoushn wrote:
         | If you try using it, the first result doesn't help you download
         | a youtube video and does try to get you to install malware.
        
       | shutupnerd0000 wrote:
       | Speaking of bad software, anyone getting a huge amount of
       | horizontal scroll on mobile on this blog post? What should I add
       | to my bag of tricks to work around that?
        
         | jraph wrote:
         | Reader mode might do the job.
        
         | gniv wrote:
         | I am not (Chrome on iOS).
        
       | nneonneo wrote:
       | Honestly, this is depressing. Back in the day, AltaVista and
       | AskJeeves existed but returned terrible results, and Google
       | showed up to disrupt them all. It seems like we should be on the
       | verge of repeating this cycle.
       | 
       | Maybe LLMs will help, but I can't shake the nagging feeling that
       | the situation will simply get worse with LLMs, not better, due to
       | hallucinations and the apparent "gullibility" of LLMs: I would
       | not be surprised if SEOing an LLM turns out to be easier than
       | SEOing Google.
        
       | jimmytucson wrote:
       | If you wanna know why Google (or any search engine) sucks, just
       | look at how it measures its own search results. Most search
       | companies do this "at scale" according to very specific
       | guidelines, like what the author did here but on steroids. For
       | example, take a look at Google's 168-page instruction manual for
       | search quality raters:
       | 
       | https://static.googleusercontent.com/media/guidelines.raterh...
       | 
       | It talks about figuring out a query's meaning(s), judging the
       | user's intent (were they looking for some specific answer, etc.),
       | evaluating the "quality" of a website, rating the site's
       | usefulness in relation to the query's meaning/intent, etc.
       | 
       | All this is to say, it's not that search companies don't do
       | exactly what the author did here, it's just that they have
       | different standards than the author. And I'd venture their
       | standards match their users' better than the author's, but maybe
       | not or not forever, anyway.
        
         | ec109685 wrote:
         | Why would an average user want blog spam search results?
         | 
         | My hope is that as LLMs improve, they can be more
         | discriminating about the results returned.
        
           | jimmytucson wrote:
           | > Why would an average user want blog spam search results?
           | 
           | I didn't say they would :)
           | 
           | In fact, I can't figure out how your comment relates to mine.
           | Are you claiming that Google doesn't factor blog spamminess
           | into its evaluation of search results? If so, that's quickly
           | put to bed by the document I linked, pretty much section 4.6.
           | Excerpt:
           | 
           | > Creating an abundance of content with little effort or
           | originality with no editing or manual curation is often the
           | defining attribute of spammy websites.
           | 
           | You could claim that they fail to capture some essential
           | quality of "blog spamitude" or that they don't weight it
           | heavily enough in their eval but to say they just, like,
           | don't know about blogspam over there, is pretty far fetched
           | IMO.
        
         | whakim wrote:
         | I really don't think that's true. For example, page 29 of your
         | link describes "Lowest Quality Content." Most of the search
         | results that the author rated as spammy or scammy clearly fit
         | these guidelines, which means that either (1) the raters aren't
         | knowledgeable enough about the subject matter to determine that
         | the website they're rating is harmful or misleading; or (2) the
         | raters _are_ rating these sites correctly, but it still isn't
         | having the desired effect.
        
         | mrweasel wrote:
         | > If you wanna know why Google (or any search engine) sucks
         | 
         | While I obviously don't know, it may be related to how Google
         | believes a "normal" person searches. I have come to view Google
         | as a product search engine/price comparison site; that's what
         | it's great at. Google can find you the most relevant products
         | for any purchase you may consider, so maybe that's what Google
         | has optimized for. The majority of my searches are related to
         | IT, programming, software and computers in general, but what
         | do "normal" people search for? They search for products,
         | news, opening hours for a store. Google is pretty decent at
         | that, but the money is in the "go buy something". The ads on a
         | product search on Google are always way more accurate than the
         | actual search results.
         | 
         | I think Google has optimized for selling products.
        
       | throwawaaarrgh wrote:
       | Search engines are not designed to give you the information you
       | desire. They are designed to sell ads or metadata. "Result
       | quality" is of no consequence.
       | 
       | If you actually wanted accurate results you wouldn't use a tool
       | that is literally attempting to read your mind like a fortune
       | teller. It is impossible to know what you want just by the word
       | "snow". Jesus Christ engineers are so dumb.
        
       | naet wrote:
       | I think the result grading is too opinionated here.
       | 
       | For example, the first query is "download YouTube videos", for
       | which Google is ranked "terrible" for not showing you a command
       | line open source program. But the literal first result is an ad
       | supported site where I can paste in a YouTube link and download
       | it right from the browser. That seems like exactly what most
       | people would want, rather than the CLI tool the author is
       | searching for. The author seemed to care more about seeing
       | ad-free sites in the results than about search relevance.
       | 
       | Search is a very gamed system with a lot of SEO spam type
       | results, but I think a much better analysis could be done for
       | more meaningful results. Also I recreated some of the searches
       | and got very different results (including ublock origin in the
       | top three responses). Again, a more scientific ranking system
       | could help uncover better data on searches.
        
         | shaldengeki wrote:
         | The author describes that site as such, which seems fair to
         | rate as "terrible":
         | 
         | > Some youtube downloader site. Has lots of assurances that the
         | website and the tool are safe because they've been checked by
         | "Norton SafeWeb". Interacting with the site at all prompts you
         | to install a browser extension and enable notifications. Trying
         | to download any video gives you a full page pop-over for
         | extension installation for something called CyberShield. There
         | appears to be no way to dismiss the popover without clicking on
         | something to try to install it. After going through the links
         | but then choosing not to install CyberShield, no video
         | downloads. Googling "cybershield chrome extension" returns a
         | knowledge card with "Cyber Shield is a browser extension that
         | claims to be a popup blocker but instead displays
         | advertisements in the browser. When installed, this extension
         | will open new tabs in the browser that display advertisements
         | trying to sell software, push fake software updates, and tech
         | support scams.", so CyberShield appears to be badware.
        
           | naet wrote:
           | That's how he described it but I tried it myself and found it
           | perfectly functional to download a video with different
           | options for size / quality. It has ads but not nearly as bad
           | as described.
           | 
           | It's a service that is quasi illegal and explicitly breaks
           | the YouTube terms of service. I think the search engine did a
           | good job surfacing what was searched for, there just aren't
           | going to be any free online YouTube downloaders without
           | advertising.
        
             | anonymoushn wrote:
             | Which web site did you use to successfully download a
             | youtube video? Which youtube video did you download?
        
             | shaldengeki wrote:
             | It'd be useful to know what site you used to verify - but
             | if we're talking about the same site, IMO a website that
             | presents Dan's experience sometimes, and your experience
             | sometimes, is actively harmful.
        
         | j7ake wrote:
         | Yeah, if one typed "YouTube downloader cli" you'd get the
         | results the author was thinking of.
         | 
         | It seems like the author wants search to read their mind
         | without specifying what kind of YouTube downloader they want.
        
       | virgildotcodes wrote:
       | I really don't understand why anyone writing articles about
       | ChatGPT uses 3.5. It's pretty misleading as to the results you
       | can get out of (the best available version of) ChatGPT.
       | 
       | For comparison, here are all the author's questions posed against
       | GPT4:
       | 
       | https://chat.openai.com/share/ed8695cf-132e-45f3-ad27-600da7...
        
         | refulgentis wrote:
         | It's a bit hard to use for most, either $20/month fixed for a
         | limited # of messages, or you need to be able to reason through
         | how to get an API key, or get another 3rd party service with
         | similar cost & limits.
        
           | simonw wrote:
           | You can use GPT-4 for free via Bing - though I find it a
           | little hard to explain to people how they can do that because
           | I'm never sure what the rules are with regards to creating
           | Microsoft accounts, whether you can use any browser or have
           | to use Edge, what countries it's available in etc.
           | 
           | Actually maybe the recommendation should be to use GPT-4 for
           | free via https://copilot.microsoft.com/ instead now.
           | 
           | (Except I can't tell which version of GPT that's using yet -
           | there was a story on 5th December that said GPT-4 Turbo was
           | "coming soon", not sure when "soon" is though:
           | https://blogs.microsoft.com/blog/2023/12/05/celebrating-the-... )
        
             | vitorgrs wrote:
             | FYI: Balanced doesn't run pure GPT4. Balanced uses a
             | combination of multiple models. Precise and Creative are
             | pure GPT4.
             | 
             | About GPT4 Turbo, to check if you are on Turbo, ctrl+U >
             | ctrl+f > check if "dlgpt4t" exists. If it exists, you are
             | running turbo.
             | 
             | You can also double-check by, well, asking about stuff after
             | the 2021 knowledge cut-off ("What are the oscar winners?")
             | with search disabled.
             | 
             | But you'll notice because turbo is much faster on bing (and
             | better too).
        
             | apapapa wrote:
             | But that gpt-4 says it can't code
        
           | airstrike wrote:
           | IMHO TBF the "limited # of messages" is continuously
           | increasing, to the point I hardly remember it exists these
           | days
        
         | tedunangst wrote:
         | Why does OpenAI continue to offer chatgpt 3.5 if it's so bad?
        
           | azinman2 wrote:
           | Cheaper and faster.
        
           | hannasanarion wrote:
           | GPT 4 is THIRTY (30) times more expensive.
           | 
           | In the llm-assisted search spaces I'm involved in, a lot of
           | folks are trying to build solutions based on fine tuning and
           | support software surrounding 3.5, which is economical for a
           | massive userbase, using 4 only as a testing judge for quality
           | control.
        
           | antupis wrote:
           | ChatGPT 3.5 is good enough if you can give context in the
           | query.
        
         | latexr wrote:
         | > I really don't understand why anyone writing articles about
         | ChatGPT uses 3.5.
         | 
         | Because that's what most people have access to. It's absolutely
         | worthless to most readers to talk about something they'll never
         | pay for and it's not the job of random third-parties to
         | incentivise others to send money to OpenAI.
         | 
         | What I really don't understand is why anyone gets so hung up
         | about it and blames the writer. If you're bothered by people
         | using 3.5 you should complain to OpenAI, not the people using
         | the service they make freely available.
         | 
         | Anecdotally, I find this excessive fawning about 4 VS 3.5 to be
         | unwarranted.
         | 
         | https://news.ycombinator.com/item?id=38304184
        
           | virgildotcodes wrote:
           | > Because that's what most people have access to.
           | 
           | I'd agree with this rationale if the author clearly
           | communicated their choice of model and the consequences of
           | that choice upfront.
           | 
           | In this post the table of results and the text of the post
           | itself simply reads "ChatGPT" with no mention of 3.5 until
           | the middle of a paragraph of text in the appendix.
           | 
           | > It's absolutely worthless to most readers to talk about
           | something they'll never pay for and it's not the job of
           | random third-parties to incentivise others to send money to
           | OpenAI.
           | 
           | The "worth" is in communicating an accurate representation of
           | the capabilities of the technology being evaluated. If you're
           | using the less capable free version, then make that clear
           | upfront, and there's no problem.
           | 
           | If you were to write an article reviewing any other piece of
           | software that has a much less capable free version available
           | in addition to a paid version, then you would be expected to
           | be clear upfront (not in a single sentence all the way down
           | in the appendix) about which version you're using, and if
           | you're using the free version what its limitations may be. To
           | do otherwise would be misleading.
           | 
           | If you simply say "ChatGPT" it's reasonable to infer that
           | you're evaluating the best possible version of "ChatGPT", not
           | the worst.
           | 
           | Accurate communication is literally the job of the author if
           | they're making money off the article (this one has a Patreon
           | solicitation at the top of the page).
           | 
           | Whether or not "most readers" are ever going to pay for the
           | software is totally orthogonal.
           | 
           | If using GPT4 vs 3.5 would create results so distinct from
           | one another that it would serve to incentivize people to give
           | money to OpenAI, well then that precisely supports the
           | argument that the author's approach is misleading when
           | presenting their results as representative of the
           | capabilities of "ChatGPT".
           | 
           | > What I really don't understand is why anyone gets so hung
           | up about it and blames the writer.
           | 
           | Again, if they're making money off their readers it's their
           | job to provide them with an accurate representation of the
           | tech.
           | 
           | > Anecdotally, I find this excessive fawning about 4 VS 3.5
           | to be unwarranted.
           | https://news.ycombinator.com/item?id=38304184
           | 
           | Did some part of my comment come across as "excessive
           | fawning"? Regardless, if this "excessive fawning" is truly
           | unwarranted, this would again undermine your statement that
           | using GPT4 would "incentivize others to send money to
           | OpenAI".
           | 
           | In regards to your link, I'll highlight what another
           | commenter replied to you. What should ChatGPT say when
           | prompted about various religious beliefs? Should it
           | confidently tell the user that these beliefs are rooted in
           | fantastical nonsense?
           | 
           | It seems in this case you're holding ChatGPT to an arbitrary
           | standard, not to mention one that the majority of humanity,
           | including many of its brightest members, would fail to meet.
        
             | latexr wrote:
             | > I'd agree with this rationale if the author clearly
             | communicated their choice of model and the consequences of
             | that choice upfront. (...) with no mention of 3.5 until the
             | middle of a paragraph of text in the appendix.
             | 
             | You're moving the goalposts. You went from criticising
             | _anyone_ using 3.5 and writing about it to saying it
             | would've been OK if they had mentioned it where _you_ think
             | it's acceptable. It's debatable if the information needed
             | to be more prominent; it is not debatable it is present.
             | 
             | > If you simply say "ChatGPT" it's reasonable to infer that
             | you're evaluating the best possible version of "ChatGPT",
             | not the worst.
             | 
             | Alternatively, if you simply say "ChatGPT" it's reasonable
             | to infer that you're evaluating the version most people
             | have access to and can "play along" with the author.
             | 
             | > If using GPT4 vs 3.5 would create results so distinct
             | from one another that it would serve to incentivize people
             | to give money to OpenAI
             | 
             | Those are your words, not mine. I argued for the exact
             | opposite.
             | 
             | > Again, if they're making money off their readers it's
             | their job to provide them with an accurate representation
             | of the tech.
             | 
             | I agree they should strive to provide accurate information.
             | But I disagree that being paid has anything to do with it,
             | and that their representation of the tech was inaccurate.
             | Incomplete, maybe.
             | 
             | > Regardless, if this "excessive fawning" is truly
             | unwarranted, this would again undermine your statement that
             | using GPT4 would "incentivize others to send money to
             | OpenAI".
             | 
             | Again, I did not argue that, I argued the opposite. What I
             | meant is that even if _you_ believe that to be true, that
             | still doesn't mean random third-parties would have any
             | obligation to do it.
             | 
             | > I'll highlight what another commenter replied to you.
             | 
             | That comment has a reply, by another person, to which I
             | didn't feel the need to add.
             | 
             | > It seems in this case you're holding ChatGPT to an
             | arbitrary standard, not to mention one that the majority of
             | humanity, including many of its brightest members, would
             | fail to meet.
             | 
             | Machines and humans are not the same, not judged the same,
             | don't work the same, are not interpreted the same. Let's
             | please stop pretending there's an equivalence.
             | 
             | Here's a simple example: If someone tells you they can
             | multiply any two numbers in their head and you give them
             | 324543 and 976985, when they reply "317073642855" you'll
             | take out a calculator to confirm. If you had done the
             | calculation first on a computer, you wouldn't turn to the
             | nearest human for them to confirm it in their head.
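For what it's worth, the product quoted in the example above can be confirmed mechanically. This is only a minimal check of the two numbers from the comment, not something the commenter posted:

```python
# Numbers taken from the comment above.
a, b = 324543, 976985
claimed = 317073642855

# Python integers are arbitrary precision, so the product is exact.
matches = a * b == claimed
print(matches)  # True
```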
             | 
             | The problem with ChatGPT being wrong and misleading isn't
             | the information itself, but that people are taking it as
             | correct because that's what they're used to and expect from
             | machines. In addition, you don't know when an answer is
             | bullshit or not. With a human, not only can you catch clues
             | regarding reliability of the information, you learn which
             | human to trust with each information.
             | 
             |  _Everyone's_ standard for ChatGPT, be it absolute
             | omniscience, utter failure, or anything in between, is
             | arbitrary. Comparing it to "the majority of humanity,
             | including many of its brightest members" is certainly not
             | an objective measurable standard.
        
       | xpressvideoz wrote:
       | > However, there's a sizable group of vocal folks who claim that
       | search results are still great.
       | 
       | I think that this very sentence shows the author's bias, because
       | I feel that Google's search results are not just great, but
       | _better_ than they were 10 years ago.
        
         | realcertify wrote:
         | You must be kidding, Google is becoming worse every day. Still
         | better than useless Bing though.
        
         | computerfriend wrote:
         | Consider yourself part of the sizeable group of vocal folk
         | then.
        
       | innocentoldguy wrote:
       | While I think the article is interesting, I disagree with its
       | results regarding Kagi. I like Kagi and rarely use anything else.
       | Kagi's results are decent and I can blacklist sites like
       | Amazon.com so they never show up in my search results.
        
       | vitorgrs wrote:
       | Weird article. Basically, the author thinks that anything that is
       | not yt-dlp is a bad search result, which is pretty insane.
       | 
       | Like, for me at least, I already know yt-dlp exists. When I
       | search "youtube downloader", it's exactly because I want an
       | online-website page to download youtube videos.
        
         | anonymoushn wrote:
         | The author would probably accept any result that helps them
         | download youtube videos. Did you find any and successfully use
         | it to download a youtube video? Could you provide a link to the
         | one you used?
        
       | csours wrote:
       | Wide tires by Jason of Engineering Explained:
       | https://www.youtube.com/watch?v=kNa2gZNqmT8
       | 
       | Better answer: learn the differential equations in this book:
       | 
       | https://ftp.idu.ac.id/wp-content/uploads/ebook/tdg/TERRAMECH...
        
       | freediver wrote:
       | Current Kagi results for those without an account to compare:
       | 
       | youtube downloader
       | 
       | https://kagi.com/search?q=youtube+downloader&r=us&sh=_szITdy...
       | 
       | ad blocker
       | 
       | https://kagi.com/search?q=Ad+blocker&r=us&sh=-BHzV2ZoCDpmgOu...
       | 
       | download Firefox
       | 
       | https://kagi.com/search?q=Download+Firefox&r=us&sh=zkkmc_EQX...
       | 
       | why do wider tires have better grip?
       | 
       | https://kagi.com/search?q=Why+do+wider+tires+have+better+gri...
       | 
       | why do they keep making cpu transistors smaller?
       | 
       | https://kagi.com/search?q=Why+do+they+keep+making+cpu+transi...
       | 
       | vancouver snow forecast winter 2023
       | 
       | https://kagi.com/search?q=Vancouver+snow+forecast+winter+202...
       | 
       | I agree with the author that there is too much spam on the web. I
       | think Kagi in general does a pretty good job at downranking it
       | (number of ads/trackers is a negative ranking signal on Kagi) but
       | we can always do better. Kagi has special search modes like
       | "Small Web" which virtually eliminates spam.
       | 
       | I welcome such scrutiny from the community. Please continue to
       | keep us honest.
        
         | asah wrote:
         | Kagi gives me websites that require more clicking; Google just
         | gives me reasonable answers and I don't see spam in your
         | examples.
         | 
         | "why do wider tires have better grip?"
         | 
         | Wider tires provide more grip due to a larger contact patch
         | with the road. While it's true that friction is not directly
         | dependent on surface area, a larger contact patch allows for
         | more even weight distribution and better traction, particularly
         | during cornering. This can result in improved handling and
         | stability.
         | 
         | "why do they keep making cpu transistors smaller?"
         | 
         | Smaller transistors can do more calculations without
         | overheating, which makes them more power efficient. It also
         | allows for smaller die sizes, which reduce costs and can
         | increase density, allowing more cores per chip.
         | 
         | "vancouver snow forecast winter 2023"
         | 
         | The forecast for the 2023/2024 season suggests that we can
         | expect another winter marked by ample snowfall and temperatures
         | hovering both slightly above and below the freezing mark. Be
         | prepared ahead of time.
        
           | poyu wrote:
           | My only gripe with these quick answers or excerpts is that
           | sometimes they don't fully give the context. After it's given
           | me a couple of wrong results I stop trusting it and always
           | read the source. If I'm going to read the source, I might as
           | well click into the website and read the whole thing.
        
           | pbronez wrote:
           | I often default to using Kagi's FastGPT when I want direct
           | information instead of a starting point for web browsing. It
           | combines web searches with LLM summarization to produce an
           | integrated, sourced answer. Kinda like "New Bing" with a
           | better interface. You can trigger this mode by adding "!fast"
           | to your query, going to https://kagi.com/fastgpt, or using
           | their API. Here's what it returned for me on those specific
           | queries:
           | 
           | ==========
           | 
           | "why do wider tires have better grip?"
           | 
           | Wider tires provide better grip due to having a larger
           | contact patch with the road surface. A larger contact patch
           | means the tire can disperse the weight of the vehicle over a
           | wider area, reducing pressure. This allows for more friction
           | between the tire and road, improving traction and cornering
           | ability. However, wider tires do not necessarily provide
           | better grip in wet conditions. [1][2][3][4][5]
           | 
           | 1. wheels - Does a wider tire always mean more grip/traction -
           |    Motor ...
           | 2. Do Wider Tires Give You More Grip? Not All the Time
           | 3. Learn Me: Wider tires = more Grip? = Faster cornering?|
           |    Grassroots ...
           | 4. Wide or narrow tires? Here's how your choice will affect
           |    driving ...
           | 5. VIDEO: Do wider tires offer more grip? - BMW M3 and BMW M4
           |    Forum
           | 
           | ==========
           | 
           | "why do they keep making cpu transistors smaller?"
           | 
           | There are a few key reasons why CPU manufacturers keep making
           | transistors smaller:
           | 
           | 1. Smaller transistors allow more transistors to be placed on
           |    a single chip, enabling increased processing power. More
           |    transistors results in more capabilities and performance.
           |    [1]
           | 2. Smaller transistors use less power and produce less heat.
           |    This allows CPUs to operate faster and with more
           |    cores/threads within the same power and thermal envelope.
           |    [2]
           | 3. Moore's Law, an observation made by Intel co-founder Gordon
           |    Moore in 1965, predicted that the number of transistors on
           |    integrated circuits doubles approximately every two years.
           |    Following this trend has driven the semiconductor industry
           |    to continue shrinking transistors. [3]
           | 4. Smaller transistors can switch on and off faster, enabling
           |    increased clock speeds. This directly leads to increased
           |    performance. [1]
           | 
           | However, we are rapidly approaching the fundamental physical
           | limits of transistor shrinkage. Transistors cannot be smaller
           | than an atom, which is around 0.2 nanometers for silicon. [4]
           | Alternative technologies like carbon nanotube transistors may
           | be needed to continue performance growth beyond this point.
           | [5]
           | 
           | 1. Why does computer chip process size have to keep getting
           |    smaller ...
           | 2. Why does smaller = less heat? - CPUs, Motherboards, and
           |    Memory ...
           | 3. What Is Moore's Law and Is It Still True?
           | 4. Transistors are an essential building block used in almost
           |    every ...
           | 5. These Transistor Gates Are Just One Carbon Atom Thick -
           |    IEEE ...
           | 
           | ==========
           | 
           | "vancouver snow forecast winter 2023"
           | 
           | Based on the context provided: The winter of 2023-2024 in
           | Vancouver is expected to see less snow and milder
           | temperatures than typical winters. Forecasts predict a below
           | average snowfall season with warmer weather
           | [1][2][3][4][5][6]. Long range forecasts through December
           | 2023 for southern British Columbia call for showers along the
           | coast and less precipitation inland [7]. Metro Vancouver's
           | fall and winter is also predicted to be a potentially 'down'
           | snow year [4]. Confidence is high for a mild winter in
           | western Canada with reduced snowfall amounts [5]. However,
           | specific snowfall totals are uncertain given the long lead
           | time [6].
           | 
           | 1. Vancouver winter weather: Less snow, milder temperatures
           |    ...
           | 2. 2023-2024 British Columbia Winter Forecast Preview |
           |    OpenSnow
           | 3. Snow Prediction Vancouver Winter 2023/24 -- Alblaster Snow
           |    ...
           | 4. Metro Vancouver's fall, winter forecast | CityNews
           |    Vancouver
           | 5. What will this winter be like? Grab the hot cocoa -- here's
           |    your 2023 ...
           | 6. Canada's Winter Forecast: El Nino a critical factor for the
           |    season ...
           | 7. 60-Day Extended Weather Forecast for Vancouver, BC |
           |    Almanac.com
        
           | algas wrote:
           | That first result re: tires is simply wrong. Wider tires
           | don't have a larger contact patch; the size of the contact
           | patch is determined by the weight of the car and the air
           | pressure in the tires:                   A = W / P
           | 
           | So the reason wider tires improve handling is more complex
           | and subtle. Also, FTA:                   Assuming a baseline
           | of a moderately wide tire for the wheel size.           -
           | Scaling both of these to make both wider than the OEM tire
           | (but still running a setup that fits in the car without
           | serious modifications) generally gives better dry braking and
           | better lap times.           - In wet conditions, wider setups
           | often have better braking distances (though this depends a
           | lot on the specific setup) and better lap times, but also
           | aquaplane at lower speeds.           - Just increasing the
           | wheel width and using the same tire generally gives you
           | better lap times, within reason.           - Just increasing
           | the tire width and leaving wheel width fixed generally
           | results in worse lap times.
           | 
           | A full accounting of the effects of changing tire width
           | should explain all of these effects.
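As a rough worked example of the A = W / P relation quoted in this comment, here is a minimal sketch. The car weight, tire pressure, and tire widths are made-up illustrative numbers, the load is assumed to be split evenly across four tires, and the model ignores sidewall stiffness and everything else about real tires:

```python
def contact_patch_area(load_lbf: float, pressure_psi: float) -> float:
    """Idealized contact patch area in square inches: A = W / P."""
    if pressure_psi <= 0:
        raise ValueError("pressure must be positive")
    return load_lbf / pressure_psi


# Hypothetical numbers: a 3200 lbf car, load split evenly over four tires,
# each inflated to 32 psi.
per_tire_load = 3200 / 4
area = contact_patch_area(per_tire_load, 32.0)
print(f"{area:.1f} sq in per tire")  # 25.0 sq in per tire

# Tire width does not appear in the formula: under this idealized model a
# wider tire gives a shorter, wider patch of the same area.
for width_in in (8.0, 10.0):
    print(f"width {width_in} in -> patch length {area / width_in:.2f} in")
```

Under these assumptions the area stays fixed when only the width changes, which is the point the comment is making about why the "larger contact patch" explanation is incomplete.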
        
       | jeffbee wrote:
       | Pretty biased selection of queries. Article avoids the things
       | that ChatGPT and the others without fresh data can't answer. Look
       | at the trending searches on Google. They are all for fresh info
       | that none of the others can answer. Sports scores. Google
       | probably judges quality weighted by the questions their users
       | actually ask, not this nerd bullshit.
        
         | bluish29 wrote:
         | Wouldn't any selection of queries be biased? Even what you
         | are saying is biased: you're trying to say that Google would be
         | better for the cases that it optimizes for, which is even
         | weirder. That is like saying you want to compare highly
         | optimized code that is using some C libraries vs some native
         | Python code.
        
         | Dah00n wrote:
         | How is a youtube downloader biased to fresh results? Seems to
         | cover a pretty broad test.
        
           | jeffbee wrote:
           | It selects a "right answer" that suits a stale index,
           | assuming that there can't have been a right-er answer
           | discovered after ChatGPT's training horizon.
        
       | shaldengeki wrote:
       | What's most shocking to me is how much malware there is in all of
       | this. The fact that Google et al aren't constantly in trouble for
       | directly forwarding unwitting users to malware distributors
       | indicates to me just how far our standards have fallen for a
       | "good" search engine. I feel like we'd be happier with search
       | engines that adhered to "first, do no harm" principles.
        
       | hannasanarion wrote:
       | I'm not able to reproduce the author's bad results in Kagi, at
       | all. What I'm seeing when searching the same terms is fantastic
       | in comparison. I don't know what went wrong there.
       | 
       | In the Youtube Downloader search, NortonSafeWeb is nowhere to be
       | found. I get a couple of legit downloader websites, and some
       | articles from reputable tech newspapers on how to use them or
       | command line tools.
       | 
       | In the Adblock search, ublock Origin is #3, followed by some
       | blogs about ad blocking ethics debates and the bullshit Google
       | has been pulling recently.
       | 
       | In the wider tires grip search, #3 is a physics blog that dives
       | deep into the topic.
       | 
       | In the transistors search, the first reddit link directly answers
       | the question in very similar wording to the hypothetical correct
       | answer spelled out in the rubric. 4/5 of the reddit results are
       | on the correct topic, followed by two SuperUser questions also on
       | the correct topic, then some linus tech tips and toms hardware
       | articles, also on the correct topic. No Quora questions.
       | 
       | In the vancouver winter snow search, the first several results
       | are from local newspapers talking about the anticipated effects
       | of el nino on snowfall, and then a couple of high-quality blogs
       | and weather sites.
       | 
       | Really wondering how Dan got such bad results.
       | 
       | ------
       | 
       | Aside from that, the way that the author expects all the results
       | to return the same kind of thing is just... weird? Like, that's
       | not how search engines are supposed to work. A search that gives
       | you 10 links to fundamentally the same thing is a bad search.
       | Search results should cover a breadth of reasonable guesses for
       | what you should be looking for given a query. If you search for
       | "download firefox", and you scroll past the first 5 download
       | links, then you're probably not actually looking for a download
       | link and a blog post about firefox is not "irrelevant" and
       | shouldn't be points against.
       | 
       | This opinion is even borne out in search engine quality metrics
       | that have been industry-standard for decades, like mean
       | reciprocal rank and discounted cumulative gain. What matters is
       | how far you have to scroll to get to a good result, not what
       | proportion of the first N results are good.
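To make the distinction concrete, here is a minimal sketch of mean reciprocal rank and discounted cumulative gain (DCG) next to a simple precision-at-k. The relevance judgments are made up for illustration, and the DCG uses the common log2(rank + 1) discount; neither is taken from the article or from any particular search engine:

```python
import math
from typing import Sequence


def reciprocal_rank(relevant: Sequence[bool]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries: Sequence[Sequence[bool]]) -> float:
    """Average reciprocal rank over a set of queries."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(q) for q in queries) / len(queries)


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: the gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def precision_at_k(relevant: Sequence[bool], k: int) -> float:
    """Fraction of the first k results that are relevant."""
    return sum(relevant[:k]) / k


# Made-up judgments: both pages have exactly one good result in the top
# five, but in different positions.
good_first = [True, False, False, False, False]
good_last = [False, False, False, False, True]

# The proportion of good results is identical...
print(precision_at_k(good_first, 5) == precision_at_k(good_last, 5))  # True

# ...but the rank-sensitive metrics reward the page with the hit on top.
print(reciprocal_rank(good_first))  # 1.0
print(reciprocal_rank(good_last))   # 0.2
print(dcg([1, 0, 0, 0, 0]) > dcg([0, 0, 0, 0, 1]))  # True
```

Both pages score the same on "proportion of the first N results that are good", while reciprocal rank and DCG separate them by how far down the first good result sits.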
        
         | throwaway0665 wrote:
         | have you customized your results and lowered or raised many
         | domains?
        
         | trb wrote:
         | Same here, I was curious about Kagi's low ranking, and couldn't
         | replicate the search results. Also saw ublock Origin on #3,
         | good results for tires, transistors and snow, etc. I've never
         | used any of the Kagi search result weighting features.
         | 
         | Ctrl+F on the page for "System prompt" doesn't show any hits.
         | Given how important those are for ChatGPT (another thought -
         | was the author testing GPT3.5 or 4?) I'm not sure how much
         | weight to put into the ChatGPT results either.
         | 
         | Not sure how much I can take away from this comparison.
        
           | spaceman_2020 wrote:
           | I asked GPT-4 about Youtube Downloader and it rambled on
           | about how downloading videos is against Youtube's TOS and I
           | should buy YouTube premium which has the download feature.
           | 
           | Getting any useful data from GPT-4 about anything even
           | remotely "illegal" is a waste of time.
        
             | UberFly wrote:
             | So it has also become one of the glitterati. That didn't
             | take long.
        
             | okasaki wrote:
             | Works fine IMO
             | 
             | https://chat.openai.com/share/7dfd22d2-975c-4e6d-ba4b-c6b99e...
             | 
             | https://chat.openai.com/share/90fae0dc-f8fd-4603-835c-5f3a57...
        
             | Huppie wrote:
             | The author already alludes to the fact that you can
             | probably prompt-engineer around this and indeed, as soon as
             | I added a blurb like "these are my own videos that I own
             | the copyright to" it did suggest a bunch of third-party
             | tools and let me ask it about what third-party tools I
             | could use.
             | 
             | It suggested '4K Video Downloader', 'YTD Video Downloader',
             | 'JDownloader' and 'Clipgrab' at first and when I asked for
             | cli tools it came with 'youtube-dl', 'yt-dlp', and 'ffmpeg'
             | 
             | Those seem pretty reasonable results to me but I'll readily
             | admit I don't know (yet) if 'most users' would ask these
             | follow-up questions.
        
             | Semaphor wrote:
             | With a better prompt, you can get it to list some, but it's
             | very annoying to do so.
             | 
             | Mistral showed that their medium model is far better (yet
             | not good), and the same prompt as in the article gives only
             | one instead of 3 paragraphs of rambling about copyright,
             | and then lists 3 categories of options with examples for
             | each (not good, because ytdl is not one of those listed).
             | 
             | Funnily enough, both mistral and GPT4 apologize profoundly
             | and almost with the same wording when asked "Why did you
             | not mention the very popular, free and open source
             | "youtube-dl" software?" and then mention how/where to get
             | it and how to use it.
        
               | freediver wrote:
               | > Funnily enough, both mistral and GPT4 apologize
               | profoundly and almost with the same wording when asked
               | "Why did you not mention the very popular, free and open
               | source "youtube-dl" software?"
               | 
               | Likely because they were optimized for general
               | population, which would not have a use for command line
               | python utility.
        
               | Semaphor wrote:
               | It's clear to me why they didn't include it; I wanted them
               | to tell me why, though. And I thought it was funny that
               | both of them apologized in almost the same way.
        
               | DominikPeters wrote:
               | It's plausible that mistral trained on GPT-4 output and
               | therefore has similar mannerisms.
        
             | alextingle wrote:
             | claude.ai produced pretty reasonable results.
        
         | Scaless wrote:
         | I have a new Kagi account with no custom rankings and I see the
         | same terrible results. Basically the same as what he describes.
         | yt-dlp is not found at all, the 2010 link to youtube-dl, and a
         | bunch of spam sites.
        
         | Semaphor wrote:
         | What region? I get similarly bad results with international
         | (and a quick check with region US also didn't improve things)
         | and uBo at only #5, and ytdl at #12. And I already have github
         | on "raise" and a bunch of domains blocked (not many though)
         | 
         | For the transistor query, it's a very "googly" way of writing a
         | query, when I saw the results I instantly felt like rewriting
         | it and the first try gave much better results with "Why keep
         | cpu transistors getting smaller?". Caveat that the results look
         | better and more topical, I don't know what a good answer would
         | be, also why I didn't evaluate the tires or Vancouver weather
         | (I tried a local search for my city's weather, and while the
         | first result was unrelated, the 2nd was okay)
         | 
         | edit: This whole thread made me finally create a file for
         | documenting bad searches on Kagi. The issue for me is usually
         | that they drop very important search terms from the query and
         | give me unrelated results. But switching to verbatim or "forced
         | terms" also prevents any kind of error correction of the
         | search. This used to be one of my main annoyances with DDG back
         | then, and Kagi did not have that issue during the early days.
        
         | szundi wrote:
         | Kagi is awesome for me too. I only notice it when I use Google
         | somewhere else, because of the shit results.
        
         | iansinnott wrote:
         | I'll second the chorus of those curious to hear how you've
         | customized the search engine. I was able to reproduce the
         | lackluster results, and was sadly disappointed. I expected what
         | you seem to have found, that Kagi would outperform.
         | 
         | A specific example: for "ad blocker" the first result was some
         | paid ad blocker and ublock was down the page below the fold.
        
         | the__alchemist wrote:
         | I use Kagi because I'm trying to remove Google from my life,
         | but their text search is worse than Google in my experience,
         | and the image search is abysmal. I'm wondering how long I can
         | keep this up. I already revert to Google for image search, and
         | am finding myself using either Google or ChatGPT over Kagi more
         | and more for text as well.
        
           | freediver wrote:
           | Kagi had a pretty substantial image search update just a few
           | days ago [1]. Do you still see the issues with it?
           | 
           | [1] https://kagi.com/changelog#2793
        
             | the__alchemist wrote:
             | Good info - will experiment!
             | 
             | It's already performing better on a (n=1) test I tried.
             | 
             | "Talos Principle 2". (Video game sequel) Previously (~5
             | days ago), Google returned various screenshots etc from the
             | game `The Talos Principle 2`. Kagi returned mostly results
             | from `The Talos Principle (1)`. Now the latest Kagi results
             | are a mix, mostly from 2. So, it does look like it fixed
             | this query.
        
       | gniv wrote:
       | Meta: Since the text on the page is so dense, I tried reading it
       | in Chrome's reading mode. Which was fine until the Appendix. All
       | the results are missing, leading to confusion.
        
         | UberFly wrote:
         | I also was overwhelmed by the amount of data. I came back here
         | to find the cliff notes :)
        
       | littlecranky67 wrote:
       | Kagi really shines on topics that are SEO-spammed on other search
       | engines. E.g. when travelling to a touristic city, searching a
       | recipe, or basically any product you want to buy. I actually got
       | "search anxiety" searching these topics, as I know I will have to
       | navigate a lot of SEO spam, content that is artificially blown
       | up, and the core information purposefully hidden somewhere on the
       | page - if any. Plus the multitude of cookie consent banners and
       | newsletter subscription popups on each link...
       | 
       | I've been using Kagi's FastGPT [0] now for these searches, it
       | basically removes all the bullshit and gives verifiable sources
       | for any answers.
       | 
       | [0]: https://kagi.com/fastgpt
        
         | pbronez wrote:
         | Yeah that's my go-to as well. Interestingly, I often find that
         | "Fast" mode results are as good or better than "Expert" mode
         | for simpler tasks.
        
       | londons_explore wrote:
       | I would kinda have liked side by side screenshots so I could see
       | for myself rather than a wall of text
        
       | motoxpro wrote:
       | This makes so much sense why people think search results are bad.
       | Great results for "Download youtube videos" is "Ideally, the top
       | hit would be yt-dlp or a thin, graphical, wrapper around yt-dlp"
       | 
       | Just give me a website where I can plug in the DL link and
       | download it to my hard drive. I don't care what package they are
       | using (I don't worry about malware like I did in the 90s).
       | 99.999% of people are not programming tinkerers.
       | 
       | Just makes me realize how subjective search results are. All of
       | their "Great" results are my "Terrible" results.
        
         | darkwater wrote:
         | Malware or well, the actual viruses, in the '90s were a joke,
         | especially because a computer was an isolated thing. Connected
         | computers were the exception.
        
           | acdha wrote:
           | In the early 90s, yes. By the turn of the century the current
           | industry we see today existed in basic form: malware stole
           | credit cards, compromised PCs were used to send spam as part
           | of botnets, etc. The only major advance was when
           | cryptocurrencies made it much easier to launder money and the
           | professionalism went up accordingly.
        
         | carlosjobim wrote:
         | The first result on Kagi is exactly this, just tried it a
         | moment ago. It processed and downloaded the video extremely
         | fast. Why would any reasonable person prefer youtube-dl?
        
           | Dah00n wrote:
           | It is the same using Google.
        
           | motoxpro wrote:
           | Totally. As the sibling said, it is the same using Google. I
           | am not sure why anyone would want a programming package to
           | accomplish a task that could be done in < 10 seconds.
           | 
           | But again, I guess that's why search is so hard: it has to
           | parse that intent from 3 words.
        
           | ufmace wrote:
           | IMO, if you're capable of running yt-dlp, it's far better
           | than any website.
           | 
           | It's pretty simple to run these download tools as a website,
           | but it's expensive in terms of bandwidth and tends to attract
           | legal attention. So a lot of websites go up supporting it,
           | but even if they were started with good intentions, they will
           | virtually all eventually add intrusive ads or other types of
           | monetization just to break even. So there's never going to be
           | a reliable website for it. If you're lucky, a search engine
           | will send you to one that's working okay right now, but even
           | odds you'll be fighting through a dozen malware nests.
           | 
           | Meanwhile, yt-dlp just works every time, with only an
           | occasional pip upgrade to keep it up to date.
        
           | anonymoushn wrote:
           | Over here the first result on Kagi is savefrom.net which
           | variously tries to install malware or sell a paid
           | subscription and does not download videos.
        
       | jcmeyrignac wrote:
       | No mention of https://www.qwant.com
        
       | DeathArrow wrote:
       | The thoughts about building a better search engine than Google
       | are interesting.
       | 
       | Unlike the author, I think that building a better search engine
       | than Google is possible. But it's going to be rather expensive.
       | And the only proven way to monetize it is selling ads. Which will
       | degrade the quality of the search results fast. For potential
       | investors, there are probably many better ways to invest money
       | than by building a search engine.
       | 
       | This leaves us with only one viable alternative: build it in the
       | open like Wikipedia and source donations from people and from
       | Google competitors like Amazon or Apple.
        
       | hamilyon2 wrote:
       | Is this from desktop? What region?
       | 
       | uBlock Origin as the very top result on an iOS device is simply
       | a bad search result page. Maybe fourth position is tolerable,
       | after three different working ones. Maybe it should be lower; I
       | doubt myself and wonder if my point of view is too elitist.
       | 
       | Yt-dlp is subject to all sorts of takedown requests in different
       | jurisdictions.
        
       | ZeroGravitas wrote:
       | I'm not sure youtube-dl is a good answer unless you're a nerd.
       | 
       | Which is a similar phenomenon to search. If you have sufficient
       | tech skills there's a whole world of freely available software
       | out there to complete your task.
       | 
       | If you're not then you are at the mercy of a range of commercial
       | offerings (some built on the free software) that range from
       | arguably scams to outright scams.
        
       | bambax wrote:
       | I'm in the camp of those who think Google's results are still
       | very good. I admit I use adblock (uBlock Origin) and won't even
       | try to disable it.
       | 
       | I understand the author's point of turning off their ad blocker
       | "to get the non-expert browsing experience" but then they could
       | make a different test with uBlock on for every query and see how
       | it goes.
       | 
       | It's also a bit inconsistent to expect results for downloading
       | videos mentioning _yt-dlp_ while trying to emulate "the
       | non-expert browsing experience"... Yt-dlp is a command-line Python
       | utility. Talk about non-expert! Most people don't know that
       | videos are files that can be downloaded; of those who do, most
       | don't know about the command line or Python.
       | 
       | Yet when searching for _"how to download youtube videos"_ the
       | first result I get on Google is a link to a service called
       | "savefrom.net", which appears to work well and does not seem to
       | be a scam. This would qualify as "very good" in my book.
       | 
       | When searching for _"how to download youtube videos from the
       | command line"_ the first few results are about youtube-dl,
       | including links to github and superuser. Granted they don't
       | mention yt-dlp, but youtube-dl is a good start.
        
         | gkbrk wrote:
         | When I do a Google search in an Incognito tab for "how to
         | download youtube videos", the first two results I get are the
         | following.
         | 
         | - https://msunduziassociation.online/perfect-online-videos/
         | 
         | - https://gssaction.org/program-all-in-one-media-solutions/
         | 
         | I would certainly put those in the "Terrible" category like the
         | author.
        
           | Dah00n wrote:
           | I get savefrom.net in both Incognito and normal tabs, uBlock
           | or not. I have no idea why you get crap results that are
           | somehow different. uBlock doesn't change google results in
           | Firefox for me at all. It seems you get crap _added_, not
           | removed.
        
             | gkbrk wrote:
             | I searched with Chrome, perhaps that's the difference.
             | Firefox also blocks some ads out-of-the-box even without
             | uBlock, so maybe it was already blocked.
             | 
             | It could also be related to targeting, like time zone,
             | location, IP address, age group etc.
        
               | Dah00n wrote:
               | I get the same search result in Edge as in Firefox. Can't
               | test in Chrome, but something seems strange.
        
             | anonymoushn wrote:
             | savefrom.net is a crap result.
        
           | cj wrote:
           | My top 2 (incognito) are blog posts from pcmag.com and
           | zdnet.com listing 5 ways to download YT videos. Maybe it's
           | blogspam, but the listed services seem valid at first glance.
           | 
           | savefrom.net is the 5th result (2nd page underneath 5 youtube
           | videos)
           | 
           | Edit: This is from the US. If i had to guess, these are
           | regional differences. What country are you in?
        
             | emmelaich wrote:
             | I got similar to you; I'm in Australia.
        
           | londons_explore wrote:
           | Did you click either of those links?
           | 
           | Both seem to do the job of downloading a youtube link to mp4
           | for free.
        
             | gkbrk wrote:
             | Did _you_ click either of those links? They are not YouTube
             | video downloaders, they just link to another downloader.
             | There is nowhere on those links to even put a YouTube URL.
             | 
             | Are you seriously suggesting that a website with the
             | following "About us" with only a link to another YouTube
             | video downloader is itself a good YouTube video downloader?
             | 
             | > Good Samaritan Support Action is to reawaken the Body of
             | Christ to receiving the extravagant love of The Father, as
             | well as our call to respond to this love by loving God with
             | all of our hearts, souls, strengths, and minds. In order
             | for people's hearts to be linked to the heart of our
             | Heavenly Father, we want to foster and facilitate the
             | establishment of a culture of love in our churches and
             | ministries.
        
               | londons_explore wrote:
               | so, there is one extra click... But for the user, the
               | site does the job and takes an extra 1 second.
               | 
               | Ideal? No. But it does the trick.
        
               | hamasho wrote:
               | Not GP, but navigating to an unrelated scammy site just
               | having a link to the actual site is a terrible and
               | unethical job by Google. Imagine if you search "youtube"
               | and the top result is not YouTube but some scammy site
               | just having a link to YouTube. It's not about click
               | counts, if the youtube downloader has bad UX and requires
               | extra clicks, it's a bit inconvenient but ok.
        
             | tantalor wrote:
             | Those are both garbage/scam sites
        
           | sanderjd wrote:
           | I'm curious: what is the rationale for "in an incognito tab"
           | being part of the test harness?
           | 
           | It seems pretty arbitrary to me to disable one of the key
           | features - in this case personalization - of the software
           | being evaluated.
           | 
           | Or is the evaluation not between "search engines" but rather
           | "search engines without personalization"? If so, then this
           | restriction does make sense. But that is not the evaluation
           | that "normal users" are interested in.
        
             | Majromax wrote:
             | > I'm curious: what is the rationale for "in an incognito
             | tab" being part of the test harness?
             | 
             | It's the closest we can easily get to the 'average user
             | experience'. Someone who has a long account/cookie history
             | with Google has plausibly trained the site to return more
             | relevant results through implicit user-curation of avoiding
             | obvious-to-them SEO-spam on other queries.
             | 
             | If we posit that _every_ user eventually trains Google to
             | avoid SEO spam, then this begs the question of why Google
             | (/Bing) don't eliminate the SEO spam in the first place.
             | 
             | Besides that, it's not obvious why search engine
             | personalization should dramatically change the basic
             | utility of search results. We should expect personalization
             | to mostly address ambiguities: is 'the best way to set up
             | tables' asking about furniture assembly/carpentry or SQL?
             | None of the author's queries for this article supported
             | such ambiguities, and besides that the results returned
             | (see the final appendix) aren't[+] valid answers to a
             | different interpretation of the question.
             | 
             | [+] -- I think I'd quibble about the 'adblock' question,
             | since a reasonable person might still find an adblocker
             | that works but participates in the 'acceptable ads program'
             | to be sufficient.
        
               | Jcowell wrote:
               | > It's the closest we can easily get to the 'average user
               | experience'
               | 
               | You wouldn't be really taking the average here though
               | would you? You would be capturing the experience someone
               | might have if they were in incognito, using google for
               | the very first time, or using google on another device
               | for the very first time, but not the "average
               | experience".
        
               | sanderjd wrote:
               | > _It's the closest we can easily get to the 'average
               | user experience'._
               | 
               | _Maybe_ it's the closest we can get (though I doubt
               | it), but it definitely isn't close _enough_ to tell us
               | anything about the "average user experience".
               | 
               | The average user has been using google for years, without
               | taking any steps to avoid personalization. An incognito
               | session (on a browser / machine / network that is
               | probably fingerprinted...) is pretty much the opposite of
               | that typical usage pattern.
               | 
               | I recognize that just writing a blog post or comment on
               | HN is not a research project, so it needs to be something
               | quick, but I think it mostly invalidates the experiment.
               | What would get closer would be to devise a few user
               | personas and attempt to search and browse for awhile
               | within those personas before trying the experiment. Or
               | much better yet, put together a focus group comprised of
               | real people within the personas you're interested in, and
               | run the experiment using their real accounts.
               | 
               | > _If we posit that every user eventually trains Google
               | to avoid SEO spam_
               | 
               | I don't think it's that, I think it's that every user
               | trains it to return results more likely to improve the
               | metric of "more likely to click one of the links", and I
               | think that makes it more, not less, likely that they see
               | what most of us here consider to be spam.
               | 
               | But I don't know! Maybe that's not what this experimental
               | setup would show. But it would be a lot more enlightening
               | than a setup using a fresh incognito window, which
               | reflects the usage pattern of a proportion of search
               | queries that is a tiny rounding error above zero.
        
               | SV_BubbleTime wrote:
               | Why are you assuming all users are logged in to google
               | all the time?
        
               | nvm0n2 wrote:
               | Google has billions of user accounts ....
        
               | sanderjd wrote:
               | Because it is objectively the case that the "average
               | user" of the internet has a google cookie in their
               | browser. It doesn't require that they be logged in -
               | though I believe it's likely also the case that the
               | "average user" is indeed logged into a google account -
               | it just requires that they use google search without
               | turning off cookies or specifically blocking google's.
               | Essentially everybody uses google search and essentially
               | nobody cares enough (or would know how) to turn off
               | cookies or block google's cookie.
               | 
               | If this doesn't describe most people you know, you're in
               | a very small bubble. (I'm somewhat in that bubble too,
               | but I still have lots of family and friends who use the
               | internet the normal way.)
        
             | gkbrk wrote:
             | Google gets paid when you click on an ad. It's reasonable
             | to guess you're not going to click on too many scam
             | software ads with your software engineer profile. So
             | naturally you'll be shown fewer of them.
             | 
             | In this thread we can see people who are both using
             | incognito tabs seeing different results; comparisons will
             | only get harder if they are using personalized results.
        
         | teleforce wrote:
         | I'm also in the camp that thinks search results from Google
         | are very good, but ChatGPT-based search with RAG is better,
         | granted it's a paid version. The latter however is kind of
         | experimental. Personally I would love to have another column
         | for ChatGPT with RAG (Bing), and the fact that the author
         | ignored RAG is rather strange.
        
           | erybodyknows wrote:
           | For those (like me) wondering what RAG means: "Retrieval
           | Augmented Generation (RAG) represents a groundbreaking
           | approach in information retrieval, where the accuracy of
           | search results directly influences the quality of generated
           | answers. In essence, RAG combines traditional search
           | mechanisms with Large Language Model's ability to understand
           | and generate answers."
           | 
           | (https://www.linkedin.com/pulse/how-we-increased-search-
           | accur....)
        
           | jll29 wrote:
           | The topic of control (in ChatGPT like models) explained:
           | https://arxiv.org/pdf/2311.11701.pdf
        
           | HarHarVeryFunny wrote:
           | If you like Bing (ChatGPT with RAG), then also give
           | perplexity.ai a try - similar concept, but IMO better
           | executed.
           | 
           | https://www.perplexity.ai/
        
         | anonymoushn wrote:
         | cross-posted: Did you try using savefrom.net? You can type
         | "https://www.youtube.com/watch?v=IkYVmtgxebU" into the text box
         | and hit "Download". Then you'll get a new tab that tries to get
         | you to install malware. If you decline to install it, the new
         | tab takes you to the malware's homepage. If you close the tab
         | and go back to the original tab, savefrom.net presents you with
         | an error message saying "The download link not found." and does
         | not help you download the video.
        
           | vagrantJin wrote:
           | savefrom.net used to be good but it seems they've switched
           | their MO. plenty of decent alternatives filled the gap
           | though.
        
             | anonymoushn wrote:
             | Can you name the alternatives, and are they present in the
             | search results?
        
         | beezle wrote:
         | Put me in the camp of google and the rest are horrible for all
         | but very specific/unique technical terms, e.g. weak neutral
         | currents. Anything that is more "everyday life" is an exercise
         | in futility sorting through trash, often without even the terms
         | you are looking for. And good luck with "verbatim" searches -
         | either ignored or zero results.
        
         | bee_rider wrote:
         | An adblocker is necessary, and IMO a script blocker as well. I
         | feel vaguely like search has gotten worse over time, but it is
         | not a huge problem--usually a good site is on the first page or
         | two, and so I can just go check them out.
         | 
         | But if clicking a site meant I would be under attack, that
         | really increases the stakes, I start to care strongly about the
         | absence of bad sites, not just the existence of a good one.
         | 
         | Other than that, people need to be trained to not download
         | programs from websites in general. I think this has gotten
         | better over time? This is just a human mistake. Maybe Google
         | could suppress sites that link to executables. It must, right?
        
           | pixl97 wrote:
           | It would suppress linking to malware executables, but just
           | general programs I don't see why they would.
        
             | bee_rider wrote:
             | By the time you know enough about a site to download some
             | random executable off it and run it, you know more than
             | enough to just enter the URL, so there's no point to having
             | it show up in search results.
        
         | omoikane wrote:
         | > they could make a different test
         | 
         | The takeaway I got from the article is everyone can make their
         | _own_ test, as opposed to relying on other people's sentiments
         | and memes about X is bad or Y is good.
         | 
         | Trying to emulate a non-expert experience without workarounds
         | is not the common usage pattern since everyone familiar with
         | their favorite tools has ways to get more value out of them,
         | but this article presents a way of constructing an experiment
         | (this is why I chose these queries, this is how I ranked scams,
         | etc.), and I think people should follow this same spirit to
         | evaluate if they are stuck in a local optimum with their
         | current choice of tools.
        
       | poulpy123 wrote:
       | I'm sorry but the very first request is completely wrong. When
       | people search for a YouTube downloader, they want a website that
       | lets them download a YouTube video, not a command line tool. And
       | the first results given by Google do that. I'm one of the people
       | that think Google search became bad but it's not because of this
       | kind of search.
        
         | marginalia_nu wrote:
         | That's the tricky mind-reading aspect about search intent.
         | 
         | Different people have varying expectations as to what they want
         | to find with the same query. I'd definitely want yt-dlp in
         | favor of some website.
        
           | poulpy123 wrote:
           | it's easy: just append command line to the query like you
           | would append android app if you wanted an android app
        
             | marginalia_nu wrote:
             | That is a user POV solution, speaking from the search
             | engine POV.
        
               | bee_rider wrote:
               | Based on your handle, I suspect you have much better
               | insight into this than the rest of us!
               | 
               | But can the search engine mind-read by assuming Windows
               | users don't want to use a command-line utility?
        
               | marginalia_nu wrote:
               | They can based on user tracking and profiling, but that's
               | murky waters I personally don't want to dip into.
        
               | notRobot wrote:
               | I assume you meant to say you _don't_ want to! :)
        
               | marginalia_nu wrote:
               | Yeah I accidentally a word.
        
         | anonymoushn wrote:
         | They do not do that, have you tried using them?
        
       | yashasolutions wrote:
       | Kagi is great, it's now my daily driver for search. This is after
       | I got tired of DDG and moved to Google (through StartPage), but
       | the results were spammy or just irrelevant... and sometimes there
       | weren't any results at all, even for the most trivial search. So
       | I switched recently to Kagi, and so far it's been smooth sailing
       | and a real time saver.
        
       | buro9 wrote:
       | mostly my search is now Wikipedia.
       | 
       | I'm probably in a very small group who have the entirety of
       | English wikipedia (without images) on my Android (via Kiwix), and
       | I just search that. 99% of the time that's all I need.
       | 
       | the only exceptions are super current things like weather
       | (Windy), or travel (Navan work travel system gives me enough to
       | just go direct to airlines, hotels, etc), and local (OSM via
       | Organic Maps).
       | 
       | I've almost completely degoogled (not intentionally, but driven
       | gradually by Google becoming crappy incrementally), but didn't
       | really find a single generic replacement as much as I found far
       | better single purpose tools.
       | 
       | I'm reminded of that Craigslist image showing how many startups
       | were each competing against specific parts of Craigslist
       | https://cbi-blog.s3.amazonaws.com/blog/wp-content/uploads/20... ,
       | and this is what it feels like is happening to Google.. they're
       | being beaten in specific areas, but at the same time spam and
       | crap is diluting their core product.
        
       | jimbobthemighty wrote:
       | The Github link is my top result on Google. Clearly a mix of
       | uBlock and Privacy Badger are more powerful than most appreciate.
        
       | jmakov wrote:
       | Using phind most of the time. It would be interesting to add it.
        
       | ic_fly2 wrote:
       | I have a small page that modifies my get requests to google by
       | adding -site:... for a bunch of most annoying content farms for
       | stuff I search often (docs)
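
The query rewriting described above can be sketched roughly as follows. This is only an illustration of the idea, not ic_fly2's actual page: the blocklist domains and the `build_search_url` helper are hypothetical names, and the only thing assumed about Google is that it accepts a `q` parameter and honours the `-site:` operator.

```python
from urllib.parse import urlencode

# Hypothetical blocklist; replace with the content farms you want to hide.
BLOCKED_SITES = ("contentfarm.example", "seo-spam.example")

def build_search_url(query, blocked=BLOCKED_SITES):
    """Return a Google search URL with one -site: term per blocked domain."""
    exclusions = " ".join(f"-site:{domain}" for domain in blocked)
    full_query = f"{query} {exclusions}".strip()
    # urlencode takes care of escaping spaces and colons in the query.
    return "https://www.google.com/search?" + urlencode({"q": full_query})

url = build_search_url("python asyncio docs")
print(url)
```

Keeping the blocklist in one place means the same list can be reused for every search, which is roughly what a browser extension does with a nicer interface.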
        
         | Dah00n wrote:
         | Have you tried uBlacklist?
        
       | amadeuspagel wrote:
       | > Here's a fun experiment to try. Take an open source project
       | such as yt-dlp and try to find it from a very generic term like
       | "youtube downloader". You won't be able to find it because of all
       | of the content farms that try to rank at the top for that term.
       | Even though yt-dlp is probably actually what you want for a tool
       | to download video from YouTube.
       | 
       | Is that true? Do most people want to install a command line tool
       | to download youtube videos?
        
         | Dah00n wrote:
         | No. They want sites like savefrom.net - which is hit number one
         | on Google.
        
           | anonymoushn wrote:
           | Did you try using savefrom.net? You can type
           | "https://www.youtube.com/watch?v=IkYVmtgxebU" into the text
           | box and hit "Download". Then you'll get a new tab that tries
           | to get you to install malware. If you decline to install it,
           | the new tab takes you to the malware's homepage. If you close
           | the tab and go back to the original tab, savefrom.net
           | presents you with an error message saying "The download link
           | not found." and does not help you download the video.
        
             | gkbrk wrote:
             | I tried this. I went to savefrom.net. First thing it does
             | is ask permission to send notifications.
             | 
             | After that there is a popup asking me if I want to continue
             | in the browser or download their app. If I click download,
             | it downloads a file called download_helper_2.3.27.apk.
             | 
             | Instead of downloading their app, if I paste a YouTube
             | link, it tells me I can wait or download their APK to skip
             | waiting. The download link downloads an older version
             | called download_helper_2.3.19.apk.
             | 
             | When I do the process again, instead of the older APK link
             | it gives me a Chrome extension link. But if you look at the
             | instructions you see that it's not a Chrome extension, but
             | a minified userscript. And it has `@include https://*` so
             | it can basically run on any website regardless of clicking
             | on an extension icon like regular browser extensions.
             | 
             | If I try to ignore all the distractions and wait for the
             | download link, I can click it and it downloads the MP4
             | file. But it also opens a popunder with the domain
             | https://refpamjeql.top/.
             | 
             | Not the best experience, and seems like a high risk of
             | getting malware, but it does get an MP4 file at some point.
        
               | anonymoushn wrote:
               | Interesting! I tried again and got completely different
               | results this time. Now there's no malware tab, and
               | instead it tries to get me to pay for a subscription to
               | download high-quality videos or MP3s. If I click the
               | barely-visible "Just let me download in my browser with
               | low quality" below the paid subscription button, I get
               | the same error as before.
               | 
               | Edit: the paid subscription payment flow says I'm
               | actually buying "Televzr Premium Max Subscription for
               | 
               | 1 Month_mp
               | 
               | Televzr helps get wireless access to the media library on
               | the computer from the mobile phone"
               | 
               | So it purports to be something unrelated to downloading
               | youtube videos. I didn't pay 1400 yen for it, so I won't
               | get to find out if it helps me download youtube videos.
        
       | BoostandEthanol wrote:
       | There's something incredibly entertaining to me about even this
       | well researched article struggling to find a reason for why wider
       | tyres have more grip.
       | 
       | As I understand it, this is because tyres are still somewhat of a
       | mystery, and anyone outside of a laboratory really doesn't know
       | shit. The best explanation I can think of is due to tyre load
       | sensitivity. The friction coefficient of rubber decreases with
       | normal force (e.g., a heavily loaded tyre has a lower friction
       | coefficient), which is a pretty well accepted fact, this is one
       | of the methods engineers will use to tune the handling of cars.
       | This means a wider tyre has a lower force per unit area of the
       | contact patch, which means it'll have a higher friction
       | coefficient.
       | 
       | Now that sounds plausible to me, but that's just my best guess
       | explanation.
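
The load-sensitivity argument above can be made concrete with a toy calculation. Everything numeric here is an assumption for illustration: the linear model for the friction coefficient, the constants `mu0` and `k`, the load, and the contact patch areas are all made up, and the sketch takes the comment's premise (a wider tyre spreads the same load over a larger contact patch) at face value rather than verifying it.

```python
def friction_coefficient(load_n, area_m2, mu0=1.2, k=1.0e-6):
    """Assumed linear model: mu falls as contact pressure (Pa) rises."""
    pressure_pa = load_n / area_m2
    return mu0 - k * pressure_pa

def max_grip_force(load_n, area_m2):
    """Peak friction force = mu(pressure) * normal load."""
    return friction_coefficient(load_n, area_m2) * load_n

load = 4000.0  # N carried by one tyre (hypothetical)
narrow = max_grip_force(load, area_m2=0.015)  # smaller contact patch
wide = max_grip_force(load, area_m2=0.020)    # larger contact patch

print(f"narrow: {narrow:.0f} N, wide: {wide:.0f} N")
```

Under these assumptions the wider tyre comes out ahead only because the lower contact pressure raises the coefficient; with a constant coefficient the two forces would be identical, which is the textbook result the comment is trying to get around.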
        
         | InCityDreams wrote:
         | https://www.bicyclerollingresistance.com/
         | 
         | gives good tyre advice (obviously not car tyres, but info is
         | there)
        
       | joshuaissac wrote:
       | For the ad blocker results, the author judges the search engines
       | by how they rank the best result (uBlock Origin), but I think
       | that search results that point to Adblock Plus or AdBlock are
       | good enough. Sure, they do not block all ads, and take money from
       | advertisers to allow through certain types of ads, but they still
       | block ads in general, and 'acceptable ads' can be disabled in the
       | settings. So I would consider these 'good results', rather than
       | 'bad results'/'very bad results' as the author does.
        
       | shp0ngle wrote:
       | I don't understand the praise of Marginalia.
       | 
       | When I search for "Steve Jobs" on Marginalia, I got blogs about
       | his speech in 2011 and some mailing list from 2007.
       | 
       | When I search for my own name I get nothing. In Google it's just
       | me.
       | 
       | It's cool that one person built all this of course but... that's
       | not a good search result compared to Google?
       | 
       | Maybe I'm missing something, maybe I'm using it wrong.
        
         | marginalia_nu wrote:
         | What do you expect when you search for Steve Jobs? Also, which
         | filter did you use?
        
           | shp0ngle wrote:
           | I don't know that I used any filters? I don't know what
           | filters are, sorry.
           | 
           | I expect wikipedia article on Jobs as a baseline.
        
             | marginalia_nu wrote:
             | Ah, I downrank Wikipedia pretty hard :P
        
           | shp0ngle wrote:
           | By the way please don't take it as if I am taking you down or
           | something
           | 
           | It's amazing what you did, it's just not a Google killer? or
           | at least I don't see it
        
             | marginalia_nu wrote:
             | It's really not supposed to be either. Like it's designed
             | to be the search engine you use when you can't find
             | something elsewhere, so it's largely designed to show you
             | different results than the ones you get on Google and Bing.
             | 
             | In general a lot of the complaints seem to be "I'm not
             | getting what I expect from Google". Well... yeah. That's
             | the point. If someone wants the same results as Google,
             | they should arguably use Google.
        
       | haizhung wrote:
       | What always confuses me about the "search has gotten so bad"
       | mentality is that it is often based on anecdotal evidence at
       | best, and anecdotal recollection at worst.
       | 
       | Like, sure, I have the _impression_ that search got worse over
       | the last years, but .. has it really? How could you tell?
       | 
       | And, honestly, this should be a verifiable claim; you can just
       | try the top N search terms from Google trends or whatever and see
       | how they perform. It should be easy to make a benchmark, and yet
       | no one (who complains about this issue) ever bothers to make one.
       | 
       | Dan at least started to provide actual evidence and criteria by
       | which he would score results, but even he only looked at 5
       | examples. Which really is a small sample size to make any general
       | claims.
       | 
       | So I am left to wonder why there are so many posts about the
       | sentiment that search got worse without anyone ever verifying
       | that claim.
        
         | anonymoushn wrote:
         | Probably for the same reason that there are so many more posts
         | about anything that make claims than that explore evidence
         | systematically, especially when the people making the posts
         | stand to gain nothing by spending their time that way.
         | 
         | I encounter claims that "protobuf is faster than json" pretty
         | regularly but it seems like nobody has actually benchmarked
         | this. Typical protobuf decoder benchmarks say that protobuf
         | decodes ~5x slower than json, and I don't think it's ~5x
         | smaller for the same document, but I'm also not dedicating my
         | weekend to convincing other people about this.
        
           | ForkMeOnTinder wrote:
           | The problem with benchmarking that claim is there's no one
           | true "json decoder" that everyone uses. You choose one based
           | on your language -- JSON.parse if you're using JS,
           | serde_json if you're using Rust, etc.
           | 
           | So what people are actually saying is, a typical protobuf
           | implementation decodes faster than a typical JSON
           | implementation for a typical serialized object -- and that's
           | true in my experience.
           | 
           | Tying this back into the thread topic of search engine
           | results, I googled "protobuf json benchmark" and the first
           | result is this Golang benchmark which seems relevant.
           | https://shijuvar.medium.com/benchmarking-protocol-buffers-
           | js... Results for specific languages like "rust protobuf json
           | benchmark" also look nice and relevant, but I'm not gonna
           | click on all these links to verify.
           | 
           | In my experience programming searches tend to get much better
           | results than other types of searches, so I think the
           | article's claim still holds.
        
             | anonymoushn wrote:
             | I agree. You wouldn't use encoding/json or serde_json if
             | you had to deserialize a lot of json and you cared about
             | latency, throughput, or power costs. A typical protobuf
             | decoder would be better.
        
         | marginalia_nu wrote:
         | I think the point he's trying to make is that the search
         | results pages from the mainstream search engines are a
         | minefield of scams that a regular person would have difficulty
         | navigating safely.
         | 
         | If he was looking at relevance, yours would be a solid point,
         | but since most of the emphasis is on harm, a smaller sample
         | works. Like "we found used needles in 3 out of 5 playgrounds"
         | doesn't typically garner requests for p-values and error bars.
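
For readers who do want the error bars, the "3 out of 5" figure can be run through a standard Wilson score interval. This is a sketch under a loud assumption: it treats the five cases as an independent random sample, which neither the playground analogy nor the article's queries really are.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return centre - half, centre + half

low, high = wilson_interval(3, 5)
print(f"{low:.2f} to {high:.2f}")
```

The interval is wide (roughly 23% to 88%), but its lower end is still far from zero, which fits the point being made: a small sample says little about the exact rate, yet is enough to show the problem is not rare.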
        
           | sanderjd wrote:
           | I think this is a good illustration of my frustration with
           | this discussion: I don't think search has gotten bad, I think
           | the web has gotten bad. It's weird to even conceptualize it
           | as a big graph of useful hypertext documents. That's just
           | wikipedia. The broader web is this much noisier and dubious
           | thing now.
           | 
           | That's bad for google though! Their model is very much
           | predicated on the web having a lot of signal that they can
           | find within the noise. But if it just ... doesn't actually
           | have much signal, then what?
        
             | whakim wrote:
             | But there's still plenty of signal. It isn't as if there
             | are no working YouTube downloaders, or factually correct
             | explanations of how transistors work. It's just that search
             | engines don't know how to disambiguate (or don't care enough
             | about disambiguating) these good results from the mountains
             | of spam or malware.
        
               | devinmcafee wrote:
               | I think that both of you are correct. The internet has
               | much more "noise" than in the past (partially due to
               | websites gaming SEO to show up higher in Google's search
               | results). As a result, Google's algorithm returns more
               | "noise" per query now than it used to. It is a less
               | effective filter through the noise.
               | 
               | Imagine Google were like a water filter you install on
               | your kitchen faucet to filter out unwanted chemicals from
               | your drinking water. If as the years progress your
               | municipal tap water starts to contain a higher baseline
               | of unwanted chemicals, and as a result the filter begins
               | to let through more chemicals than it did before, you'd
               | consider your filter pretty cruddy for its use case. At
               | the bare minimum you'd call it outdated. That is what is
               | happening to Google search
        
             | marginalia_nu wrote:
             | On the one hand, I'm not sure the data corroborates that.
             | If this is a web problem and not a search engine problem,
             | then I'd expect every search engine to have the same
             | pattern of scam results.
             | 
             | I'd also argue that finding relevant results among a sea of
             | irrelevant results is the primary function of a search
             | engine. This was as true in 1998 as it is today. In fact,
             | it was Google's "killer feature": unlike AltaVista and the
             | like, it showed you far more relevant results.
        
               | gmd63 wrote:
               | If the web is being polluted by a nefarious search engine
               | provider that is excluding the polluted pages from their
               | algorithm, you wouldn't see the same pattern across
               | search engines
               | 
               | Not saying or even suggesting that's happening, but the
               | logic isn't airtight
        
               | marginalia_nu wrote:
               | Well, there's always the Münchhausen trilemma, by which
               | no reasoning is airtight.
        
               | pixl97 wrote:
               | Relevant is a difficult concept to agree on. In 1998 it
               | was more about X != Y, that is being shown legit pages
               | that just were not the correct topic.
               | 
               | These days the results are apt to be the correct topic,
               | but instead optimized for some other metric than what the
               | user wants. For example downloading malware or showing as
               | many crypto ads as possible.
               | 
               | I don't expect every search engine to have the same scam
               | results. Scammers target individual search engines with
               | particular methodologies. Google does a lot of work to
               | prevent crap on their engines, the issue is the scammers
               | in total do far more.
        
             | dpkirchner wrote:
             | The web has gotten bad because of what big search engines
             | have encouraged. If they stopped incentivizing publishing
             | complete garbage (by ruthlessly delisting low quality sites
             | regardless of their ad quantity, etc) then maybe we'd see a
             | resurgence of good content.
        
               | 48864w6ui wrote:
               | The web is bad because it is both popular and commercial.
               | Every now and then I fantasize that just finding a
               | sufficiently user-hostile corner would suffice to
               | recreate the early internet experience of an online world
               | nearly exclusively populated by anticommercial geeks.
        
               | eep_social wrote:
               | I understand this is the tactic the Gemini folks are
               | using.
        
               | sanderjd wrote:
               | I don't think so. I think it's the inevitable outcome of
               | giving all of humanity the ability to broadcast without
               | curation.
               | 
               | Or maybe we're saying essentially the same thing, but you
               | think search engines should be doing that curation. But
               | that was never my conception of what search engines are
               | for.
        
               | dpkirchner wrote:
               | I think we are indeed saying the same thing. However, I
               | would like search engines to do some curation --
               | specifically, to remove results that deliver malware, are
               | clones of other sites, and are just entirely content free
               | (eg Microsoft's forums).
               | 
               | I'll give Google credit: I haven't seen gitmemory or SO
               | clones in a while. It took a few years but they seem to
               | have dealt with them.
        
           | hyperpape wrote:
           | I agree we can say "this is a minefield of scams" without
           | doing a comparison.
           | 
           | There still is a question about when it got bad--I think Dan
           | mentions 2016 as a point of comparison, and there were plenty
           | of scams back then, so you might wonder when exactly a query
           | wouldn't have returned many scams.
           | 
           | If you go back far enough, then there wasn't the same kind of
           | SEO, and Internet scams were much smaller/less organized, but
           | that's a long time ago.
        
             | pixl97 wrote:
             | I think the automation tools for scams are what the major
             | change is. In the distant past it was humans doing this,
             | now I'm guessing there are a few larger businesses and
             | likely nation states that have a point and click interface
             | that removes 99% of the past work.
        
         | avsteele wrote:
         | I don't think this is a fair criticism.
         | 
         | 1) The step where you evaluate "how they perform" is
         | necessarily subjective.
         | 
         | 2) you could design a study and recruit participants but that
         | isn't something a blogger is going to do.
         | 
         | 3) He does link to polls where people agree with the idea the
         | results have gotten worse. Yeah, there are sampling problems
         | with a poll, but it's better than nothing.
         | 
         | In this case especially, the writer is answering the question:
         | "Whose results are best according to my tastes?"
        
         | mgdlbp wrote:
         | Internet Archive remembers.
         | https://web.archive.org/web/*/google.com/search/%2A
         | 
         | Find a query of interest, see for yourself (and take a snapshot
         | of the present state for posterity).
         | 
         | The api enables more powerful queries,
         | https://web.archive.org/cdx/search/cdx?url=google.co.jp*&pag...
         | 
         | Also try other search engines and languages.
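
A query against the Wayback Machine CDX endpoint mentioned above can be assembled along these lines. The sketch only builds the URL and makes no network request; `build_cdx_url` is a hypothetical helper, the `google.com/search*` pattern and the year range are illustrative, and the parameters used (`url`, `output`, `limit`, `from`, `to`) are the ones I understand the CDX server to accept.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def build_cdx_url(url_pattern, year_from=None, year_to=None, limit=50):
    """Build a CDX query URL listing captures that match url_pattern."""
    params = {"url": url_pattern, "output": "json", "limit": limit}
    if year_from is not None:
        params["from"] = str(year_from)  # timestamp prefix, e.g. a year
    if year_to is not None:
        params["to"] = str(year_to)
    return CDX_ENDPOINT + "?" + urlencode(params)

# A trailing * asks for every capture whose URL starts with the prefix.
cdx_url = build_cdx_url("google.com/search*", year_from=2008, year_to=2012)
print(cdx_url)
```

Fetching that URL should return a JSON array of capture records, each of which can then be opened through the normal web.archive.org/web/ viewer.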
        
         | jll29 wrote:
         | > Dan at least started to provide actual evidence and criteria
         | by which he would score results, but even he only looked at 5
         | examples. Which really is a small sample size to make any
         | general claims.
         | 
         | US NIST, in their annual TREC evaluation of search systems in
         | the scientific/academic world, use sets of 25 or 50 queries
         | (confusingly called "topics" in the jargon).
         | 
         | For each, a mandated data collection is searched by retired
         | intelligence analysts to find (almost) all relevant results,
         | which are represented by document ID in general search and by a
         | regular expression that matches the relevant answer for
         | question answering (when that was evaluated, 1998-2006).
         | 
         | Such an approach is expensive but has the advantage of being
         | reusable.
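
The reusability point can be illustrated with a small scoring sketch: once a set of relevant document IDs exists per query, any engine's ranked output can be scored against it. The metrics below (precision at k and mean average precision) are common in this style of evaluation, but the query IDs, document IDs, and judgments are invented for illustration and are not TREC data.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that were judged relevant."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k

def average_precision(ranked_ids, relevant_ids):
    """Mean of precision values at each rank where a relevant doc appears."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

# Hypothetical relevance judgments and one engine's ranked results.
qrels = {"q1": {"d1", "d4"}, "q2": {"d7"}}
run = {"q1": ["d1", "d2", "d4", "d9"], "q2": ["d3", "d7", "d8", "d5"]}

mean_ap = sum(average_precision(run[q], qrels[q]) for q in qrels) / len(qrels)
print(f"P@2 for q1: {precision_at_k(run['q1'], qrels['q1'], 2):.2f}")
print(f"MAP: {mean_ap:.3f}")
```

Scoring a second engine only requires another `run` dictionary; the expensive human judgments in `qrels` stay the same.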
        
         | williamcotton wrote:
         | Dan approached the problem from a qualitative perspective.
         | Perhaps if more people took this approach over quantitative
         | maximalism we would actually have products that don't drive us
         | fucking insane.
         | 
         | All that matters _is_ the overwhelming sentiment that search
         | has gotten worse, not the same fucking spreadsheet that got us
         | here in the first place!
        
         | bee_rider wrote:
         | > So I am left to wonder why there are so many posts about the
         | sentiment that search got worse without anyone ever verifying
         | that claim.
         | 
         | I suspect it has gotten worse, so posts complaining about it
         | resonate. But, it is not really a huge problem, and anyway it
         | isn't as if there's much I can do about it, so I'm not going to
         | bother collecting statistically valid data.
         | 
         | I think this is generally true about a lot of things. We should
         | be OK with admitting that we aren't all that data-driven and
         | lots of our beliefs are based on anecdotes bouncing around in
         | conversations. Lots of things are not really very important.
         | And IMO we should better signal that our preferences and
         | opinions aren't facts; far too many people mix up the two from
         | what I've seen.
        
           | pixl97 wrote:
           | When it comes to human psychology what we believe tends to be
           | more important than what is when it comes to future
           | predictions of our actions. If people think search sucks then
           | it's likely they'll use less of it in the future and it opens
           | up companies like Google for disruption.
        
         | narag wrote:
         | _What always confuses me about the "search has gotten so bad"
         | mentality is that it is often based on anecdotal evidence at
         | best, and anecdotal recollection at worst._
         | 
         | I can't speak for anybody else, just trying to find stuff
         | online, not writing a treatise about it or writing my own
         | engine to outcompete Google. It's been asked _many_ times here
         | over the years and the answer was always _explanations_ , never
         | solutions.
         | 
         | Shittification does not happen overnight, but over many years.
         | It started with Google deciding that some search terms weren't
         | so popular: "did you mean...?" (forcing a second click to do
         | what you intended to do in the first place) and went downhill
         | when qualifiers to override that crap got ignored.
         | 
         | For me enough was enough when I realized that a simple query
         | with three words, chosen carefully to point to the desired
         | page, gave thousands of results, none of them relevant. YMMV.
        
         | fumeux_fume wrote:
         | So you're confused why other people aren't doing research for
         | you and when they do provide some evidence, you dismiss it
         | because it's not a large-scale scientific inquiry into search
         | quality? Get frickin a grip.
        
         | hn_throwaway_99 wrote:
         | Even without looking at the subjective quality of search
         | results, the sheer user hostility of the design of the Google
         | search results page is an obvious, objective instance of how
         | search has enshittified.
         | 
         | That is, in the early days, Google used to highlight that
         | "search position couldn't be gamed/bought" as one of their
         | primary differentiators, ads were clearly displayed with a
         | distinct yellow background, and there weren't that many ads.
         | Nowadays, when I do any remotely commercial search the entire
         | first page and a half at least on mobile is ads, and the only
         | thing that differentiates ads from organic results is a tiny
         | piece of "Sponsored" text.
        
         | laserbeam wrote:
         | Some things are easily quantifiable, but very few. Such as the
         | number of ads per search. Back in the day google had at most 1
         | and it was visibly distinct from the rest of the links.
         | 
         | Otherwise, yeah, maybe search didn't degrade but the internet
         | got more spammy. Or maybe users just got wiser and can see
         | through the smoke screen better. Who knows...
         | 
         | Doesn't change the fact that today one has to know how to
         | filter through pages of generic results made by low effort
         | content farms. Results that are of dubious validity, which at
         | best simply waste your time. Or through clones of other
         | websites (i.e. Stackoverflow clones).
         | 
         | Search engines can choose to help with that (kagi certainly
         | puts in the effort and I love it for that), or they can ignore
         | the problem and milk you for ad clicks.
         | 
         | Anecdotal evidence is good enough for me.
        
         | ta988 wrote:
         | Yes to get an accurate comparison we would need to have results
         | from queries 10 years ago.
         | 
         | I still remember very often having to go to page 3 and beyond
         | of google searches to find things, even really early on.
         | 
         | I think it has never been good, got a bit better before SEO
         | farms took all the gain out. That's my feeling with nothing to
         | back it.
        
         | arp242 wrote:
         | To do this you would need to have a comprehensive definition of
         | "quality", and that's anything but easy, and it will be at
         | least partly subjective. It's also hard to include omissions in
         | your definition of "quality" (and again, what should or should
         | not be omitted is subjective as well).
         | 
         | For example, let's say I search for "Gaza"; on one extreme end
         | some engines might only focus on recent events, whereas others
         | may ignore recent events and include only general information.
         | Is one higher "quality" than the other? Not really - it depends
         | what you're looking for innit?
         | 
         | All you can really do is make a subjective list of things you
         | find important and rate things according to that, and this is
         | basically just the same thing as an anecdotal account but with
         | extra steps.
        
         | nvm0n2 wrote:
         | _> has it really? How could you tell?_
         | 
         | Yes it has and for a certain class of queries it's not even
         | open for debate, because Google themselves have stated they
         | deliberately made it worse. And they really did, it's very
         | noticeable.
         | 
         | This class of queries is for anything related to any
         | perspective deemed "non authoritative". Try to find information
         | that contradicts the US Government on medical questions, for
         | example, and even when you know what page you're looking for
         | you won't be able to find it except via the most specific
         | forcing e.g. exact quoted substrings.
         | 
         | Likewise, try finding stories that are mostly covered by
         | Breitbart on Google and you won't be able to. They suppress
         | conservative news sites to stop them ranking.
         | 
         | 15 years ago Google wasn't doing that. It would usually return
         | what you were looking for regardless of topic. There are now
         | many topics - which specifically is a secret - on which the
         | result quality is deliberately trashed because they'd prefer to
         | show you the wrong results in an attempt to change your mind
         | about something, than the results you actually asked for.
        
       | torginus wrote:
       | I wonder if this aggregate enshittification of computers (be it
       | search, social media, video games) etc. is actually a good thing
       | for humans in general.
       | 
       | I feel like today's digital spaces don't have as strong a grip on
       | the minds of people - I think folks started rediscovering the
       | value of genuine human interaction and hobbies that do not
       | involve a computer screen.
       | 
       | For example, I haven't seen the equivalent of 2000s-2010s
       | Facebook addicts (or WoW addicts in the gaming space) to such an
       | extent, with parasocial media, such as TikTok or Youtube or
       | Twitch, having replaced social media, and social video gaming
       | such as MMOs having lost a lot of popularity.
        
       | nunez wrote:
       | > When I tried running the query from the paper, "cellular phone"
       | (no quotes) and, the top result was a Google Store link to buy
       | Google's own Pixel 7, with the rest of the top results being
       | various Android phones sold on Amazon.
       | 
       | Interestingly, if you add "before:2001-01-01" to the query, the
       | paper that Brin and Page referenced shows up as the third result.
       | 
       | That this query now ranks phones you can buy higher than
       | information about phones makes sense, since the web is much
       | bigger these days and cell phones are much more widely accessible
       | than they were back then.
       | 
       | > Although Google doesn't publicly provide the ability to see
       | what was historically returned for queries, many people remember
       | when straightforward queries generally returned good results.
       | 
       | See above. Sort of.
       | 
       | ---
       | 
       | I wish Dan spent more time talking about Kagi. I, too, have found
       | it terrible for searching for things to buy and some images but
       | excellent otherwise.
        
       | sundalia wrote:
       | The intro query "youtube downloader" already showed me relevant
       | results (some website where you paste a URL and bam download). I
       | think there's a big tech bias in the whole post (how relevant is
       | a mastodon poll, for real).
       | 
       | Not saying the current landscape doesn't suck with ads everywhere
       | and incentives to not give exactly relevant results at times, but
       | I think google is pretty good still.
        
         | anonymoushn wrote:
         | Which web site did you use to successfully download a youtube
         | video, and which youtube video did you download?
        
       | swayvil wrote:
       | Without labor to run their circus, 99% of business would
       | disappear overnight.
       | 
       | Without business, spam would disappear.
       | 
       | So if you remove the labor you remove the spam.
       | 
       | So the best spam filter is UBI.
        
       | urbandw311er wrote:
       | I had to stop reading this because I found it too depressing and
       | it triggered a lot of anger about how big tech combined with the
       | incentives of capitalism is basically fucking up the world.
        
       | btbuildem wrote:
       | On a side note: would it kill the author of the site to use a
       | stylesheet?
        
         | hasmolo wrote:
         | it's the same as my choice to only use lowercase letters, it is
         | designed to make you upset that i am not following conventions.
         | that's as far as i have been able to figure for why i started
         | doing this, and by extension, why tech bois love to drop some
         | vital feature to communication to signal being an 'insider'
        
       | SV_BubbleTime wrote:
       | I'd love to see this a little extended.
       | 
       | Searx and Yandex.
       | 
       | Specifically... if I need something even slightly "gray", Yandex
       | is the only option anymore. Torrent search on google et al is
       | just awful.
        
       | airstrike wrote:
       | > Continuing with the theme of running simple, naive, queries, we
       | used the free version of ChatGPT for this post, which means the
       | queries were run through ChatGPT 3.5.
       | 
       | why
        
       | jbmilgrom wrote:
       | Was gpt4 used (with paying subscription)?
        
       | IceMichael wrote:
       | Okay, so all search engines suck. Yeah, that matches my
       | experience
        
       | justinl33 wrote:
       | > _It's common to criticize ChatGPT for its hallucinations and,
       | while I don't think that's unfair, as we noted in this 2015, pre-
       | LLM post on AI, I find this general class of criticism to be
       | overrated in that humans and traditional computer systems make
       | the exact same mistakes._
       | 
       | Finally someone said it. We are unnecessarily harsh on
       | hallucinations. LLMs don't intentionally 'lie'. To say this is a
       | wrongful anthropomorphism.
        
         | krapp wrote:
         | It's also a wrongful anthropomorphism to claim that human
         | beings "make the exact same mistakes" as LLMs, because they
         | don't. Humans don't confabulate the way LLMs do unless they
         | have a severe mental illness. A human doctor isn't as likely to
         | simply make up diseases, or symptoms, or medications, whereas
         | an LLM will do so routinely, because they don't understand
         | anything like human anatomy, disease, chemistry or medicine,
         | only the stochastic matching of text tokens.
         | 
         | We're not unnecessarily harsh on hallucinations, it's
         | absolutely necessary because of how effective LLMs are at
         | convincing people that because they can generate language, they
         | are capable of sentient thought, self-awareness and reason.
         | Acting as if humans and LLMs are basically equally trustworthy,
         | or worse, that LLMs are more trustworthy, is _dangerous._ If we
         | accept this as axiomatic, shit will break and people will die.
        
         | maronato wrote:
         | To validate hallucinations is anthropomorphism. Tools shouldn't
         | make stuff up.
         | 
         | I don't second guess Python's math results. If the result is
         | wrong, that's my fault for coding it wrong, never Python's for
         | hallucinating.
        
       | yetanother12345 wrote:
       | Ironically I had to use a search engine to discover what "Mwmbl"
       | was. It's apparently a search engine. But, visiting the front
       | page, I see something akin to a git commit log?! I'm not sure I'd
       | have guessed that this was a SE if Brave Search did not tell me
       | it was (even then I'm not convinced yet).
       | 
       | https://mwmbl.org/
       | 
       | Added: Interesting. Apparently it's allowed to edit the SERPs
       | there. Which implies that I'm out, but (well, because) I've got a
       | feeling which kind of Internet Entrepreneurs this factoid will
       | appeal to
        
       ___________________________________________________________________
       (page generated 2023-12-31 23:00 UTC)