[HN Gopher] Searx - Privacy-respecting metasearch engine
       ___________________________________________________________________
        
       Searx - Privacy-respecting metasearch engine
        
       Author : cube00
       Score  : 243 points
       Date   : 2021-11-12 12:17 UTC (10 hours ago)
        
 (HTM) web link (sagrista.info)
 (TXT) w3m dump (sagrista.info)
        
       | YXNjaGVyZWdlbgo wrote:
        | I had a searx instance running for a long time, and it's great
        | when it works, but the plugins for site-specific searches break
        | all the time, and if you have more than 3-4 users with a high
        | search frequency, Google blacklists your IP by throttling.
        
         | kaba0 wrote:
          | May I ask what your idea is of preserving privacy when self-
          | hosting a website that searches for specific terms from a
          | presumably fixed IP, where with 1/3-1/4 probability a query
          | can be attributed to you?
        
           | YXNjaGVyZWdlbgo wrote:
           | The instance ran on a mobile connection not associated with
           | any private information.
           | 
            | EDIT: It was located in a German street light 20 km away from
            | any of the users. Just to get the geolocation question out of
            | the way.
           | 
            | It was more of an experiment than anything else. There will
            | be a talk about it and other Freifunk (open mesh network in
            | Germany) related stuff at the next virtual CCC congress.
        
             | dalf wrote:
             | Did you renew the mobile IP from time to time?
             | 
              | Note: since version 1.0, searx stops sending requests to an
              | engine for 1 day when a CAPTCHA is detected, which might
              | help a little.
              | 
              | (I'm really interested in the results of your experiment.)
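              | 
              | A minimal sketch of that suspend logic (illustrative only,
              | not searx's actual code; the names here are made up):
              | 
              |     import time
              | 
              |     suspended_until = {}  # engine name -> unix timestamp
              | 
              |     def on_captcha(engine, pause=86400):
              |         # stop querying this engine for one day
              |         suspended_until[engine] = time.time() + pause
              | 
              |     def is_available(engine):
              |         return time.time() >= suspended_until.get(engine, 0)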
        
               | YXNjaGVyZWdlbgo wrote:
                | We renewed every 12 hours. In the end we came to the
                | conclusion that Google might discriminate against traffic
                | from Eastern European states. The SIM cards were mostly
                | from Poland, Belarus, and Ukraine. When we tested them
                | against German, French, and Italian cards, CAPTCHAs were
                | way more frequent on the eastern cards and showed up way
                | earlier. I will ping you as soon as the talk is online.
        
       | ahtaarra wrote:
        | I recently made the switch from DDG to Searx simply because
        | right-clicking on a search result to copy the URL resulted in a
        | referrer link being copied rather than the link of the result
        | destination.
        
         | southerntofu wrote:
          | I think enabling the DoNotTrack header or disabling JavaScript
          | prevents this behavior: I cannot reproduce it on Tor Browser.
          | But you are correct, this is a worrying development.
        
         | caskstrength wrote:
         | I use https://addons.mozilla.org/en-US/firefox/addon/clearurls/
         | to automatically convert referrals to actual links on web
         | pages.
        
           | ravenstine wrote:
           | I've been using ClearURLs for a while so I never even
           | realized that DDG used referrer links. Have they always done
           | this?
        
         | boomboomsubban wrote:
         | Isn't the referral link only on the ad results, which is
         | clearly marked "AD"? That's what my quick test shows.
        
       | ojosilva wrote:
        | Are there any open-source real internet search engines worth
        | looking at? I think we should be working on disrupting search as
        | a whole instead of depending on the Googles, Baidus, and Bings
        | of the world.
        | 
        | I'm fully aware of the massive crawling and storage
        | requirements, but open-source projects that get search right can
        | later 1) be hosted by the powerhouses of the cloud or non-profit
        | parties, or 2) become a fully distributed hosting and crawling
        | effort, as in p2p and blockchain.
        
         | [deleted]
        
         | vindarel wrote:
          | p2p: there's the YaCy effort. https://yacy.net/ I... couldn't
          | find a portal to try it out (I did years ago; the results
          | need... to be discussed. It's anyway easy to install and to
          | choose which part of the web to crawl.)
         | 
         | > YaCy is free software for your own search engine.
         | 
         | maybe they rebranded and don't aspire to be a complete web
         | search engine?
        
       | goldsteinq wrote:
        | The problem with self-hosted search engines is that they make
        | you _very_ unique: you're the only client of the "backend"
        | engine with that (static and non-NATed) IP. Furthermore, you're
        | now one of the small group of people with "hosting" IPs. Using
        | self-hosted SearX may make you easier to track, not harder.
       | 
       | Using SearX hosted by someone else is marginally better, but now
       | you have to trust the owner of the server, which is probably not
       | what you want for privacy-centered search engine.
        
         | woodruffw wrote:
         | Could you clarify where the privacy concern is here? As I
         | understand it, I'm sharing my IP with search engines anyways;
         | the only difference with a self-hosted SearX instance is that
         | I'm sharing my server's IP instead.
         | 
         | Is the concern that the latter's IP isn't behind a NAT, and
         | therefore is more unique? If so, I think that's the least
         | concerning of the identifying datapoints that a search engine
         | has access to -- my browser metadata is far more identifying.
         | With SearX, that information doesn't get forwarded (IIUC).
        
         | marc_abonce wrote:
         | If you don't want to expose your IP address, you can configure
         | searx to proxy all the queries through Tor. This obviously
         | makes the instance way slower and you'll have to disable some
         | engines that block Tor exit nodes, so it's a trade-off.
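          | 
          | Something along these lines in settings.yml, assuming a local
          | Tor daemon listening on 127.0.0.1:9050 (the exact keys may
          | differ between searx versions, so treat this as a sketch):
          | 
          |     outgoing:
          |         proxies:
          |             # socks5h also resolves DNS through Tor
          |             http: socks5h://127.0.0.1:9050
          |             https: socks5h://127.0.0.1:9050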
        
         | randomsilence wrote:
         | When you click a link on your SearX instance and you don't use
         | referrers, how can anybody track you? Nobody knows that you are
         | coming from your "backend" engine.
         | 
         | You just reveal your search queries to the hosting provider if
         | he maliciously intercepts them.
        
       | drcongo wrote:
       | As someone who uses Safari with its built in list of search
       | providers, I'm rather stuck with DDG for address bar searches,
       | but boy it has really started to suck over the last year or two.
        
         | boomboomsubban wrote:
         | You could always do !searx
        
           | drcongo wrote:
            | True! I'm experimenting with Ecosia right now, but also
            | recently got on the beta for the kagi.com search engine,
            | which so far has proven to be vastly superior to any other
            | that I've tried.
        
         | bombadilo wrote:
         | So don't use Safari then? Seems like a pretty simple solution.
        
           | sixothree wrote:
           | Edge is available for Mac.
        
           | drcongo wrote:
            | Chrome destroys the battery on my laptop and is basically
            | spyware these days; the Chrome-alikes are all dreadful -
            | Vivaldi has the jankiest UI ever, Brave is unbearably
            | sluggish, and Edge runs processes that I didn't ask for,
            | like Microsoft Updater, which bugs me constantly and spams
            | the new tab screen with all sorts of low-rent junk that I
            | can't remove.
            | 
            | Firefox is my development browser and I do really like it,
            | but Safari is my actual browsing browser because it's by far
            | the best browsing browser on Macs.
        
             | xanaxagoras wrote:
              | I've landed on Librewolf for personal use and ungoogled-
              | chromium for work. It's great so far; I've been on this
              | setup for a few months.
        
             | boogies wrote:
             | If you stop using macOS you can get much better frontends
             | for WebKit, from the simple, rather Safari-like GNOME Web
             | (AKA Epiphany) to the powerful Pentadactyl-like luakit.
        
         | ColinHayhurst wrote:
         | Yes, Safari is the worst offender when it comes to offering
         | search choice on desktop.
         | 
          | On iOS, using a new app called Hyperweb, you can use the new
          | Safari extensions to access and create a longer preferences
          | list. https://hyperweb.app/
         | 
         | We really shouldn't have to choose this or that, but should be
         | able to easily use multiple choices in search. You can do that
         | today as explained here, but you'll need to switch browsers.
         | https://blog.mojeek.com/2021/09/multiple-choice-in-search.ht...
        
         | jeroenhd wrote:
         | > with its built in list of search providers
         | 
         | TIL. That's just... terrible UX.
        
       | jqpabc123 wrote:
       | _You do not need to trust third parties to keep you private and
       | not track your every move, which is awesome._
       | 
        | The only way to avoid third parties is to run your own server
        | ... but this "metasearch engine" is basically just an
        | aggregation proxy. So every search can still be traced back to
        | your proxy server by Google, Bing, or whoever is providing the
        | actual results.
        
       | ricardo81 wrote:
        | 'Uses Amazon Web Services (AWS) as a cloud provider and
        | Cloudflare CDN.'
        | 
        | IIRC DDG now uses Microsoft servers exclusively. Makes sense
        | given the volume of queries they're handling, all dependent on
        | the Bing API.
        
       | AmosLightnin wrote:
       | Privacy is important, but I also care about the terrible quality
       | of search results I get from nearly all the major providers these
        | days. Couldn't an aggregator like SearX host a machine-learning
        | layer that learns which results are more likely to be valuable
        | to me, and ranks them higher in the results? Keeping the
        | customization layer on my own server while improving search
        | results would seem to be a big advantage, both privacy- and
        | performance-wise.
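        | 
        | A toy sketch of what such a personal re-ranking layer could look
        | like (purely hypothetical, not an existing SearX feature; the
        | preference list stands in for whatever a learned model would
        | produce):
        | 
        |     def rerank(results, preferred_domains):
        |         # results: list of dicts with a "url" key; boost any
        |         # result whose URL matches the preference list
        |         def score(r):
        |             return any(d in r["url"] for d in preferred_domains)
        |         return sorted(results, key=score, reverse=True)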
        
       | analyte123 wrote:
       | If you're trying to be comprehensive, a few other suggestions in
       | rough order of their usefulness:
       | 
       | Gigablast.com - Has been improved recently. private.sh is
       | supposed to be a private proxy for Gigablast, but it has been
       | broken recently
       | 
       | Exalead.com - run by a French defense contractor for some reason
       | 
       | filepursuit.com - search for files only. Need to play around with
       | it more.
       | 
       | PeteyVid.com - multi-platform video search
       | 
       | Wiby.me - focus on "classic" style web sites
        
       | jefc1111 wrote:
       | I have been trialling Swisscows (https://swisscows.com/) and have
       | found it quite useful. I have not deeply researched their privacy
       | claims, but for now I am just trying to not use Google or
       | mainstream alternatives.
       | 
       | Does anyone else have experience or comments on Swisscows' search
       | engine? Seems like an interesting company all round.
        
       | hans_castorp wrote:
       | Qwant is another (non-US) alternative
        
         | rahen wrote:
         | Doesn't it use Bing results, like DuckDuckGo does? I read it's
         | hosted on MS Azure like Bing and DDG, so in the end it's
         | somewhat just a rebranded Bing. Quite a shame for a European
         | (Franco-German) search engine.
        
           | kaba0 wrote:
            | Even if they were just a proxy for Bing, that would still be
            | good, wouldn't it?
        
             | rahen wrote:
             | Well, claiming to offer an alternative to the GAFAM while
             | depending on their products / data / infrastructure is a
             | bit misleading.
             | 
             | So indeed they're doing okay privacy wise, but a lot of
             | users feel cheated when they realize their "independent
             | search engine" (DuckDuckGo) is just a Bing portal hosted on
             | Azure.
        
       | citizenpaul wrote:
        | Slightly off topic, but does anyone have a good solution for
        | removing content-farm search results in this or any engine? For
        | example, some of the worst offenders: WikiHow, Forbes, Business
        | Insider.
        
       | ChrisArchitect wrote:
       | Please keep title same as the actual post: Searx - moving away
       | from DuckDuckGo
       | 
       | It gives more context to the topic, as in it's not just a link to
       | the search engine itself.
        
       | hermitsings wrote:
        | I had been considering using Searx (which I had known about
        | before) lately, since I have to use DDG+Google to get
        | satisfactory results.
        | 
        | Edit: I really like DDG bangs and vim-like nav keys tho
        
         | visiblink wrote:
         | Someone here once posted a link to duckduckstart.com. I have it
         | set up in my search bar now.
         | 
         | If you search, it goes through Startpage (Google results, more
         | privately). If you search with a bang, it goes through
         | Duckduckgo. It's probably close to what you're looking for.
        
       | mg wrote:
       | Every time I stumble across a new search engine, I add it to my
       | search engine comparison tool:
       | 
       | https://www.gnod.com/search/
       | 
       | Will add SearX now. It seems to provide reasonably good results.
       | 
       | Update: It's on (under 'More Engines').
        
         | [deleted]
        
         | Minor49er wrote:
         | You should add Fireball. It's excellent
         | 
         | https://fireball.de/
        
         | nix23 wrote:
         | A yacy instance would be good too ;)
         | 
         | https://yacy.net/
        
         | danskeren wrote:
         | Ask.Moe
         | 
         | nona.de
        
         | 51stpage wrote:
         | There is also Marginalia https://search.marginalia.nu/ which I
         | don't see on your list.
        
           | marginalia_nu wrote:
           | It's currently undergoing its monthly maintenance just FYI.
           | It's up and technically working, but with a drastically
           | reduced index size.
        
           | sysadm1n wrote:
            | Tried Marginalia. So many plaintext HTTP links, which I
            | avoid like the plague. That's my only gripe with it. Other
            | than that, it's an awesome tool.
        
             | forgotmypw17 wrote:
             | That's funny, I personally prefer HTTP for its simplicity,
             | human-readability, accessibility, lack of centralized
             | control, backwards compatibility and lack of forced
             | upgrades or locking out old clients, etc., not to mention
             | speed.
             | 
             | Of course, I'm fortunate enough to live in a place where
             | MITM attacks are virtually non-existent, aside from WiFi
             | portals and maybe ISP banners (which I've never
             | experienced.)
        
             | freediver wrote:
             | > So many plaintext http links which I avoid like the
             | plague.
             | 
             | Why? What you described appears to be the safest place on
             | the web.
        
               | sysadm1n wrote:
                | I only browse HTTPS sites. I have the `HTTPS Everywhere`
                | addon installed with `EASE` (Encrypt All Sites Eligible)
                | turned on so I don't accidentally browse an unencrypted
                | website. Something like 85-90% of the web is encrypted
                | now, and there's _no_ excuse to be using outdated
                | plaintext HTTP anymore. It's a privacy and security
                | risk. There are only a few instances where I've had to
                | view an HTTP site (I'm a freelancer and a client's
                | webpage was still unencrypted, so I had to see it; a
                | rare exception to the rule).
        
               | marginalia_nu wrote:
                | It's funny, because I've got like 70% HTTP in my index,
                | so the whole "90% of the web is encrypted" claim seems
                | to depend on which sample you are looking at. Google
                | doesn't index HTTP at all, so that's not a good place to
                | go looking for what's most popular. That's in fact half
                | the reason why I built this search engine in the first
                | place: they demand things of websites that some websites
                | simply can't or won't comply with.
                | 
                | A lot of servers still use HTTP, for various reasons.
                | There are also some clients that can't use HTTPS.
        
               | stjohnswarts wrote:
               | I think there are absolute numbers and then there are
               | "the sites most people visit regularly" and those
               | probably are 75% https. It's relative like most things.
        
               | marginalia_nu wrote:
               | Absolute numbers are pretty hard to define, as is the
               | size of the Internet.
               | 
               | If the same server has two domains associated with it,
               | does it count twice? Now consider a loadbalancer that
               | points to virtual servers on the same machine. How about
               | subdomains?
        
               | stjohnswarts wrote:
               | It may be a privacy risk, but it's certainly not a
               | security risk with plain old blog and static sites that
               | have completely open data available to anyone who wants
               | to surf to their sites.
        
               | freediver wrote:
               | The privacy and security risk comes in large part from
               | the nature of code and actions performed on the site.
               | 
                | In reality, as far as privacy goes, matters are on
                | average the opposite of your claim. Most sites that will
                | put your privacy at risk today are using HTTPS - I am
                | talking about the vast majority of the commercially
                | operated web today. I know my privacy is much better
                | respected on a plain-text (no JavaScript) site using
                | HTTP than on [insert a top-10k most popular site here]
                | using HTTPS.
               | 
               | And for security, if I am not performing for example
               | shopping or entering my billing details anywhere on the
               | site, I do not see how a http site can compromise my
               | security.
               | 
               | I actually prefer deploying http sites for simple test
               | projects where speed is imperative because they are also
               | faster - there is no SSL handshake needed to connect.
        
             | marginalia_nu wrote:
             | You should be getting fewer .txt-results in the new update,
             | a part of the problem was that keyword extraction for plain
             | text was kind of not working as intended, so they'd usually
             | crop up as false positives toward the end of any search
             | page. I'm hoping that will work better once the upgrade is
             | finished.
        
         | mattowen_uk wrote:
         | I have this REALLY old text file of search engine URLs:
         | 
         | http://www.jaruzel.com/textfiles/Old%20Web%20Info/Internet%2...
         | 
          | Google basically killed almost all of them off :(
         | 
         | It would be great to see some proper competition in the search
         | space, especially around specialist search engines.
        
           | sixothree wrote:
            | I would love a search engine targeted towards developers.
            | Searching for symbols seems to be a problem with Google, not
            | to mention all of the utterly crappy results they serve up.
        
             | ijr wrote:
             | Symbolhound does that.
        
               | stagas wrote:
               | Also grep.app for searching into repos really fast.
        
         | jwithington wrote:
         | where's you.com lol
        
           | mg wrote:
           | Thanks, added.
        
         | 1_player wrote:
         | Is anything supposed to happen when I enter something and press
         | Enter? Nothing happens for me, FF on Windows, uBlock.
        
           | reayn wrote:
            | You are supposed to type something into the search field,
            | then click the engine you want to use; it will pass on
            | whatever you entered.
           | 
           | I agree that the creator could make that a little more clear
           | somewhere on the page.
        
         | ColinHayhurst wrote:
         | Nice work. If you have a twitter handle you might request to
         | get added to these lists; either way they might be useful for
         | you: https://twitter.com/SearchEngineMap/lists
        
         | imglorp wrote:
         | This is a nice compilation.
         | 
         | It would be very interesting if it examined and compared
         | results.
        
           | m-i-l wrote:
           | Yes, looks good, although I thought it was going to be a
           | federated search, i.e. you enter your search term and it
           | performs that search on all the sites selected. The simpler
           | way of implementing a federated search would be to show
           | separate results boxes from each site, although that wouldn't
           | scale well to a large number of sites, and it can get quite
           | complicated to try to combine the results.
        
         | wenbin wrote:
         | You should add the podcast search engine
         | https://www.listennotes.com/
        
           | autoexec wrote:
            | Too bad they force you to log in to view a result or do
            | anything but search. They also share/sell your data with
            | third parties, including Google.
        
           | mg wrote:
           | Thanks, added.
           | 
           | Holy Moly, are there really over 100 million podcast episodes
           | out there?
        
             | wenbin wrote:
              | Yes. Some numbers: https://www.listennotes.com/podcast-stats/
             | 
             | Listen Notes was started in early 2017 as a side project,
             | when there were ~23 million episodes.
             | 
              | I remember seeing that the number of web pages indexed by
              | Google in early 1998 was ~25 million, so I thought, "OK,
              | 23 million episodes might justify the existence of a
              | podcast search engine" :)
        
         | fsflover wrote:
         | You should also add YaCy: https://yacy.net.
        
           | mg wrote:
            | It doesn't seem to be web based?
           | 
           | When I click on "Try out the YaCy Demo Peer", I get "502 Bad
           | Gateway".
        
             | fsflover wrote:
             | It's self-hosted and peer-to-peer. You could search for
             | other public-facing instances, e.g.,
             | http://sokrates.homeunix.net:6060. Ideally, you could run
             | your own instance to show the world how it works.
        
       | xwdv wrote:
       | People don't want privacy. They want results.
       | 
       | Society programs us to think privacy is our top concern. Is it?
        
         | marginalia_nu wrote:
         | I do think you are mostly correct. Some people really care
         | about privacy, but for most people it isn't a huge concern.
         | 
         | This doesn't make it any less important, but just means that if
         | your main selling point is "we're the search engine that cares
         | about privacy", then odds are you're not going to get a lot of
         | users.
        
           | lardolan wrote:
           | I agree with most of your points. Although It may be matter
           | of a time to grow that niche. IMO Pople who value it are
           | willing to take sacrifice of less functional outputs.
           | 
           | Privacy is most effective selling point when working with
           | sensitive information.
        
             | marginalia_nu wrote:
             | If you are working with sensitive information, the last
             | thing you probably want to do is broadcast that you are
             | working with sensitive information.
        
               | kreeben wrote:
               | Au contraire, it's the first thing you should broadcast,
               | unless you're trying to scam people out of their PII.
        
         | [deleted]
        
       | GhettoComputers wrote:
        | Isn't hosting your own instance taking away every benefit of
        | searx by revealing your IP? If you had a VPN, you'd use it to
        | mask your IP from search engines' tracking anyway, and if you
        | used Tor for it, you'd probably move back soon since it'll be so
        | much worse with latency - like how many people go back to Google
        | because DDG sucks for results. I suggest just using public
        | instances; you can find them at https://searx.space. Some are
        | more reliable than others, but none have been trouble-free.
        | There are a lot of these instances for privacy: Chrome has a
        | privacy plugin with a white eye that uses nitter.net for
        | Twitter, Teddit for Reddit, and other public instances. One
        | Reddit frontend was even made completely in Rust. ;)
        | 
        | https://chrome.google.com/webstore/detail/privacy-redirect/p...
        | 
        | To reply to the person under me: you're always relying on trust
        | in something unverified and in untrustworthy filters like VPNs
        | anyway. You're either revealing your IP, using a wrapper that
        | reveals it instantly, using a site that isn't a search engine
        | and might be using your data, using a VPN whose reputation is
        | based on assumptions (usually based in another country you won't
        | visit or know much about, aside from random reviewers), or using
        | Tor, losing latency and reasonable-speed image search while
        | still being possibly compromised.
        
         | kleinsch wrote:
         | But if you're using instances someone else is hosting, aren't
         | you hitting half the author's objections?
         | 
         | - They may be hosted in the US
         | 
         | - They may be hosted on AWS
         | 
         | - You have no idea if the maintainer of the instance is
         | tracking you
        
           | jarvuschris wrote:
            | ^ This right here. This article is pretty much hogwash IMO.
            | 
            | Points 1 and 3 aren't relevant if they aren't recording the
            | data. Companies in other jurisdictions have no magic
            | invulnerability you can trust; their data can still get out
            | (legally or illegally) if they're storing it.
           | 
           | Points 2 and 5 are equally true of any open source project
           | unless you run it yourself from source. There are _plenty_ of
           | examples of users getting phished by maliciously built/hosted
           | open source tools
           | 
           | Point 4 is obviously not malicious tracking and a mistake any
           | project could make
           | 
            | At the end of the day, though, unless you're going to run
            | everything yourself (which most people aren't), you have to
            | pick who to trust: some random person running a server
            | somewhere, or a company with hundreds of employees recruited
            | under the premise of working on a privacy-centric search
            | engine, any of whom could turn whistleblower.
        
       | jostillmanns wrote:
        | Whoogle is another alternative that focuses on Google search
        | results.
        
       | ced wrote:
       | From the link:
       | 
        |  _The CEO sold his previous company's data before founding DDG.
        | His previous company (Names DB) was a surveillance capitalist
        | service designed to coerce naive users to submit sensitive
        | information about their friends._
       | 
       | Is that a fair statement? Can someone provide more context?
        
         | boomboomsubban wrote:
         | It was a failed social network to help you reconnect with old
         | friends. It tried to get you to recruit your friends
         | immediately after registering and had a typical social network
         | license. I'd say that description is intentionally describing
         | it in the worst possible light, but not wholly inaccurate.
        
           | deltree7 wrote:
           | Yet, HN is down and ready to suck searx dick because hurr
           | durr privacy. This is HN in a nutshell.
           | 
            | Hint: If all you privacy-paranoid people show the exact same
            | behavior, you are an advertiser's dream. Sure, you won't be
            | sold shampoo like the general population, but most
            | advertising companies know the exact kind of doomer-prepper
            | items to sell if you come via VPN/DDG/whatever convoluted
            | hack/concoction you come up with to make your life
            | inconvenient.
        
       | freediver wrote:
       | I always liked the term "search engine client" better (vs
       | 'metasearch engine'). In essence it is a product that can connect
       | to different search indexes.
       | 
       | An "email client" does exactly the same thing, connects to
       | different email servers and we do not call it "metaemail".
       | 
       | edit: just realized that with the current hype around metaverse,
       | 'metasearch' will probably be more appropriate for something
       | searching the metaverse in the future.
        
       | phantom_oracle wrote:
        | I will add a disclaimer to this comment that it is tinfoil-hat
        | and just speculation (bordering on conspiracy), but many of
        | these "we are a privacy-first company" services might actually
        | just be honeypots and fronts for three-letter agencies.
        | 
        | The comment is not wholly conspiratorial, considering the CIA-
        | owned Swiss crypto company Crypto AG [1].
        | 
        | It's within the realm of possibility that most of these privacy
        | services could be owned by three-letter agencies or be small
        | enough to be coerced into cooperation.
       | 
       | [1]
       | https://www.scmp.com/news/world/europe/article/3050193/crypt...
        
         | marginalia_nu wrote:
         | I do think it's a bit of a red flag.
         | 
          | Sort of like how most anti-tracking browser extensions
          | eventually turn out to actually be tracking extensions. Or
          | like how used car dealers with a name like "Honest Bob's
          | Cheap Luxury Cars" often turn out to be neither honest, cheap,
          | nor luxurious.
        
           | GhettoComputers wrote:
           | Isn't that confirmation bias? uMatrix and uBlock are
           | reliable, the opposite being PrivacyBadger. The EFF has lost
           | my trust before but I never assume maliciousness before
           | incompetence. https://old.reddit.com/r/privacytoolsIO/comment
           | s/l2dges/why_...
        
             | marginalia_nu wrote:
              | The list of browser extensions that have in some form
              | backpedaled from their central premise and main function
              | is pretty long: Ghostery, Adblock, Adblock Plus, ...
        
               | GhettoComputers wrote:
                | I don't disagree that it's a lot; NoScript was another
                | example, uBlock and uMatrix were hijacked through no
                | fault of their own, being open source, Ghostery was
                | sold, and Adblock Plus with Acceptable Ads wasn't as bad
                | as they said. It was widely reported, and I continued
                | installing ABP, since it was easy, it wasn't hard to
                | turn off Acceptable Ads, and I think the direction they
                | tried to move the industry in wasn't harmful. I might
                | have moved back to Adblock or learned about hosts files,
                | but if they had been successful we'd have less resource-
                | hungry ads, a net benefit for everyone, especially when
                | using public computers or helping someone with IT.
                | 
                | Ghostery was as widely reported as Audacity adding
                | telemetry. Everyone who cared knew long before to leave
                | or uninstall it.
                | 
                | Hosts blocking is reliable, and I've never had a single
                | malicious list in the wide assortment I've used. Pi-hole
                | hasn't been hijacked either, and I think it's
                | unreasonable to expect that no group can ever make a
                | mistake or falter. I really don't think Adblock Plus was
                | that bad.
                | 
                | If the market weren't saturated with methods to block, I
                | would have stuck with them if they were remorseful.
               | 
               | -Sent from my not private Apple device I'll still use
               | since it's got a huge userbase on messaging in the US
        
         | jqpabc123 wrote:
         | Haven't you heard? The CIA has gone open source. They don't
         | need to own a company anymore.
         | 
          | They can just download the Searx source code, modify it as
          | they see fit, and make it available on a server someplace.
          | 
          | Can you prove that searx.be isn't run by a "3 letter agency"?
          | Can you prove that the source code running at searx.be is the
          | same as on GitHub?
         | 
         | The point being --- unless you have full access to the server,
         | open source means nothing with regard to privacy and security
         | of any service. It actually means less than nothing --- it
         | means it is super easy to build into a honeypot.
        
           | marc_abonce wrote:
            | Of course, there's no foolproof way to know what code is
            | running on the server side, but https://searx.space at least
            | shows whether an instance modified its client-side code,
            | which you can see in the HTML column.
            | 
            | To keep the server-side code from identifying you, you can
            | access an instance over Tor. Of course, you could try to do
            | that with any other search engine, but most of the other
            | search engines either block exit nodes or provide incomplete
            | functionality if you disable JS.
            | 
            | It's not perfect, but it may be good enough depending on
            | your threat model.
        
             | jqpabc123 wrote:
             | Note to the CIA --- don't modify the client side code when
             | building your honeypots.
             | 
             | Personally, I just use a VPN with the "lite" version of
             | DuckDuckGo --- no JS.
             | 
             | https://lite.duckduckgo.com/lite
        
         | ColinHayhurst wrote:
         | SearX is a project which we respect and a positive contribution
         | to improving search choice. Consideration of how it might be
         | being used is wise.
         | 
          | It's also wise to do due diligence on any company/service
          | where you are revealing sensitive personal information.
          | Traffic coming from Google in 2006 for sensitive medical
          | search queries was a catalyst for us going public that year on
          | our strict no-tracking policy, and we have maintained that
          | position.
         | 
         | We have yet to be contacted by authorities, but you'll have to
         | trust us on that one for now. Since we don't log any personal
         | or identifying data at all, we would have nothing to share [0].
         | You can read about our investors on our blog.
         | 
          | Building and maintaining a search engine with independent
          | infrastructure is a huge challenge and has meant building
          | proprietary IP over many years. Since we refuse to use growth-
          | hacking techniques such as analytics from you-know-who, and
          | all tools involving any tracking, marketing is a bigger
          | challenge than it is for companies without strong principles.
          | It has been a mammoth effort, mostly by our founder, whose
          | story you can read here [1].
         | 
         | [0] https://www.mojeek.com/about/privacy/ [1]
         | https://blog.mojeek.com/2021/03/to-track-or-not-to-track.htm...
        
         | sleepysysadmin wrote:
          | The thing is... let's say the CIA/NSA are tapping searx wholly
          | or just some instances. What exactly are the ramifications? I
          | feel like they are going to be largely missing the target. A
          | bunch of tech-savvy nerds trying different search engines
          | aren't going to be terrorists.
          | 
          | And even if they are? As a Canadian, or someone who isn't in
          | the USA, what exactly is the point? Wouldn't this effectively
          | be the safest host? The CIA/NSA won't be selling your private
          | info. They won't be sending me to a black site because I look
          | at Python documentation and YouTube chill music.
        
           | kwhitefoot wrote:
            | > The CIA/NSA won't be selling your private info.
            | 
            | Why not? They used to sell cocaine, after all, and your info
            | is probably rather less risky.
        
             | sleepysysadmin wrote:
              | >Why not? They used to sell cocaine, after all, and your
              | info is probably rather less risky.
              | 
              | It would reveal the operation, busting any potential for
              | catching terrorists.
        
               | sweetbitter wrote:
               | The purpose of government SIGINT (Signals Intelligence)
               | is certainly not to catch terrorists/pedophiles/money-
               | launderers. Those activities are generally
               | tolerated/endorsed by intelligence agencies, as they are
               | not heinous enough to garner their ire, even helping them
               | whenever they coerce someone into committing a terrorist
               | attack. The true purpose of all of those data is to
               | create a metadata map and to assess who is up to what and
               | who can do what, such that the powers of their nations
               | over the world can be maintained as long as possible.
        
               | [deleted]
        
         | phantom_oracle wrote:
          | I should have added that my comment applies as much to DDG as
          | it does to cheap-VPN-provider-35 with a shell company in
          | Belize.
          | 
          | The original comment was in reference to DDG proudly making
          | claims of not getting requests from .gov and marketing
          | themselves as a company that "cannot see what you search
          | for".
        
       | tandav wrote:
       | Just searched a couple of queries like "opencv rectangle",
       | "python regex" - and it returned nothing
        
         | unixfox wrote:
         | Which instance did you use?
        
       | ColinHayhurst wrote:
       | https://www.searchenginemap.com/
        
         | account-5 wrote:
          | That site is horrible on mobile; a good portion of the screen
          | is taken up by the orange "download/view" infographic thing.
          | Interesting, though, to see how connected the engines are. I
          | would have thought DDG would be bigger with its bang option,
          | though I assume it's about what is natively included in the
          | results.
        
           | ColinHayhurst wrote:
            | Yes, it is horrible on mobile. The sizes of all the
            | syndicating search services are shown the same. An update is
            | overdue.
            | 
            | Complementary Twitter lists are maintained here:
            | https://twitter.com/SearchEngineMap/lists
        
         | freediver wrote:
          | I believe rightdao.com is missing from the list. It has an
          | independent index (and also impressive speed).
          | 
          | Also not sure what the criteria for inclusion are, but
          | search.marginalia.nu and teclis.com both have their own
          | indexes.
        
       | agluszak wrote:
        | Great! I wish there were a way to blocklist certain domains (who
        | wants to see Quora in their results...). This should be easy to
        | implement on Searx's side. Another feature I often wish for is
        | searching within a specific time period. It's so annoying, for
        | example on YouTube, when I remember that a video was released in
        | 2011, but there's simply no filter for it.
        
         | dalf wrote:
         | > I wish there was a possibility to blocklist certain domains
         | 
         | You can do that in this fork:
         | https://github.com/searxng/searxng/blob/e839910f4c4b085463f1...
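          | 
          | From that fork, the idea is a regex-to-replacement map in
          | settings.yml where a value of false drops the result entirely;
          | roughly like this (check the linked file for the exact
          | schema):
          | 
          |     hostname_replace:
          |         '(.*\.)?quora\.com$': false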
        
       | herodotus wrote:
        | Could this be installed on a Raspberry Pi? I am very happy with
        | my Raspberry Pi-hole; I would not mind adding a second Pi for
        | searching.
        
       | skerit wrote:
       | Alright, going to try searx.be for a while then.
        
         | jqpabc123 wrote:
         | I heard it could be a honeypot for the CIA. But feel free to
         | prove me wrong.
        
           | unixfox wrote:
            | If you don't want to use a single searx instance, feel free
            | to use a random one automatically for each search, thanks to
            | this tool, which can be used locally:
            | https://searx.neocities.org/
        
             | jqpabc123 wrote:
             | I heard these searx instances could be linked together in a
             | honeypot network run by the CIA.
             | 
             | But feel free to prove me wrong.
        
       | schleck8 wrote:
        | Metager is a non-profit, open-source search engine running fully
        | on renewable energy. It also has a proxy for opening results
        | anonymously.
       | 
       | https://metager.de
        
         | czechdeveloper wrote:
          | That seems quite exclusive to Germany.
        
         | hermitsings wrote:
         | https://metager.org/ for english users
        
       | abetusk wrote:
       | Search engines have been coming up lately, so maybe this is a
       | good a place as any to discuss some back of the envelope
       | calculations.
       | 
       | Let's say we wanted to recreate the web index made by Google. How
       | much cost and engineering would it take?
       | 
        | Estimating the size of the web from worldwidewebsize.com [0], we
        | get around 50 billion pages (50 * 10^9). The average web page
        | size looks to be on the order of 1.5 MB (1.5 * 10^6 bytes) [1].
        | The nominal cost of hard disk space is about $0.02/GB [2].
        | 
        | So, roughly, that's 75 petabytes of data (~75 * 10^15 bytes). At
        | a cost of $0.02/GB, that gives roughly $1.5M just to buy the
        | hardware to store (a significant fraction of?) the web. The
        | Hutter Prize [3] exists, so maybe there's some confidence that
        | we only need to actually store 1/10 of that, bringing the cost
        | down to around $150k.
       | 
        | For perspective, that's 10 multi-millionaire Silicon Valley
        | types donating about $150k each, 100 "engineer types" at $15k
        | each, or 1,000 to 10,000 pro-active citizens at $1.5k to $150
        | each (*just* for the hard disk space, discounting energy,
        | bandwidth, and other operating costs).
       | 
        | If we extrapolate falling hard disk prices, taking the price
        | halving time to be about 2.5 years and the current
        | (pessimistic?) cost to be $0.02/GB, that's about 10-15 years
        | before a petabyte-scale hard drive is available to the consumer
        | for $1000.
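        | 
        | A quick script to check the arithmetic above (same assumed
        | inputs as the cited sources):
        | 
        |     import math
        | 
        |     pages = 50e9        # ~50 * 10^9 indexed pages [0]
        |     page_bytes = 1.5e6  # ~1.5 MB average page [1]
        |     cost_per_gb = 0.02  # ~$0.02/GB of raw disk [2]
        | 
        |     total_gb = pages * page_bytes / 1e9  # 7.5e7 GB = 75 PB
        |     raw_cost = total_gb * cost_per_gb    # ~$1.5M in disks
        |     compressed = raw_cost / 10           # ~$150k at 10x
        | 
        |     # years until 1 PB costs $1000, halving every 2.5 years
        |     years = 2.5 * math.log2(cost_per_gb / (1000 / 1e6))
        |     print(total_gb, raw_cost, compressed, round(years, 1))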
       | 
       | From my perspective, I would ask "why hasn't a decentralized
       | search index been created and/or is in wide use?". My guess is
       | that figuring out a robust enough system that's cheap enough is
       | still out of reach. $150 might not seem like a lot, but you have
       | to convince 10k people to devote energy just to search.
       | 
        | Put another way, when does the landscape change enough that
        | decentralized search becomes a viable option? My guess is that
        | the determining factor is when people can store a significant
        | fraction of the web locally for nominal cost. Maybe some great
        | compression and/or AI sentiment analysis can be done to
        | bootstrap, and maybe some type of financial incentive can help
        | solve this issue, but my bet is that these will only provide a
        | light push in the right direction, and the needed technology is
        | the underlying cheap disk space.
       | 
        | As a side note, worldwidewebsize.com [0] shows the number of
        | pages indexed by Google holding pretty constant over a five-year
        | period, with a sharp decline somewhere in 2020. I wonder if this
        | is an artifact of the estimation method or if Google has changed
        | something significant in their back end to alter their search
        | engine and storage.
       | 
       | [0] https://www.worldwidewebsize.com/
       | 
       | [1] https://www.pingdom.com/blog/webpages-are-getting-larger-
       | eve....
       | 
       | [2] https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/
       | 
       | [3] http://prize.hutter1.net/
        
       | kristianpaul wrote:
        | Doesn't searx look for results at DuckDuckGo and Google for you
        | anyway? What's the difference from using DDG directly?
        
         | sebow wrote:
          | I don't think it uses DDG directly. But anyway, you can
          | configure the sources for files, media, wiki, etc. It makes
          | sense since the engine is open source, but then again, it's
          | not really a search engine itself but a metasearch one.
        
         | BeetleB wrote:
         | Searx can ping multiple search engines, including those not
         | supported by DDG. For example, searx has a dedicated file
         | search, which includes torrents.
        
         | boudin wrote:
          | There are other sources available. It is a metasearch engine,
          | so it will always rely on other sources, but you can disable
          | the DuckDuckGo and Google backends.
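          | 
          | For example, each engine entry in settings.yml can be turned
          | off (a sketch; see the searx documentation for the full engine
          | schema):
          | 
          |     engines:
          |       - name: google
          |         disabled: true
          |       - name: duckduckgo
          |         disabled: true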
        
         | luciusdomitius wrote:
          | Isn't DuckDuckGo just an alternative frontend for Bing with an
          | integrated ad/tracking blocker? Or at least that's what they
          | claim.
        
           | nicce wrote:
            | It is. They say they add some indexing of their own, but the
            | results are all the same.
        
       ___________________________________________________________________
       (page generated 2021-11-12 23:00 UTC)