[HN Gopher] Searx - Privacy-respecting metasearch engine ___________________________________________________________________ Searx - Privacy-respecting metasearch engine Author : cube00 Score : 243 points Date : 2021-11-12 12:17 UTC (10 hours ago) (HTM) web link (sagrista.info) (TXT) w3m dump (sagrista.info) | YXNjaGVyZWdlbgo wrote: | I had a searx instance running for a long time and it's great | when it works, but the plugins for site-specific searches break | all the time, and if you have more than 3-4 users with a high | search frequency, Google blacklists your IP by throttling. | kaba0 wrote: | May I ask what's your idea of preserving privacy with | self-hosting a website that searches for specific terms from a | presumably fixed IP, where by 1/3-1/4 chance it can be | attributed to you? | YXNjaGVyZWdlbgo wrote: | The instance ran on a mobile connection not associated with | any private information. | | EDIT: It was located in a German street light 20km away from | any of the users. Just to get the geolocation question out of | the way. | | It was more of an experiment than anything else; there will be | a talk about it and other FreiFunk (Open Mesh Network in | Germany) related stuff at the next virtual CCC congress. | dalf wrote: | Did you renew the mobile IP from time to time? | | Note: since version 1.0, searx stops sending requests | for 1 day when a CAPTCHA is detected, which might help a | little. | | (I'm really interested in the results of your experiment) | YXNjaGVyZWdlbgo wrote: | We renewed every 12 hours. In the end we came to the | conclusion that Google might discriminate against traffic from | eastern European states. The SIM cards were mostly from | Poland, Belarus and Ukraine. When we tested it | against German, French and Italian cards, captchas were | way more frequent on the eastern cards and showed up way | earlier. I will ping you as soon as the talk is online.
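The CAPTCHA back-off dalf mentions (stop querying an engine for a day once a CAPTCHA is detected) can be sketched roughly like this; `EngineGate` and `SUSPEND_SECONDS` are hypothetical names for illustration, not searx's actual code:

```python
import time

# One day of suspension after a CAPTCHA, as described in the comment above.
SUSPEND_SECONDS = 24 * 60 * 60

class EngineGate:
    """Tracks, per upstream engine, until when requests are suspended."""

    def __init__(self):
        self.suspended_until = {}  # engine name -> unix timestamp

    def report_captcha(self, engine, now=None):
        # A CAPTCHA was detected: stop querying this engine for a day.
        now = time.time() if now is None else now
        self.suspended_until[engine] = now + SUSPEND_SECONDS

    def allowed(self, engine, now=None):
        now = time.time() if now is None else now
        return now >= self.suspended_until.get(engine, 0.0)

gate = EngineGate()
gate.report_captcha("google", now=0)
assert not gate.allowed("google", now=100)          # still suspended
assert gate.allowed("google", now=SUSPEND_SECONDS)  # cooldown elapsed
```

Renewing the IP (as the instance above did every 12 hours) and this kind of per-engine cooldown address two different signals: the first changes who the engine thinks you are, the second reduces how aggressively you look like a bot.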
| ahtaarra wrote: | I recently made the switch from DDG to Searx simply because | right-clicking on a search result to copy the URL resulted in a | referrer link being copied rather than the link of the result | destination. | southerntofu wrote: | I think enabling the DoNotTrack header or disabling JavaScript | prevents this behavior: I cannot reproduce it on Tor Browser. But | you are correct, this is a worrying development. | caskstrength wrote: | I use https://addons.mozilla.org/en-US/firefox/addon/clearurls/ | to automatically convert referrals to actual links on web | pages. | ravenstine wrote: | I've been using ClearURLs for a while so I never even | realized that DDG used referrer links. Have they always done | this? | boomboomsubban wrote: | Isn't the referral link only on the ad results, which are | clearly marked "AD"? That's what my quick test shows. | ojosilva wrote: | Are there any open-source real internet search engines worth | looking at? I think we should be working on disrupting search as | a whole instead of depending on the Googles, Baidus and Bings of | the world. | | I'm fully aware of the massive crawling and storage requirements, | but open-source projects that can get search right can later 1) be | hosted by the powerhouses of the cloud or non-profit parties, or | 2) become a fully distributed hosting and crawling effort as in | p2p and blockchain. | [deleted] | vindarel wrote: | p2p: there's the YaCy effort. https://yacy.net/ I... couldn't | find a portal to try it out (I did years ago; the results | need... to be discussed. It's anyway easy to install and to | choose what part of the web to crawl.) | | > YaCy is free software for your own search engine. | | Maybe they rebranded and don't aspire to be a complete web | search engine? | goldsteinq wrote: | The problem with self-hosted search engines is that they make you | _very_ unique: you're the only client of the "backend" engine | with that (static and non-NATed) IP.
Furthermore, you're now one | of the small group of people with "hosting" IPs. Using self- | hosted SearX may make you easier to track, not harder. | | Using SearX hosted by someone else is marginally better, but now | you have to trust the owner of the server, which is probably not | what you want for a privacy-centered search engine. | woodruffw wrote: | Could you clarify where the privacy concern is here? As I | understand it, I'm sharing my IP with search engines anyways; | the only difference with a self-hosted SearX instance is that | I'm sharing my server's IP instead. | | Is the concern that the latter's IP isn't behind a NAT, and | therefore is more unique? If so, I think that's the least | concerning of the identifying datapoints that a search engine | has access to -- my browser metadata is far more identifying. | With SearX, that information doesn't get forwarded (IIUC). | marc_abonce wrote: | If you don't want to expose your IP address, you can configure | searx to proxy all the queries through Tor. This obviously | makes the instance way slower and you'll have to disable some | engines that block Tor exit nodes, so it's a trade-off. | randomsilence wrote: | When you click a link on your SearX instance and you don't use | referrers, how can anybody track you? Nobody knows that you are | coming from your "backend" engine. | | You just reveal your search queries to the hosting provider if | he maliciously intercepts them. | drcongo wrote: | As someone who uses Safari with its built in list of search | providers, I'm rather stuck with DDG for address bar searches, | but boy it has really started to suck over the last year or two. | boomboomsubban wrote: | You could always do !searx | drcongo wrote: | True! I'm experimenting with Ecosia right now, but also | recently got on the beta for the kagi.com search engine, which | so far has proven to be vastly superior to any other that I've | tried. | bombadilo wrote: | So don't use Safari then?
Seems like a pretty simple solution. | sixothree wrote: | Edge is available for Mac. | drcongo wrote: | Chrome destroys the battery on my laptop and is basically | spyware these days, and the Chrome-alikes are all dreadful - | Vivaldi has the jankiest UI ever, Brave is unbearably | sluggish, and Edge runs processes that I didn't ask for, like | Microsoft Updater, which bugs me constantly and spams the new | tab screen with all sorts of low-rent junk that I can't | remove. | | Firefox is my developing browser and I do really like it, but | Safari is my actual browsing browser because it's by far the | best browsing browser on Macs. | xanaxagoras wrote: | I've landed on Librewolf for personal, ungoogled chromium | for work. It's great so far; been on this setup for a few | months. | boogies wrote: | If you stop using macOS you can get much better frontends | for WebKit, from the simple, rather Safari-like GNOME Web | (AKA Epiphany) to the powerful Pentadactyl-like luakit. | ColinHayhurst wrote: | Yes, Safari is the worst offender when it comes to offering | search choice on desktop. | | On iOS, using a new app called Hyperweb, you can use the new Safari | extensions to access and create a longer preferences list. | https://hyperweb.app/ | | We really shouldn't have to choose this or that, but should be | able to easily use multiple choices in search. You can do that | today as explained here, but you'll need to switch browsers. | https://blog.mojeek.com/2021/09/multiple-choice-in-search.ht... | jeroenhd wrote: | > with its built in list of search providers | | TIL. That's just... terrible UX. | jqpabc123 wrote: | _You do not need to trust third parties to keep you private and | not track your every move, which is awesome._ | | The only way to avoid third parties is to run your own server ... | but this "metasearch engine" is basically just an aggregation | proxy. So every search can still be tracked back to your proxy | server by Google, Bing or whoever is providing the actual | results.
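The "aggregation proxy" point is easy to see in a sketch: a metasearch instance re-issues the query to upstream engines from its own IP, stripping client identifiers, but the upstreams still receive the query text. The engine URL templates and the `upstream_requests` helper below are illustrative assumptions, not searx internals:

```python
from urllib.parse import quote_plus

# Hypothetical minimal "aggregation proxy". Upstream engines see the
# proxy's IP rather than the user's, but they still see the query itself.
UPSTREAMS = {
    "ddg":  "https://duckduckgo.com/html/?q={q}",
    "bing": "https://www.bing.com/search?q={q}",
}

def upstream_requests(query):
    """Build the upstream URLs the proxy would fetch for one user query.

    Note what is absent: no client cookies, no client User-Agent, no
    client IP is forwarded - only the query string itself leaks upstream.
    """
    q = quote_plus(query)
    return [template.format(q=q) for template in UPSTREAMS.values()]

urls = upstream_requests("privacy metasearch")
assert urls[0] == "https://duckduckgo.com/html/?q=privacy+metasearch"
```

This is exactly why the comment above says searches "can still be tracked back to your proxy server": the proxy shields the user's browser fingerprint and IP, but the upstream provider still correlates all of the proxy's queries to one source address.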
| ricardo81 wrote: | 'Uses Amazon Web Services (AWS) as a cloud provider and Cloudflare CDN.' | | IIRC DDG uses Microsoft servers exclusively now. Makes sense | given the volume of queries they're handling, all dependent on | the Bing API. | AmosLightnin wrote: | Privacy is important, but I also care about the terrible quality | of search results I get from nearly all the major providers these | days. Couldn't an aggregator like SearX host a machine learning | layer that learns what results are more likely to be valuable to | me, and ranks them higher in the results? Keeping the | customization layer on my own server and improving search results | would seem to be a big advantage both privacy- and performance-wise. | analyte123 wrote: | If you're trying to be comprehensive, a few other suggestions in | rough order of their usefulness: | | Gigablast.com - Has been improved recently. private.sh is | supposed to be a private proxy for Gigablast, but it has been | broken recently | | Exalead.com - run by a French defense contractor for some reason | | filepursuit.com - search for files only. Need to play around with | it more. | | PeteyVid.com - multi-platform video search | | Wiby.me - focus on "classic" style web sites | jefc1111 wrote: | I have been trialling Swisscows (https://swisscows.com/) and have | found it quite useful. I have not deeply researched their privacy | claims, but for now I am just trying not to use Google or | mainstream alternatives. | | Does anyone else have experience or comments on Swisscows' search | engine? Seems like an interesting company all round. | hans_castorp wrote: | Qwant is another (non-US) alternative | rahen wrote: | Doesn't it use Bing results, like DuckDuckGo does? I read it's | hosted on MS Azure like Bing and DDG, so in the end it's | somewhat just a rebranded Bing. Quite a shame for a European | (Franco-German) search engine.
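AmosLightnin's idea of a private re-ranking layer could look something like this minimal sketch: keep a local click history and boost results from domains the user actually visits. `PersonalRanker` is a hypothetical name, and a real system would use an actual learning model rather than raw click counts:

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical personal re-ranking layer that never leaves your server:
# the click history (the "sensitive" signal) stays local, while upstream
# engines only ever see the anonymous query.
class PersonalRanker:
    def __init__(self):
        self.clicks = Counter()  # domain -> number of past clicks

    def record_click(self, url):
        self.clicks[urlparse(url).netloc] += 1

    def rerank(self, results):
        """results: list of (url, base_score) pairs from upstream engines.

        Adds a simple per-domain preference bonus to each base score.
        """
        def score(item):
            url, base = item
            return base + self.clicks[urlparse(url).netloc]
        return sorted(results, key=score, reverse=True)

ranker = PersonalRanker()
ranker.record_click("https://docs.python.org/3/library/re.html")
ranked = ranker.rerank([("https://www.quora.com/x", 2.0),
                        ("https://docs.python.org/3/faq", 2.5)])
assert ranked[0][0].startswith("https://docs.python.org")
```

The privacy advantage the comment describes falls out naturally: the personalization signal is stored and applied on the user's own machine, so no upstream engine ever learns the preference profile.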
| kaba0 wrote: | Even if they were just a proxy in front of Bing, that would | still be good, wouldn't it? | rahen wrote: | Well, claiming to offer an alternative to the GAFAM while | depending on their products / data / infrastructure is a | bit misleading. | | So indeed they're doing okay privacy-wise, but a lot of | users feel cheated when they realize their "independent | search engine" (DuckDuckGo) is just a Bing portal hosted on | Azure. | citizenpaul wrote: | Slightly off topic, but does anyone have a good solution for removing | content-farm search results in this or any engine? For example, | some of the worst offenders: WikiHow, Forbes, Business Insider. | ChrisArchitect wrote: | Please keep the title the same as the actual post: Searx - moving away | from DuckDuckGo | | It gives more context to the topic, as in it's not just a link to | the search engine itself. | hermitsings wrote: | I had been considering using Searx (which I had known about | before) lately, since I have to use DDG+Google to get | satisfactory results. | | Edit: I really like DDG bangs and vim-like nav keys tho | visiblink wrote: | Someone here once posted a link to duckduckstart.com. I have it | set up in my search bar now. | | If you search, it goes through Startpage (Google results, more | privately). If you search with a bang, it goes through | Duckduckgo. It's probably close to what you're looking for. | mg wrote: | Every time I stumble across a new search engine, I add it to my | search engine comparison tool: | | https://www.gnod.com/search/ | | Will add SearX now. It seems to provide reasonably good results. | | Update: It's on (under 'More Engines'). | [deleted] | Minor49er wrote: | You should add Fireball. It's excellent | | https://fireball.de/ | nix23 wrote: | A YaCy instance would be good too ;) | | https://yacy.net/ | danskeren wrote: | Ask.Moe | | nona.de | 51stpage wrote: | There is also Marginalia https://search.marginalia.nu/ which I | don't see on your list.
| marginalia_nu wrote: | It's currently undergoing its monthly maintenance, just FYI. | It's up and technically working, but with a drastically | reduced index size. | sysadm1n wrote: | Tried Marginalia. So many plaintext http links, which I avoid | like the plague. That's my only gripe with it. Other than | that, it's an awesome tool. | forgotmypw17 wrote: | That's funny, I personally prefer HTTP for its simplicity, | human-readability, accessibility, lack of centralized | control, backwards compatibility and lack of forced | upgrades or locking out of old clients, etc., not to mention | speed. | | Of course, I'm fortunate enough to live in a place where | MITM attacks are virtually non-existent, aside from WiFi | portals and maybe ISP banners (which I've never | experienced.) | freediver wrote: | > So many plaintext http links which I avoid like the | plague. | | Why? What you described appears to be the safest place on | the web. | sysadm1n wrote: | I only browse HTTPS sites. I have the `HTTPS Everywhere` | addon installed with the `EASE` / Encrypt All Sites | Eligible option turned on so I don't accidentally browse an | unencrypted website. Something like 85-90% of the web is | encrypted now, and there's _no_ excuse to be using | outdated plaintext http anymore. It's a privacy and | security risk. There are only a few instances where I've had | to view an http site (I'm a freelancer and my client's | webpage was still unencrypted, so I had to view it; a rare | exception to the rule). | marginalia_nu wrote: | It's funny, because I've got like 70% HTTP in my index, so | the whole "90% of the web is encrypted" claim seems to depend | on which sample you are looking at. Google doesn't index | HTTP at all, so that's not a good place to go looking for | what's most popular. That's in fact half the reason | why I built this search engine in the first place: | they demand things of websites that some websites | simply can't or won't comply with.
| | A lot of servers still use HTTP, for various reasons. | There are also some clients that can't use HTTPS. | stjohnswarts wrote: | I think there are absolute numbers and then there are | "the sites most people visit regularly", and those | probably are 75% https. It's relative, like most things. | marginalia_nu wrote: | Absolute numbers are pretty hard to define, as is the | size of the Internet. | | If the same server has two domains associated with it, | does it count twice? Now consider a load balancer that | points to virtual servers on the same machine. How about | subdomains? | stjohnswarts wrote: | It may be a privacy risk, but it's certainly not a | security risk with plain old blog and static sites that | have completely open data available to anyone who wants | to surf to their sites. | freediver wrote: | The privacy and security risk comes in large part from | the nature of the code and actions performed on the site. | | In reality, as far as privacy goes, matters are on | average the opposite of your claim. Most sites that will put | your privacy at risk today are using https - I am talking | about the vast majority of the commercially operated web | today. I know my privacy is much better respected on a | plain-text (no JavaScript) site using http than on | [insert a top 10k most popular site here] using https. | | And for security, if I am not, for example, shopping or | entering my billing details anywhere on the | site, I do not see how an http site can compromise my | security. | | I actually prefer deploying http sites for simple test | projects where speed is imperative, because they are also | faster - there is no SSL handshake needed to connect. | marginalia_nu wrote: | You should be getting fewer .txt results in the new update; | part of the problem was that keyword extraction for plain | text was kind of not working as intended, so they'd usually | crop up as false positives toward the end of any search | page.
I'm hoping that will work better once the upgrade is | finished. | mattowen_uk wrote: | I have this REALLY old text file of search engine URLs: | | http://www.jaruzel.com/textfiles/Old%20Web%20Info/Internet%2... | | Google basically killed almost all of them off :( | | It would be great to see some proper competition in the search | space, especially around specialist search engines. | sixothree wrote: | I would love a search engine targeted towards developers. | Searching for symbols seems to be a problem with Google, not | to mention all of the utterly crappy results they serve up. | ijr wrote: | Symbolhound does that. | stagas wrote: | Also grep.app for searching in repos really fast. | jwithington wrote: | where's you.com lol | mg wrote: | Thanks, added. | 1_player wrote: | Is anything supposed to happen when I enter something and press | Enter? Nothing happens for me, FF on Windows, uBlock. | reayn wrote: | You are supposed to type something into the search field, then | click the engine you want to use; it will pass on whatever | you entered. | | I agree that the creator could make that a little clearer | somewhere on the page. | ColinHayhurst wrote: | Nice work. If you have a twitter handle you might request to | get added to these lists; either way they might be useful for | you: https://twitter.com/SearchEngineMap/lists | imglorp wrote: | This is a nice compilation. | | It would be very interesting if it examined and compared | results. | m-i-l wrote: | Yes, looks good, although I thought it was going to be a | federated search, i.e. you enter your search term and it | performs that search on all the sites selected. The simpler | way of implementing a federated search would be to show | separate results boxes from each site, although that wouldn't | scale well to a large number of sites, and it can get quite | complicated to try to combine the results.
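One simple way to combine federated results, as m-i-l describes, is to round-robin interleave each engine's ranked list and drop duplicates; this `interleave` helper is a sketch of the idea, not how any particular metasearch engine actually merges:

```python
from itertools import zip_longest

# Hypothetical merge step for a federated search: take each engine's
# ranked list and interleave them rank by rank, skipping URLs already seen.
def interleave(result_lists):
    merged, seen = [], set()
    for rank_slice in zip_longest(*result_lists):  # all engines' rank-1, then rank-2, ...
        for url in rank_slice:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

a = ["https://a.example/1", "https://shared.example"]
b = ["https://shared.example", "https://b.example/2"]
assert interleave([a, b]) == ["https://a.example/1",
                              "https://shared.example",
                              "https://b.example/2"]
```

This is the crude end of the spectrum the comment alludes to: separate per-engine result boxes avoid the merge problem entirely, while anything smarter than interleaving (score normalization, deduplication by content rather than URL) is where combining results "can get quite complicated".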
| wenbin wrote: | You should add the podcast search engine | https://www.listennotes.com/ | autoexec wrote: | Too bad they force you to log in to view a result or do | anything but search. They also share/sell your data to 3rd | parties, including Google. | mg wrote: | Thanks, added. | | Holy moly, are there really over 100 million podcast episodes | out there? | wenbin wrote: | Yes. Some numbers: https://www.listennotes.com/podcast-stats/ | | Listen Notes was started in early 2017 as a side project, | when there were ~23 million episodes. | | I remembered seeing that the number of web pages indexed by | Google in early 1998 was ~25 million, and then I thought, | "ok, 23 million episodes might justify the existence of a | podcast search engine" :) | fsflover wrote: | You should also add YaCy: https://yacy.net. | mg wrote: | It seems not to be web based? | | When I click on "Try out the YaCy Demo Peer", I get "502 Bad | Gateway". | fsflover wrote: | It's self-hosted and peer-to-peer. You could search for | other public-facing instances, e.g., | http://sokrates.homeunix.net:6060. Ideally, you would run | your own instance to show the world how it works. | xwdv wrote: | People don't want privacy. They want results. | | Society programs us to think privacy is our top concern. Is it? | marginalia_nu wrote: | I do think you are mostly correct. Some people really care | about privacy, but for most people it isn't a huge concern. | | This doesn't make it any less important, but it just means that if | your main selling point is "we're the search engine that cares | about privacy", then odds are you're not going to get a lot of | users. | lardolan wrote: | I agree with most of your points, although it may be a matter | of time to grow that niche. IMO people who value it are | willing to accept the sacrifice of less useful results. | | Privacy is the most effective selling point when working with | sensitive information.
| marginalia_nu wrote: | If you are working with sensitive information, the last | thing you probably want to do is broadcast that you are | working with sensitive information. | kreeben wrote: | Au contraire, it's the first thing you should broadcast, | unless you're trying to scam people out of their PII. | [deleted] | GhettoComputers wrote: | Isn't hosting your own instance taking away every benefit of | searx by revealing your IP? If you had a VPN you'd use it to mask | your IP from search engines' tracking anyway, and if you used | Tor for it, you'd probably move back soon since it'll be so much | worse with latency, like how many people go back to Google | because DDG sucks for results. I suggest just using public instances; | you can find them at https://searx.space. Some are more reliable than | others, but none have been trouble-free. There's a lot of these | instances for privacy; Chrome has a privacy plugin with a white | eye that uses nitter.net for Twitter, teddit for Reddit and other | public instances. One instance of Reddit was even made completely | in Rust. ;) | | https://chrome.google.com/webstore/detail/privacy-redirect/p... | | To reply to the person under me: you're always relying on trust | in something unverified anyway, like VPNs. You're either revealing | your IP, using a wrapper that reveals it instantly, using a site | that isn't a search engine and might be using your data, using a | VPN whose trustworthiness rests on reputation and assumptions, | usually based in another country you won't visit or know much | about aside from random reviewers, or using Tor, losing latency | and reasonably fast image search, and still possibly being | compromised. | kleinsch wrote: | But if you're using instances someone else is hosting, aren't | you hitting half the author's objections?
| | - They may be hosted in the US | | - They may be hosted on AWS | | - You have no idea if the maintainer of the instance is | tracking you | jarvuschris wrote: | ^ This right here. This article is pretty much hogwash IMO. | | Points 1 and 3 aren't relevant if they aren't recording the | data. Companies in other jurisdictions have no magic | invulnerability you can trust to keep their data from getting out | (legally or illegally) if they're storing it. | | Points 2 and 5 are equally true of any open source project | unless you run it yourself from source. There are _plenty_ of | examples of users getting phished by maliciously built/hosted | open source tools. | | Point 4 is obviously not malicious tracking, and a mistake any | project could make. | | At the end of the day though, unless you're going to run | everything yourself (which most people aren't), you have to | pick who to trust -- some random person running a server | somewhere, or a company with hundreds of employees recruited | under the premise of working on a privacy-centric search | engine, who could all turn whistleblower. | jostillmanns wrote: | Whoogle is another alternative that focuses on Google search | results. | ced wrote: | From the link: | | _The CEO sold his previous company's data before founding DDG. | His previous company (Names DB) was a surveillance capitalist | service designed to coerce naive users to submit sensitive | information about their friends._ | | Is that a fair statement? Can someone provide more context? | boomboomsubban wrote: | It was a failed social network to help you reconnect with old | friends. It tried to get you to recruit your friends | immediately after registering and had a typical social network | license. I'd say that description is intentionally describing | it in the worst possible light, but not wholly inaccurate. | deltree7 wrote: | Yet, HN is down and ready to suck searx dick because hurr | durr privacy. This is HN in a nutshell.
| | Hint: If all you privacy-paranoid people show the exact same | behavior, you are an advertiser's dream. Sure, you won't be | sold shampoo like the general population, but most advertising | companies know the exact kind of doomer-prepper items to sell | if you come via VPN/DDG/whatever convoluted hack/concoction | you come up with to make your life inconvenient. | freediver wrote: | I always liked the term "search engine client" better (vs | 'metasearch engine'). In essence it is a product that can connect | to different search indexes. | | An "email client" does exactly the same thing, connects to | different email servers, and we do not call it "metaemail". | | edit: just realized that with the current hype around the metaverse, | 'metasearch' will probably be more appropriate for something | searching the metaverse in the future. | phantom_oracle wrote: | I will add a disclaimer to this comment that it is tinfoil-hat | and just speculation (bordering on conspiracy), but many of | these "we are a privacy-first company" outfits might actually just be | honeypots and fronts for 3-letter agencies. | | The comment is not wholly conspiratorial, considering the CIA- | owned Swiss crypto company Crypto AG [1]. | | It's within the realm of possibility that most of these privacy | services could be owned by 3-letter agencies, or small enough to | be coerced into cooperation. | | [1] | https://www.scmp.com/news/world/europe/article/3050193/crypt... | marginalia_nu wrote: | I do think it's a bit of a red flag. | | Sort of like how most anti-tracking browser extensions | eventually turn out to actually be tracking extensions. Or like | how used car dealers with a name like "Honest Bob's Cheap | Luxury Cars" often turn out to be neither honest, cheap nor | luxurious. | GhettoComputers wrote: | Isn't that confirmation bias? uMatrix and uBlock are | reliable, the opposite being Privacy Badger. The EFF has lost | my trust before, but I never assume malice before | incompetence.
https://old.reddit.com/r/privacytoolsIO/comments/l2dges/why_... | marginalia_nu wrote: | The list of browser extensions that have in some form | backpedaled from their central premise and main function | is pretty long: Ghostery, Adblock, AdblockPlus, | ... | GhettoComputers wrote: | I don't disagree, it's a lot. NoScript was another | example; uBlock and uMatrix, through no fault of their own, | were also hijacked, being open source; Ghostery was sold; | and Adblock Plus with acceptable ads wasn't as bad as they said. | It was widely reported; I continued installing ABP, | since it was easy, it wasn't hard to turn off acceptable | ads, and I think the direction they tried to move the | industry in wasn't harmful. I might have moved back to | Adblock or learned about hosts files, but if they had been | successful we'd have less resource-hungry ads, a net | benefit for everyone, especially when using public | computers or helping someone with IT. | | Ghostery was as widely reported as Audacity adding | telemetry. Everyone who cared knew long before to leave | or uninstall it. | | Hosts blocking is reliable and I've never had a single | malicious one among the wide assortment I've used. Pi-hole | hasn't been hijacked either, and I think it's unreasonable | to think that no group can make mistakes or ever falter. | I really don't think Adblock Plus was that bad. | | If the market wasn't saturated with methods to block, I | would have stuck with them if they were remorseful. | | -Sent from my not-private Apple device, which I'll still use | since it's got a huge userbase on messaging in the US | jqpabc123 wrote: | Haven't you heard? The CIA has gone open source. They don't | need to own a company anymore. | | They can just download the Searx source code, modify it as they | see fit, and make it available on a server someplace. | | Can you prove that searx.be isn't run by a "3 letter agency"? | Can you prove that the source code running at searx.be is the | same as on GitHub?
| The point being --- unless you have full access to the server, | open source means nothing with regard to the privacy and security | of any service. It actually means less than nothing --- it | means it is super easy to build into a honeypot. | marc_abonce wrote: | Of course, there's no fool-proof solution to knowing what | code is running on the server side, but https://searx.space | at least shows if an instance modified their client-side | code, which you can see in the HTML column. | | To prevent server-side code from identifying you, you can | consume an instance over Tor. Of course, you could try to do | that with any other search engine, but most of the other | search engines either block exit nodes or provide incomplete | functionality if you disable JS. | | It's not perfect, but it may be good enough depending on your | threat model. | jqpabc123 wrote: | Note to the CIA --- don't modify the client-side code when | building your honeypots. | | Personally, I just use a VPN with the "lite" version of | DuckDuckGo --- no JS. | | https://lite.duckduckgo.com/lite | ColinHayhurst wrote: | SearX is a project which we respect and a positive contribution | to improving search choice. Consideration of how it might be | being used is wise. | | It's also wise to do due diligence on any company/service where | you are revealing sensitive personal information. Traffic | coming from Google in 2006 for sensitive medical search | queries was a catalyst for us going public that year on our | strict no-tracking policy, and we have maintained that position. | | We have yet to be contacted by authorities, but you'll have to | trust us on that one for now. Since we don't log any personal | or identifying data at all, we would have nothing to share [0]. | You can read about our investors on our blog. | | Building and maintaining a search engine with independent | infrastructure has been a huge challenge and has meant building | proprietary IP over many years.
Since we refuse to use | techniques used in growth hacking, such as analytics from you- | know-who and all tools involving any tracking, marketing is a | bigger challenge than it is for companies without strong | principles. It has been a mammoth effort, mostly by our founder, | whose story you can read here [1]. | | [0] https://www.mojeek.com/about/privacy/ [1] | https://blog.mojeek.com/2021/03/to-track-or-not-to-track.htm... | sleepysysadmin wrote: | The thing is... let's say the CIA/NSA are tapping searx wholly | or just some instances. What exactly are the ramifications? I feel | like they are going to be largely missing the target. A bunch | of tech-savvy nerds trying different search engines aren't going | to be terrorists. | | And even if they are? As a Canadian, or someone who isn't in the | USA, what exactly is the point? Wouldn't this effectively be | the safest host? The CIA/NSA won't be selling your private info. | They won't be sending me to a blacksite because I look at Python | documentation and YouTube chill music. | kwhitefoot wrote: | > CIA/NSA won't be selling your private info. | | Why not? They used to sell cocaine after all, and your info is | probably rather less risky. | sleepysysadmin wrote: | > Why not? They used to sell cocaine after all, and your info | is probably rather less risky. | | It would reveal the operation, busting any potential for | catching terrorists. | sweetbitter wrote: | The purpose of government SIGINT (Signals Intelligence) | is certainly not to catch terrorists/pedophiles/money- | launderers. Those activities are generally | tolerated/endorsed by intelligence agencies, as they are | not heinous enough to garner their ire; they even help them | whenever they coerce someone into committing a terrorist | attack. The true purpose of all of those data is to | create a metadata map and to assess who is up to what and | who can do what, such that the powers of their nations | over the world can be maintained as long as possible.
| [deleted] | phantom_oracle wrote: | I should have added that my comment applies as much to DDG | as it does to cheap-VPN-provider-35 with a shell company | in Belize. | | The original comment was in reference to DDG proudly making | claims of not getting requests from .gov and marketing | themselves as a company that "cannot see what you search for". | tandav wrote: | Just searched a couple of queries like "opencv rectangle" and | "python regex" - and it returned nothing. | unixfox wrote: | Which instance did you use? | ColinHayhurst wrote: | https://www.searchenginemap.com/ | account-5 wrote: | That site is horrible on mobile; a good portion of the screen | is taken up by the orange "download/view" infographic thing. | Interesting though to see how connected the engines are. I | would have thought DDG would be bigger with its bang option, | though I assume it's about what is natively included in the | results. | ColinHayhurst wrote: | Yes, it is horrible on mobile. The sizes of all the | syndicating search services are the same. An update is | overdue. | | A complementary Twitter list is maintained here: | https://twitter.com/SearchEngineMap/lists | freediver wrote: | I believe rightdao.com is missing from the list. It has an | independent index (and also impressive speed). | | Also not sure what the criteria for inclusion are, but | search.marginalia.nu and teclis.com both have their own | indexes. | agluszak wrote: | Great! I wish there was a possibility to blocklist certain | domains (who wants to see Quora in their results...). This should | be easily implementable on Searx's side. Another feature I often | wish for is searching in a specific time period. It's so | annoying, for example on YouTube, when I remember that a video | was released in 2011, but there's simply no filter for it.
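The domain blocklist agluszak asks for is indeed simple to sketch client-side: drop any result whose host is a blocked domain or a subdomain of one. `BLOCKLIST` and the helper functions below are hypothetical illustrations, not an existing searx feature:

```python
from urllib.parse import urlparse

# Hypothetical user-maintained blocklist of content-farm domains.
BLOCKLIST = {"quora.com", "pinterest.com"}

def blocked(url):
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKLIST)

def filter_results(urls):
    return [u for u in urls if not blocked(u)]

results = ["https://www.quora.com/some-question",
           "https://docs.python.org/3/"]
assert filter_results(results) == ["https://docs.python.org/3/"]
```

The suffix check matters: matching on the registered domain catches `www.quora.com` and any other subdomain, while a naive substring test would also wrongly block unrelated hosts that merely contain the string.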
| dalf wrote: | > I wish there was a possibility to blocklist certain domains | | You can do that in this fork: | https://github.com/searxng/searxng/blob/e839910f4c4b085463f1... | herodotus wrote: | Could this be installed on a Raspberry Pi? I am very happy with | my Raspberry Pi-hole: would not mind adding a second Pi for | searching. | skerit wrote: | Alright, going to try searx.be for a while then. | jqpabc123 wrote: | I heard it could be a honeypot for the CIA. But feel free to | prove me wrong. | unixfox wrote: | If you don't want to use a single searx instance then feel | free to use a random one automatically for each search, thanks | to this tool which can be used locally: | https://searx.neocities.org/ | jqpabc123 wrote: | I heard these searx instances could be linked together in a | honeypot network run by the CIA. | | But feel free to prove me wrong. | schleck8 wrote: | Metager is a non-profit, open-source search engine running fully | on renewable energy. It also has a proxy for opening results | anonymously. | | https://metager.de | czechdeveloper wrote: | That seems quite exclusive to Germany. | hermitsings wrote: | https://metager.org/ for English users | abetusk wrote: | Search engines have been coming up lately, so maybe this is as | good a place as any to discuss some back-of-the-envelope | calculations. | | Let's say we wanted to recreate the web index made by Google. How | much cost and engineering would it take? | | Estimating the size of the web from worldwidewebsize.com [0], | it is around 50 billion (50×10^9) pages. The average | web page size looks to be on the order of 1.5 MB (1.5×10^6 bytes) [1]. | The nominal cost of hard disk space is about $0.02/GB [2]. | | So, roughly, that's 75 petabytes of data (~75×10^15 bytes). At a cost | of $0.02/GB, that gives roughly $1.5M just to buy the hardware | to store (a significant fraction of?) the web.
The Hutter Prize | exists [3], so maybe there's some confidence that we only need to | actually store 1/10 of that, so around $150k in costs. | | For perspective, that's 10 multi-millionaire Silicon Valley types | donating about $150k each, 100 "engineer types" at $15k each, or | 1,000 to 10,000 pro-active citizens at $1.5k to $150 each (*just* | for the hard disk space, discounting energy, bandwidth and other | operating costs). | | If we extrapolate falling hard disk prices and take the | price halving to be about every 2.5 years, with a current | (pessimistic?) cost of $0.02/GB, that's about 10-15 years before | a petabyte-scale hard drive is available to the consumer for | $1000. | | From my perspective, I would ask "why hasn't a decentralized | search index been created and/or is in wide use?". My guess is | that figuring out a robust enough system that's cheap enough is | still out of reach. $150 might not seem like a lot, but you have | to convince 10k people to devote energy just to search. | | Put another way, when does the landscape change enough that | decentralized search is a viable option? My guess is that the | determining factor is when people can store a significant | fraction of the web locally for nominal cost. Maybe some great | compression and/or AI sentiment analysis can be done to bootstrap, | and maybe some type of financial incentive can help solve this | issue, but my bet is these will only provide a light push in the | right direction; the needed technology is the underlying cheap | disk space. | | As a side note, worldwidewebsize.com [0] shows the number of | pages indexed by Google holding pretty constant over a five-year | period, with a sharp decline somewhere in 2020. I wonder if this | is an artifact of the estimation method or if Google has changed | something significant in their back end to alter their search | engine and storage.
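The arithmetic above can be checked with a quick script. All figures are the comment's own assumptions (page count, average page size, disk cost, compression ratio, price-halving period), not measured values:

```python
import math

# Back-of-the-envelope check of the figures above.
pages = 50e9        # ~50 billion indexed pages (worldwidewebsize.com)
page_bytes = 1.5e6  # ~1.5 MB average page size (Pingdom)
usd_per_gb = 0.02   # nominal hard disk cost per GB (Backblaze)

total_bytes = pages * page_bytes              # 7.5e16 bytes = 75 petabytes
raw_cost = total_bytes / 1e9 * usd_per_gb     # ~ $1.5M for raw storage
compressed_cost = raw_cost / 10               # ~ $150k at 10x compression

# Extrapolation: years until a 1 PB consumer drive costs ~$1000,
# assuming the price per GB halves every 2.5 years.
pb_cost_today = 1e6 * usd_per_gb                  # $20,000 for 10^6 GB today
years = 2.5 * math.log2(pb_cost_today / 1000)     # ~ 10.8 years

print(f"{total_bytes / 1e15:.0f} PB, ${raw_cost:,.0f} raw, "
      f"${compressed_cost:,.0f} compressed, {years:.1f} years")
```

The extrapolation lands at the low end of the comment's "10-15 years" estimate; the gap narrows further if the starting $/GB figure really is pessimistic.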
| | [0] https://www.worldwidewebsize.com/ | | [1] https://www.pingdom.com/blog/webpages-are-getting-larger-eve.... | | [2] https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/ | | [3] http://prize.hutter1.net/ | kristianpaul wrote: | Doesn't searx look for results at DuckDuckGo and Google for you | anyway? What's the difference from using DDG directly? | sebow wrote: | I don't think it uses DDG directly. But anyway, you can | configure the sources for files, media, wiki, etc. Makes sense, | since the engine is open source, but then again it's not really | a search engine itself but a metasearch one. | BeetleB wrote: | Searx can ping multiple search engines, including those not | supported by DDG. For example, searx has a dedicated file | search, which includes torrents. | boudin wrote: | There are other sources available. It is a metasearch engine, so | it will always rely on other sources, but you can disable the | DuckDuckGo and Google backends. | luciusdomitius wrote: | Isn't DuckDuckGo just an alternative frontend for Bing with an | integrated ad/tracking blocker? Or at least that's what they | claim. | nicce wrote: | It is. They say they add some indexing of their own, but the | results are all the same. ___________________________________________________________________ (page generated 2021-11-12 23:00 UTC)