[HN Gopher] Removing my site from Google search
___________________________________________________________________
Removing my site from Google search

Author : todsacerdoti
Score  : 138 points
Date   : 2021-10-03 17:57 UTC (5 hours ago)

(HTM) web link (www.btao.org)
(TXT) w3m dump (www.btao.org)

| bouke wrote:
| Why not use robots.txt instead of littering your html with
| googlebot instructions?
| intricatedetail wrote:
| I have disallowed all robots in my robots.txt and my project
| still shows up in the search.
| TheChaplain wrote:
| Yes, pretty sure this is the way to go.
|
| You can even tell which bots are allowed to index and which
| are not.
| tao_oat wrote:
| Hi, author here. Google stopped supporting robots.txt [edit: as
| a way to fully remove your site] a few years ago, so these meta
| tags are now the recommended way of keeping their crawler at
| bay:
| https://developers.google.com/search/blog/2019/07/a-note-on-...
| new_guy wrote:
| Did you actually read your link? That's not at all what it
| says.
| tao_oat wrote:
| To be clear, Google stopped supporting robots.txt _noindex_ a
| few years ago.
|
| Combined with the fact that Google might list your site
| [based only on third-party links][1], robots.txt isn't an
| effective way to remove your site from Google's results.
|
| Sorry, could have been clearer.
|
| [1]: https://developers.google.com/search/docs/advanced/robots/in...
| dd82 wrote:
| >noindex in robots meta tags: Supported both in the HTTP
| response headers and in HTML, the noindex directive is the
| most effective way to remove URLs from the index when
| crawling is allowed.
|
| Seems clear enough to me
| jen20 wrote:
| Quote from the linked article:
|
| "For those of you who relied on the noindex indexing
| directive in the robots.txt file, which controls crawling,
| there are a number of alternative options:"
|
| The first option is the meta tag. It does mention an
| alternative directive for robots.txt, however.
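The comments above distinguish between robots.txt, which controls crawling, and the noindex directive, which controls indexing and can be sent either as a robots meta tag or as an X-Robots-Tag response header. A minimal sketch of how one might check a page of their own for that directive follows. It assumes Python with only the standard library; the names `has_noindex` and `_RobotsMetaParser` are made up for illustration and are not part of any Google tooling. It deliberately ignores header values scoped to a single crawler (for example `googlebot: noindex`), so treat it as a rough check rather than a full parser.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalise to "".
        attr_map = {key.lower(): (value or "") for key, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def has_noindex(headers, html):
    """Return True if the headers or the HTML carry a noindex directive.

    `headers` is a plain dict of response headers; `html` is the page body.
    Assumption: only unscoped, comma-separated directives are recognised.
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            if "noindex" in [d.strip().lower() for d in value.split(",")]:
                return True
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

For example, `has_noindex({"X-Robots-Tag": "noindex"}, "")` and `has_noindex({}, '<meta name="robots" content="noindex, nofollow">')` should both report the directive. Note that a crawler can only see either signal if it is allowed to fetch the page, which is the point several commenters make about not blocking it in robots.txt at the same time.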
| intricatedetail wrote:
| A meta tag implies the robot is going to fetch the page,
| at the very least using up your bandwidth. It should be
| illegal.
| ghassanmas wrote:
| What about blocking Googlebot by its IPs, combined with the
| user-agent? Wouldn't that stop the crawlers?
|
| Google crawler IPs:
| https://www.lifewire.com/what-is-the-ip-address-of-google-81...
| gnabgib wrote:
| This page has a little more detail:
| https://developers.google.com/search/docs/advanced/crawling/...
|
| "If other pages point to your page with descriptive text,
| Google could still index the URL without visiting the page.
| If you want to block your page from search results, use
| another method such as password protection or noindex."
| Animats wrote:
| Did you think that mighty Google would pay attention to your
| puny "noindex" tag? Ha!
| [deleted]
| thih9 wrote:
| According to google's own docs, this should work.
|
| > You can prevent a page from appearing in Google Search by
| including a noindex meta tag in the page's HTML code, or by
| returning a noindex header in the HTTP response.
|
| Source:
| https://developers.google.com/search/docs/advanced/crawling/...
| swills wrote:
| Until they change the rules again...
| bryanrasmussen wrote:
| I mean technically that says that your site won't appear
| in search results, not that your site won't be used to
| profile people, determine other site ratings based on
| your site's content etc.
|
| They won't show your site's content, but that doesn't
| mean they won't use your site's content.
| thih9 wrote:
| I thought that (i.e. removing the site from google
| search) was the goal.
|
| I'd review the other usage on a case by case basis; e.g.
| determining ratings of other sites seems fair use to me.
| I'd guess you're allowing others to use your site's
| content when you're making your site public (TINLA).
| bryanrasmussen wrote:
| maybe, but I guess I would be cantankerous enough to see
| the goal as preventing google from profiting off your
| site.
| mro_name wrote:
| yes, I do think that
| walshemj wrote:
| robots.txt stops crawling - you can get indexed via other
| mechanisms.
|
| You want noindex robots tags on all your pages and let google
| see those.
|
| You can use GSC (Google Search Console) to remove a site / page
| from the index
| vadfa wrote:
| Or even better, iptables rules :P
| TedDoesntTalk wrote:
| Doesn't that mean you have to know every IP address used by
| Googlebot now and in the future?
| AshamedCaptain wrote:
| Not a very hard problem; after all, many websites allow
| full access to Googlebot IP ranges yet show a paywall to
| everyone else (including competing search engines).
|
| I also happen to ban Google ranges on multiple less-public
| sites, especially since they completely ignore robots.txt and
| crawl-delay.
| judge2020 wrote:
| The way to check googlebot (in a way that will be resistant
| to expansion of Googlebot's IP ranges in the future) is to
| perform a hostname lookup, with a DNS lookup as well to verify
| that the rDNS isn't a lie:
| https://developers.google.com/search/docs/advanced/crawling/...
| lucb1e wrote:
| Indeed, this was one of the things I considered (note I'm
| not OP), but then I didn't really want to rely on DNS.
| https://duckduckgo.com/?q=it's+always+DNS
| intricatedetail wrote:
| It's crazy that you have to pollute your project with the Google
| brand just so that they don't steal your bandwidth and content.
| Why is this not illegal?
| lucb1e wrote:
| Oh hey I thought I was the only one. lucb1e.com and another site
| are also not indexed, though I blocked it based on the user agent
| string. That way it doesn't get page data or non-HTML files from
| my server.
| I introduced this when they were pulling this AMP
| thing: https://lucb1e.com/?p=post&id=130 It personally doesn't
| impact me, but it impacts other people on the internet and I
| figured it was the only thing I can do to try and diversify this
| market (since I myself already switched to another search
| engine).
|
| There are zero other restrictions on my site. Use any search
| engine other than google. Or don't, up to you.
| forgotmypw17 wrote:
| I use simple HTTP auth with an easy username and password on most
| of my sites. It is rarely a problem for anyone I invite, except
| perhaps Instagram's browser, but no crawler traffic.
| markoutso wrote:
| That's a shitty solution. The whole point is to keep the
| website public.
| rezonant wrote:
| A more interesting issue is the opposite -- many large sites have
| robots.txt rules that Disallow all crawlers _except_ Google. A
| new search engine either 100% respects robots.txt, with the result
| that some major properties are completely unavailable in its
| index; ignores robots.txt in these special cases where the
| robots.txt configuration is unreasonable; or crawls anything that
| allows Google to crawl it. None of these options are great.
| greyivy wrote:
| Any idea why this would be (other than incompetence)?
| robbrown451 wrote:
| It seems like the amount it will hurt Google is directly
| proportional to the amount it will hurt the owner of the site
| (assuming they want people to read their message).
|
| I'm sure someone at Google is pretty happy that they don't have
| to show this page in their search results. Nobody can accuse them
| of bias against anti-Google pages -- the site owner did it to
| themselves.
|
| Seems like as perfect an example of "cutting off nose to spite
| face" as I can imagine.
| (ok, refusing the vax and dying of COVID
| to get back at the left might be a better example, but this one
| is close)
| keithnz wrote:
| So I did a search for the title of his blog post to see what
| comes up; this HN page is the top hit.
| mattlondon wrote:
| Alternatively just use EFF's Privacy Badger and duckduckgo to
| stop feeding the beast?
|
| Those are active steps you can take - I am not convinced a few
| meta tags will stop Google spidering your site (even if it is
| invisible in results), and it is of questionable value if you are
| still using Google search and not blocking their scripts.
| pydry wrote:
| It's not either/or. You can do both.
| ObamaBinSpying wrote:
| Or just block all of the IP addresses listed at these URLs:
|
| https://whois.arin.net/rest/org/GOGL/nets
|
| https://whois.arin.net/rest/org/GOOGL-1/nets
|
| https://whois.arin.net/rest/org/GOOGL-2/nets
|
| https://whois.arin.net/rest/org/GOOGL-24/nets
|
| https://whois.arin.net/rest/org/GOOGL-4/nets
|
| https://whois.arin.net/rest/org/GOOGL-46/nets
|
| https://whois.arin.net/rest/org/GOOGL-5/nets
|
| https://whois.arin.net/rest/org/GOOGL-9/nets
|
| https://whois.arin.net/rest/org/GL-654/nets
|
| https://whois.arin.net/rest/org/GL-895/nets
| FpUser wrote:
| Removing a website from Google search is the least of worries.
| Every meaningful aspect of your life is now being monitored by
| corporations and governments. It is too fucking late. That fucked
| up social scoring system being used in China to oppress people is
| coming here. Only instead of the government doing it directly, it
| will be mostly performed by corporations to keep the appearance of
| a "free" society. Corps will collect your data, assign you a rank
| and act accordingly.
| cubano wrote:
| So if "surveillance capitalism" is apparently the new scare,
| would "surveillance socialism" be better?
|
| Or are we supposed to imagine that under socialism, there would
| be no need for Big Tech surveillance? This I most certainly
| disagree with.
|
| I'm just noticing a trend lately where the word "capitalism" is
| being attacked on many fronts, and I personally find that
| troublesome.
|
| Like Churchill said..."Capitalism is absolutely the worst system
| there is, besides everything else of course"
| pyrale wrote:
| I don't really understand why you would assume that the only
| alternative to surveillance capitalism is "surveillance
| socialism".
|
| This sophism seems to be built on two errors:
|
| * that there is nothing outside of pure capitalism and pure
| socialism
|
| * that adjoining "surveillance" to capitalism means we're
| talking about an inevitable aspect of our society that is
| combined with capitalism, rather than a specific subset of the
| way business is done in this age.
|
| To be honest, this lack of ability to conceive alternative
| social systems is concerning. The deformed Churchill quote
| comes as a cherry on top.
| cubano wrote:
| Of course there are an infinite number of alternatives...I
| just quickly picked something that sounded decent to try
| to make my point about my observation lately that Capitalism
| was getting knocked around everywhere I looked.
|
| In the last Econtalk podcast that I listened to last night,
| they discussed the loneliness "epidemic" and the author ended
| up blaming Thatcher-based Capitalism as perhaps the main
| reason why people are so lonely today!
|
| I thought that was quite the stretch but imagine my chagrin
| when here was another spurious back-handed attack on it.
| pyrale wrote:
| > Capitalism [is] getting knocked around everywhere I
| looked.
|
| That's what happened with every social system in the past,
| and, while failures certainly have happened, we've always
| found ways for that criticism to result in improvements to
| our societies.
|
| It would be very surprising for the current shape of our
| society or, more generally, capitalism to be an exception
| to the rule, unless you subscribe to the "end of history"
| thesis.
| cweagans wrote:
| I can't tell if you're being willfully obtuse or not.
| Harvesting people's data for the express purpose of
| manipulating them into thinking/buying things that they
| otherwise wouldn't is wrong in every sense of the word.
|
| Capitalism has its problems just like everything else.
| Pretending it doesn't is just as disingenuous as pretending
| that socialism would fix everything. If you're concerned about
| people attacking capitalism, help fix the problems. Simple as
| that.
| dreamcompiler wrote:
| Churchill was referring to democracy, not capitalism.
| cubano wrote:
| I'm interested in why this comment is getting so knocked down?
|
| I asked what I think is a valid question and would like to hear
| honest reactions from people about my observation.
|
| I feel I have every right to be a cheerleader for Capitalism as
| my father escaped communist Cuba in 1959 as Castro was coming
| to power and used the US's system and tons of hard work to
| create an extremely comfortable life for himself, while friends
| and family members who stayed there lived rather wretched
| lives.
|
| He never forgot how lucky he was to be able to get out of there
| just in time and told me time and time again that the US, while
| having flaws, was by far the best place in the world to live,
| so my original comment comes from this background.
|
| I don't give a flying fuck if the comment gets modded down, but
| I would like to know just what in it is so offensive to those
| modding it down so I can learn something.
| michaelt wrote:
| Presumably the "capitalism" in "surveillance capitalism" is to
| make it clear they're talking about _private companies_ - as
| distinct from the traditional concerns about _government_
| surveillance.
| DantesKite wrote:
| I like Google, but wouldn't mind a better search engine, even at
| the cost of my privacy, so long as I had a choice for what could
| be shared.
| ssss11 wrote:
| Can you explain what you mean by this?
| I've read it a few times
| and don't understand
| DantesKite wrote:
| Sure. I probably could've been much clearer.
|
| I don't think Google taking your information and sharing it
| with advertisers is a great sin. Somewhat annoying but
| nothing particularly harmful.
|
| I do think the search results are easily manipulated and it
| can be frustrating trying to find relevant information. Like
| most people I end up defaulting to Reddit for search queries
| just to find something that isn't a blog by someone shilling
| their product.
|
| But I understand the invasion of privacy would irritate some
| people and maybe in the long term it would be a net negative.
| So if there was a search engine that explicitly asked for
| certain information and you had the option to share, that
| would probably go a long ways.
| [deleted]
| entire-name wrote:
| I think OP means they don't mind sharing their information
| with the search engine (be it Google, another engine that
| provides better results, or even a better Google in terms of
| results), _as long as_ OP has control over exactly what is
| being shared.
|
| As an aside, I do see the trend for some companies to provide
| this control nowadays. Even Google is doing it (e.g. you can
| auto delete your information, or turn it off completely):
| https://myaccount.google.com/data-and-privacy
|
| Of course, whether or not you believe Google is doing what
| you have configured in the backend is another question... and
| there is nothing anyone can do to actually make you believe
| it short of giving you complete access to the entire Google
| backend. Or is there a way to verify without exposing? Maybe
| an interesting research topic...
| ignoramous wrote:
| You're in luck, since there's active development in this space:
| Neeva.com and kagi.com are two of the many alt search engines.
| avipars wrote:
| seems like it would hurt your traffic
| amelius wrote:
| Just join a web ring like in the old days.
|
| https://en.wikipedia.org/wiki/Webring
| [deleted]
| freediver wrote:
| Only if there is relevant traffic from Google to begin with,
| which is highly unlikely for a site like this. A high
| percentage of results in almost every Google search comes from
| the closed circle of the same top 10,000 sites or so.
|
| This is the beauty of a protest like this, because this site
| does have valuable content, and if enough sites like this
| joined the protest it could actually hurt the relevancy of the
| Google index: by the time Google figures out the content is
| valuable, it would not be allowed to index it anymore.
| jefftk wrote:
| I don't think that's so unlikely: on my blog ~30% of visitors
| come from searches.
| freediver wrote:
| Sorry, my statement was both generalized and specific at the
| same time, and that did not turn out well. How many visits
| does your blog have daily? And what would it take for you to
| remove your site from the Google index?
| jefftk wrote:
| _> How many visits does your blog have daily?_
|
| ~200k sessions in the past year, so ~550/d. Breakdown:
|
| * ~30% search
|
| * ~30% no referer
|
| * ~25% HN
|
| * ~7% Twitter/FB/etc
|
| * ~8% other
|
| _> what would it take you to remove your site from
| Google index?_
|
| I don't see why I would want to exclude my site from any
| index? Being in indexes helps people find my writing,
| which I like!
| indigochill wrote:
| > I don't see why I would want to exclude my site from
| any index? Being in indexes helps people find my writing,
| which I like!
|
| It's essentially a form of boycott. If one believes
| Google is a problematic entity (too many fingers in too
| many aspects of our lives), it's a way to sever
| connections with them at some personal cost.
|
| At least, if you care about search traffic - one might
| argue the assumption that Google-like search is the
| default way to navigate the web is one worth
| reconsidering and encouraging alternatives to anyway.
| onion2k wrote:
| That depends on where your traffic originates from. Back when I
| tracked people on my site, I found I got very little from
| search results. Most of it (> 95%) came from links from social
| media and Github. On a blog that's heavily about privacy I
| wouldn't expect much to come from Google.
|
| Also, so what if the numbers go down? If your reason for
| writing a blog is to see a number on a screen then what does
| that actually give you?
| jefftk wrote:
| _> so what if the numbers go down? If your reason for
| writing a blog is to see a number on a screen then what does
| that actually give you?_
|
| Traffic numbers are not an end in themselves, but are a
| decent proxy for "are other people getting value out of what
| I write?"
| lucb1e wrote:
| If that's your goal. Personally I host content that people can
| use or not. I'll link friends if I want them to see it.
| Visitors don't cost me anything, and it doesn't really bring me
| anything (other than ego?) to have visitors either. Hence I saw
| fit to also block google (two years ago already apparently, I
| thought it was much more recent) and it didn't negatively
| impact my site in any way.
| floatingatoll wrote:
| This assumes that the increase in traffic due to Google is
| beneficial, which it rarely is for personal diary sites.
| JimWestergren wrote:
| Imagine wikipedia and the top newspapers doing this ... users
| will start to use another search engine.
| 0x073 wrote:
| Normal users will start to use another newspaper
| antattack wrote:
| Wikipedia should do this as Apple and Google are showing
| Wikipedia results as their own, robbing, IMO, Wikipedia of
| importance. Wikipedia is large enough that it should have its
| own search engine, likely with more relevant results.
| nicce wrote:
| Identical situation when Facebook was asked to not show
| previews of the news articles, because of the ePrivacy
| directive. Could this go for the same legislation?
|
| https://www.mysk.blog/2021/02/08/fb-link-previews/
| joshuaissac wrote:
| It does not help Wikipedia to do that. The content on
| Wikipedia is licensed so that Apple and Google can show the
| content from Wikipedia (and this is by design, not a
| loophole). If the users can get access to the encyclopedic
| content more conveniently, that is still in line with the
| project's goals, even if that content reaches the user
| indirectly via a third party.
| breakingcups wrote:
| No, people will start to read other news sites.
| larrymcp wrote:
| The first sentence grabbed my attention, and I was looking
| forward to learning about the "threat that surveillance
| capitalism poses to democracy and human autonomy". But then the
| article fell flat: he gave no examples of that threat, and
| neither did the linked article in The Guardian.
|
| Are there specific examples of this type of harm? The only
| complaints that he made were that Google makes a lot of money
| (which I have no problem with), and that Google's conduct feels
| "creepy" to him (which is merely an emotional reaction).
|
| He did hint at Google "modifying your off-screen behavior", and I
| was eager to learn about that as well... but then he left that
| unexplored too, and gave no follow-up or examples of that
| intriguing scenario.
| rkarmani wrote:
| He referenced this page: https://www.socialcooling.com/
___________________________________________________________________
(page generated 2021-10-03 23:00 UTC)