2024-06-22 Lieu for niche search
================================

“Let me google it for you” was a bad reply back in the day; these days, telling somebody “to google something” is insulting because we all know how aggravating it has become: fighting off privacy invasions, sifting through the AI slop and the grifters. Life is short and there’s no time for this. I try to err on the side of caution and always add links to stuff if I can. I’m preparing for the total breakdown of search as we know it.

Some days Internet search feels like we’re at the capitalist optimum where everything is as painful as possible without being prohibitive. We can’t live without it (to find those link farms, advice sites, public archives), but we also can’t find the private websites and blogs. Did Reddit really manage to replace all the blogs, or is it just that we can’t find them any more? The real problem is the loss of trust all around.

@vesto@merveilles.town reminded me that there was a solution in "the realm of curating human-scale websites": Lieu (https://github.com/cblgh/lieu). And indeed, it looks like a great starting place! I'm thinking that perhaps I can use blog planets as sources for topic-specific search engines. I could use the blogs on Planet Emacslife to build an Emacs search engine and the blogs of the RPG Planet to build an RPG search engine. It seems quite doable!

The ethics remain complicated, of course. Would I want search engine developers to scrape my site? Usually I only notice them when their bots misbehave, so I'm pretty averse to the entire situation. The search engine makers also being ad sellers, and ads being a poison for our society, doesn't help. The search engine makers being AI fans, and the energy and water requirements of AI being cited as reasons to keep fossil power plants running and to build nuclear power plants, doesn't help either. AI also resulting in me having to read texts that other people didn't bother to write adds injury to insult.

In any case… perhaps there is a way to have ethical search and not drown in the AI slop: hand-curated websites (taken from the planets), no ads, no income (and therefore necessarily small in scope).

I started experimenting. Using webringSelector = "#sidebar li a:nth-child(2) [href]" in the lieu.toml file seems to get the links of the RPG blogs, for example.

One thing I find disturbing is that lieu uses Colly to scrape the web, and Colly ignores robots.txt by default. I managed to lock the crawler out of my site in less than a second. Yikes! And the sad part is that if I add c.IgnoreRobotsTxt = false to the code, it seems to have no effect. Does anybody know more about how to control this? (A minimal stand-alone Colly sketch of what I would expect to work is at the end of this entry.)

#Search

The following branches are probably temporary:

unix-domain-socket: This can be used by systemd to start the server connected to a Unix domain socket. A front-end web server like nginx or Apache can then talk to the server via that socket, so no port is required.

robots-txt: This is where I try to enable robots.txt handling again – but I'm failing at it.

Then again, perhaps the ethics of it all make it untenable for me – is this really something I need? If it isn't, perhaps I'd be better off doing something else.

2024-06-26. Perhaps it'd be easier to search feeds. People already publish feeds. The feeds already contain the content they are willing to share. The feed is already limited to web pages and doesn't include web applications.
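Here is a rough sketch of what I mean, assuming a plain RSS 2.0 feed and nothing but Go's standard library; a real setup would also have to handle Atom and hand the items to an actual indexer, and the feed URL is just a placeholder:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"net/http"
)

// Minimal subset of RSS 2.0; Atom feeds would need their own struct.
type rss struct {
	Items []struct {
		Title       string `xml:"title"`
		Link        string `xml:"link"`
		Description string `xml:"description"`
	} `xml:"channel>item"`
}

func main() {
	// Placeholder URL; in practice this would be every feed listed by a planet.
	resp, err := http.Get("https://example.org/feed.rss")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var feed rss
	if err := xml.NewDecoder(resp.Body).Decode(&feed); err != nil {
		log.Fatal(err)
	}

	// Every item is a self-contained, author-published page:
	// title, link and text, ready to be handed to an indexer.
	for _, item := range feed.Items {
		fmt.Printf("%s\n  %s (%d characters)\n", item.Title, item.Link, len(item.Description))
	}
}
```

There is nothing to crawl and nothing to guess at: the feed is the author saying “this is what I publish”.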
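Coming back to the robots.txt question above: this is the kind of minimal, stand-alone Colly setup that I would expect to respect robots.txt. It is a sketch for testing, not lieu's actual code, and example.org is just a placeholder domain. If lieu creates more than one collector internally, I suppose each of them would need the flag set before it starts visiting pages.

```go
package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Stand-alone collector, limited to one placeholder domain.
	c := colly.NewCollector(
		colly.AllowedDomains("example.org"),
	)

	// Colly ignores robots.txt by default; setting this field to false
	// before the first Visit makes it fetch and obey robots.txt.
	c.IgnoreRobotsTxt = false

	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		fmt.Println("found link:", e.Request.AbsoluteURL(e.Attr("href")))
	})

	// A URL disallowed by robots.txt makes Visit return
	// colly.ErrRobotsTxtBlocked instead of being fetched.
	if err := c.Visit("https://example.org/"); err != nil {
		log.Println("visit:", err)
	}
}
```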