2024-06-22 Lieu for niche search
================================

“Let me google it for you” was a bad reply back in the day; these days, telling somebody “to google something” is insulting because we all know how aggravating it has become: fighting off privacy invasions, sifting through the AI slop and the grifters. Life is short and there’s no time for this. I try to err on the side of caution and always add links to stuff if I can. I’m preparing for the total breakdown of search as we know it.

Some days Internet search feels like we’re at the capitalist optimum where everything is as painful as possible without being prohibitive. We can’t live without it (to find those link farms, advice sites, public archives), but we also can’t find the private websites and blogs. Did Reddit really manage to replace all the blogs, or is it just that we can’t find them any more? The real problem is the loss of trust all around.

@vesto@merveilles.town reminded me that there was a solution in "the realm of curating human-scale websites": Lieu (https://github.com/cblgh/lieu). And indeed, it looks like a great starting place! I'm thinking that perhaps I can use blog planets as sources for topic-specific search engines. I could use the blogs on Planet Emacslife to build an Emacs search engine and the blogs of the RPG Planet to build an RPG search engine. It seems quite doable!

The ethics remain complicated, of course. Would I want search engine developers to scrape my site? Usually I only notice them when their bots misbehave, so I'm pretty averse to the entire situation. The search engine makers also being ad sellers, and ads being a poison for our society, doesn't help. The search engine makers being AI fans, and the energy and water requirements of AI being cited as reasons to keep fossil power plants running and to build nuclear power plants, doesn't help either. AI also resulting in me having to read texts that other people didn't bother to write adds injury to insult.

In any case… perhaps there is a way to have ethical search and not drown in the AI slop: hand-curated websites (taken from the planets), no ads, no income (and therefore necessarily small in scope).

I started experimenting. Using webringSelector = "#sidebar li a:nth-child(2) [href]" in the lieu.toml file seems to get the links of the RPG blogs, for example.

One thing I find disturbing is that lieu uses Colly to scrape the web, and Colly ignores robots.txt by default. I managed to lock the crawler out of my site in less than a second. Yikes! And the sad part is that if I add c.IgnoreRobotsTxt = false to the code, it seems to have no effect. Does anybody know more about how to control this? (A minimal stand-alone Colly sketch of what I would expect to work is at the end of this entry.)

#Search

The following branches are probably temporary:

unix-domain-socket: This can be used by systemd to start the server connected to a Unix domain socket. A front-end web server like nginx or Apache can then talk to the server via that socket, so no port is required.

robots-txt: This is where I try to enable robots.txt handling again – but I'm failing at it.

Then again, perhaps the ethics of it all make it untenable for me – is this really something I need? If it isn't, perhaps I'd be better off doing something else.

2024-06-26. Perhaps it'd be easier to search feeds. People already publish feeds. The feeds already contain the content they are willing to share. The feed is already limited to web pages and doesn't include web applications.
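Here is a rough sketch of what I mean, assuming a plain RSS 2.0 feed and nothing but Go's standard library; a real setup would also have to handle Atom and hand the items to an actual indexer, and the feed URL is just a placeholder:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
	"net/http"
)

// Minimal subset of RSS 2.0; Atom feeds would need their own struct.
type rss struct {
	Items []struct {
		Title       string `xml:"title"`
		Link        string `xml:"link"`
		Description string `xml:"description"`
	} `xml:"channel>item"`
}

func main() {
	// Placeholder URL; in practice this would be every feed listed by a planet.
	resp, err := http.Get("https://example.org/feed.rss")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var feed rss
	if err := xml.NewDecoder(resp.Body).Decode(&feed); err != nil {
		log.Fatal(err)
	}

	// Every item is a self-contained, author-published page:
	// title, link and text, ready to be handed to an indexer.
	for _, item := range feed.Items {
		fmt.Printf("%s\n  %s (%d characters)\n", item.Title, item.Link, len(item.Description))
	}
}
```

There is nothing to crawl and nothing to guess at: the feed is the author saying “this is what I publish”.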
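Coming back to the robots.txt question above: this is the kind of minimal, stand-alone Colly setup that I would expect to respect robots.txt. It is a sketch for testing, not lieu's actual code, and example.org is just a placeholder domain. If lieu creates more than one collector internally, I suppose each of them would need the flag set before it starts visiting pages.

```go
package main

import (
	"fmt"
	"log"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Stand-alone collector, limited to one placeholder domain.
	c := colly.NewCollector(
		colly.AllowedDomains("example.org"),
	)

	// Colly ignores robots.txt by default; setting this field to false
	// before the first Visit makes it fetch and obey robots.txt.
	c.IgnoreRobotsTxt = false

	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		fmt.Println("found link:", e.Request.AbsoluteURL(e.Attr("href")))
	})

	// A URL disallowed by robots.txt makes Visit return
	// colly.ErrRobotsTxtBlocked instead of being fetched.
	if err := c.Visit("https://example.org/"); err != nil {
		log.Println("visit:", err)
	}
}
```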