[HN Gopher] YaCy - your own search engine
       ___________________________________________________________________
        
       YaCy - your own search engine
        
       Author : modinfo
       Score  : 163 points
       Date   : 2022-08-25 17:47 UTC (5 hours ago)
        
 (HTM) web link (yacy.net)
 (TXT) w3m dump (yacy.net)
        
       | rasulkireev wrote:
       | Recently installed YaCy on my Synology via the Docker image they
       | provide. It has already saved about 10 GB of content interesting
       | to me. Now I have a personal search engine. Awesome.
        
         | BaseballPhysics wrote:
         | So what's your workflow for using it? You mentioned it's saved
         | "content interesting to me". Are you doing directed crawls
         | or...?
        
           | rasulkireev wrote:
           | Yeah. If it's just a single article or blog post, I crawl at
           | depth 0, and if it's the personal website of someone I always
           | enjoy reading, no matter what they write, I do an infinite
           | crawl on that specific domain.
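A minimal sketch of the two crawl modes described above: depth 0 indexes only the given page, while an unbounded crawl keeps following links but stays on the starting domain. This is not YaCy's crawler; the `crawl` function, the `LINKS` dict standing in for real HTTP fetches, and the example.org URLs are all hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory link graph standing in for real HTTP fetches:
# each URL maps to the URLs found on that page.
LINKS = {
    "https://blog.example.org/": [
        "https://blog.example.org/post-1",
        "https://blog.example.org/post-2",
        "https://other.example.com/",
    ],
    "https://blog.example.org/post-1": ["https://blog.example.org/post-3"],
    "https://blog.example.org/post-2": [],
    "https://blog.example.org/post-3": [],
    "https://other.example.com/": ["https://other.example.com/page"],
}


def crawl(start_url, max_depth=None, same_domain=True):
    """Breadth-first crawl starting at start_url.

    max_depth=0 indexes only start_url itself; max_depth=None means no
    depth limit. With same_domain=True, links that leave the start URL's
    host are not followed.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    indexed = []
    while queue:
        url, depth = queue.popleft()
        indexed.append(url)  # a real crawler would fetch and index the page here
        if max_depth is not None and depth >= max_depth:
            continue  # depth budget used up: don't follow this page's links
        for link in LINKS.get(url, []):
            if same_domain and urlparse(link).netloc != start_host:
                continue  # stay on the starting domain
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return indexed
```

For example, `crawl("https://blog.example.org/post-1", max_depth=0)` returns just that one URL, whereas `crawl("https://blog.example.org/")` reaches all four blog pages and skips the off-domain link. Breadth-first order is used so that a depth cap counts link hops from the start page.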
        
         | Tijdreiziger wrote:
         | Off-topic, but how do you like Synology? I'm familiar with one
         | of their units for work, but I'm looking into a new NAS for my
         | home, and I'm trying to decide between Synology or building my
         | own and putting Nextcloud on it.
        
           | justsomehnguy wrote:
           | Greatly depends on what you are expecting from it.
           |
           | Above $300 per unit, Synology has only two advantages:
           |
           | 1. Form factor: you can build a comparable, small enough unit
           | from OTC/OTS parts, but it usually costs at least $200 more.
           |
           | 2. Basic functionality (i.e. file sharing, e.g. over SMB) just
           | works, with a nice web GUI to configure it.
           | 
           | If you need something more...
        
             | Tijdreiziger wrote:
             | Expectations: file/photo sync, media server, ad blocking
             | (Pi-hole). I saw that Synology has first-party apps for
             | most of this (Synology Drive, Moments, Video).
        
           | rasulkireev wrote:
           | Love it, have 0 complaints! I got a DS220+.
        
             | chrisweekly wrote:
             | Happy with my DS220+ too.
        
           | wccrawford wrote:
           | Also not OP. I've got a Synology 918+ that I've used for
           | years, and as a file store, I'm quite pleased.
           | 
           | I've tried running apps on it, and the ones that are
           | available are decent, but I pretty quickly got to where I
           | needed to SSH in to make certain things happen, and that felt
           | weird for an appliance like this. I added Docker and ran a
           | bunch of stuff on that, and that was kind of a pain. They
           | don't make it easy to update the images, and the community's
           | solution is to SSH in and install Watchtower to do it.
           | 
           | I'm now just using it for network file storage and running
           | all those services on a Linux box instead.
           | 
           | I thought about just putting the drives in the Linux box, but
           | I did some network testing and the NAS was faster, and it
           | provides a lot of storage-related niceties, so I'm keeping it
           | in the mix. For instance, I recently decided to upgrade the
           | drives to faster, larger ones, and it's been pretty easy.
        
             | Tijdreiziger wrote:
             | Thanks! So are you running the first-party Synology Drive,
             | Moments, etc. for file/photo syncing, or do you run
             | something like Nextcloud on your Linux box? Or do you not
             | use software like that?
        
           | usefulcat wrote:
           | I used a small Synology NAS from 2012 to 2019, at which point I
           | replaced it with a small Linux box because I wanted ZFS.
           | Inability to support ZFS was really the only reason I
           | replaced it; it was still working fine.
        
             | Tijdreiziger wrote:
             | What software are you running, and how much time do you
             | spend on maintenance?
        
               | usefulcat wrote:
               | Vanilla Ubuntu 18.04 LTS. Every couple of months or so I
               | update all the packages and reboot. That's really all the
               | maintenance I've ever done on it, apart from initial
               | setup. I ought to set it up so that it can email me if a
               | ZFS scrub ever detects a problem, but I haven't done that
               | yet.
        
               | Tijdreiziger wrote:
               | Thanks! That's a valuable data point for my comparison.
               | 
               | By the way, do you run software like Nextcloud, or are
               | you just using it as a storage tank?
        
           | rpdillon wrote:
           | Not OP, but I've been using a Synology NAS since 2013 and
           | it's a great product. I bought a router from them as well,
           | which is also superb. I think it's a fabulous investment.
        
       | sciguy77 wrote:
       | Has anyone tried LinkAce? I'd love to hear someone's thoughts on
       | YaCy vs LinkAce.
       | 
       | This is great timing. After looking at YaCy for my Synology NAS a
       | few weeks ago, I looked at some alternatives. I like the look of
       | LinkAce, though it seems to be less popular and I haven't found
       | much on how a setup on a Synology NAS works.
       | 
       | I'd love some advice: I have a massive number of bookmarks across
       | dozens of folders. Something like this is exactly what I'm
       | looking for.
        
         | rasulkireev wrote:
         | I did that a couple of months ago. Was planning to write
         | something up in the next month or so.
        
         | encryptluks2 wrote:
         | They serve very different purposes. While a search engine can in
         | turn archive sites, that isn't its only purpose. LinkAce is
         | designed more for bookmarking and archiving sites, akin to a
         | bookmark manager, not as a search engine.
        
       | AndyMcConachie wrote:
       | I have about 100,000 PDFs that I want indexed and searchable.
       | They're on a website and I want people to be able to visit the
       | website and search through the PDFs.
       | 
       | Should I use YaCy or Apache Solr?
       | 
       | All opinions and rants welcome.
        
       | dang wrote:
       | Related:
       | 
       |  _YaCy: Decentralized Web Search_ -
       | https://news.ycombinator.com/item?id=22246732 - Feb 2020 (41
       | comments)
       | 
       |  _YaCy: a free distributed search engine_ -
       | https://news.ycombinator.com/item?id=12433010 - Sept 2016 (24
       | comments)
       | 
       |  _YaCy - Peer to Peer Search Engine_ -
       | https://news.ycombinator.com/item?id=11956268 - June 2016 (3
       | comments)
       | 
       |  _YaCy: Decentralized Web Search_ -
       | https://news.ycombinator.com/item?id=8746883 - Dec 2014 (29
       | comments)
       | 
       |  _YaCy takes on Google with open source search engine_ -
       | https://news.ycombinator.com/item?id=3288586 - Nov 2011 (17
       | comments)
        
       | a5huynh wrote:
       | Shameless self-plug, I've been building something similar that you can
       | run locally as an app: https://github.com/a5huynh/spyglass
       | 
       | You can define some basic rules & it'll go out and crawl those
       | particular sites. Or use one that someone else has built. It can
       | also sync with your Chrome/Firefox bookmarks. I'd love feedback
       | from folks who get a chance to use it!
        
       | bobajeff wrote:
       | I would like to use this. However, in the past when I've tried it
       | I didn't like the results. It would be nice to hear about more
       | competition in the P2P information retrieval (search engine) tech
       | space. YaCy seems to be the only one I've consistently heard
       | about over the years.
        
       | pacifika wrote:
       | I use this as a personal knowledge base. I indexed my blog, a
       | bookmarks export, and a knowledge base. It works well. It also
       | sold me on power-user UIs.
        
         | gavmor wrote:
         | That sounds promising! How often do you export your bookmarks,
         | and in what format do you keep your knowledge base?
        
         | tecoholic wrote:
         | Self plug - If you want to skip bookmarking and go straight to
         | indexing, I have a firefox extension for it -
         | https://github.com/tecoholic/yacy-it
        
         | ThinkingGuy wrote:
         | I keep everything on my home server: photos, music, home
         | videos, movies, downloaded webpages, ebooks, instruction
         | manuals, etc., all shared out over HTTP. YaCy basically gives
         | me a centralized, private search engine for my house. Example
         | searches: "Frigidaire manual" "living room collection:Photos"
         | "London Philharmonic Orchestra collection:Music"
         | 
         | Of course, having things in an organized hierarchical file
         | system, with good metadata, helps.
        
         | pacifika wrote:
         | I export from Firefox as HTML, then point YaCy at it. My
         | knowledge base is a BookStack instance.
        
       | mtlynch wrote:
       | I love the idea of this, but I tried to spin up my own instance
       | and was immediately overwhelmed by the million little knobs and
       | settings for it.
       | 
       | It seems like a lot of fun if you understand all the tuning, but
       | I feel like the current state alienates most users who want to
       | use it in simple scenarios.
        
         | 6510 wrote:
         | The default settings work well enough, but I agree 90% should be
         | hidden behind an advanced-settings checkbox. (I suspect the
         | organization of features is more obvious in German.) There are
         | also lots of other cool things one can do that are not in the
         | interface but arguably should be.
         | 
         | That said, for what it is, it's pretty epic already. As a proof
         | of concept it's completely convincing.
        
         | bityard wrote:
         | There are lots of settings because it's very powerful software.
         | I don't understand the part about being overwhelmed... surely
         | the developers have chosen sane defaults for most things and
         | you can just ignore the ones you don't understand?
        
           | mtlynch wrote:
           | That wasn't my experience. YaCy didn't do what I wanted out
           | of the box, so I was just left with 100+ settings that I
           | didn't know how to adjust to get to a desired state.
        
       | bityard wrote:
       | It's interesting that this uses a distributed P2P index. That's a
       | very good idea and one of the things that has held me back from
       | even thinking about trying to build my own tech-focused search
       | engine.
       | 
       | One thing I was hoping to see in the FAQ was how they prevent
       | rogue nodes from inserting spam or other kinds of mischief into
       | the public index.
        
         | viraptor wrote:
         | They don't really. You have to apply your own filtering.
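Because peers can submit anything to the shared index, filtering ends up on the searcher's side. A minimal sketch of one such filter, a personal domain blocklist applied to results; the result shape (a list of dicts with a `url` key) and the function names are assumptions for illustration, not YaCy's actual result format or API.

```python
from urllib.parse import urlparse


def is_blocked(url, blocked_domains):
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match "." + domain so that blocking spam.example.net also blocks
    # cdn.spam.example.net but not notspam.example.net.
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def filter_results(results, blocked_domains):
    """Drop results pointing at blocked hosts; keep the rest in their original order."""
    blocked = {d.lower().strip(".") for d in blocked_domains}
    return [r for r in results if not is_blocked(r["url"], blocked)]
```

A blocklist only removes domains you already know are bad; it does not detect new spam on its own.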
        
       | alxjsn wrote:
       | If you haven't heard of Brave Goggles
       | (https://github.com/brave/goggles-quickstart) I highly recommend
       | checking it out. Just being able to create the search index is a
       | massive task, so being able to apply rules server-side to their
       | "expanded recall set" will give you what most people building
       | search engines want, which is to control the algorithm. We
       | weren't able to do that until now since applying rules client-
       | side doesn't work well on a small search result set.
       | 
       | Related: I created a tool to create Goggles using subreddits as a
       | signal source for domains:
       | https://github.com/forcesunseen/narwhalizer
        
         | upupandup wrote:
         | I see Brave. I close tab. I don't trust them or anybody that
         | pushes their offerings which are just crypto ponzi schemes.
        
           | hunterb123 wrote:
           | The crypto stuff is disabled by default, get a new talking
           | point.
        
             | upupandup wrote:
             | a deliberate ponzi enabling mechanism shouldn't even be
             | available
        
               | hunterb123 wrote:
               | k
        
               | 867-5309 wrote:
               | at least they put the safety on before throwing you the
               | gun
        
           | UberFly wrote:
           | It's just a different revenue model than the usual ad
           | garbage. You don't have to use it.
        
           | metalliqaz wrote:
           | I thought Brave was just a web browser with built-in adblock,
           | but after your comment I decided to look it up on wikipedia.
           | Holey moley, what a nightmare.
        
         | mimimi31 wrote:
         | Kagi (https://kagi.com) has very similar tools with their
         | "Lenses" and customizable prioritization of specific domains.
        
           | rtev wrote:
             | Kagi actually did it first, I think. Too bad everyone only
             | knows about it via Brave; Kagi is an awesome search engine.
        
             | scrollaway wrote:
             | Seconding, Kagi is great. I hope they succeed...
        
           | Entinel wrote:
           | Kagi is a weird beast. I'd like to use it but I also don't
           | understand how searches are private if I have to login. Not
           | understanding that is definitely on me, but I feel like it
           | should be a frequent enough question that they try to make
           | the answer obvious.
        
         | skybrian wrote:
         | Seems like you're burying the lede a bit, since your "Basic
         | Usage" involves running some Docker instance for some reason
         | and you don't need to do that just to try it out?
         | 
         | It looks like Goggles are just text files hosted on GitHub or
         | GitLab and you can try them out with Brave's search engine
         | without installing anything. Some to try:
         | 
         | https://search.brave.com/goggles/discover
         | 
         | The netsec Goggle is here:
         | 
         | https://search.brave.com/goggles?goggles_id=https://github.c...
        
       | 10g1k wrote:
       | Copernic used to be a great way to do this. Register every search
       | engine you like in the local software, apply rules, search all
       | the web search engines at once. Until they went 100% corporate,
       | it was awesome.
        
       ___________________________________________________________________
       (page generated 2022-08-25 23:00 UTC)