[HN Gopher] YaCy - your own search engine
___________________________________________________________________
YaCy - your own search engine

Author : modinfo
Score  : 163 points
Date   : 2022-08-25 17:47 UTC (5 hours ago)

(HTM) web link (yacy.net)
(TXT) w3m dump (yacy.net)

| rasulkireev wrote:
| Recently installed YaCy on my Synology via the Docker image they provide. Already saved about 10 GB of content interesting to me. Now I have a personal search engine. Awesome.
| BaseballPhysics wrote:
| So what's your workflow for using it? You mentioned it's saved "content interesting to me". Are you doing directed crawls or...?
| rasulkireev wrote:
| Yeah, if it is just one article or blog post, I crawl at depth 0, and if it is the personal website of someone whose writing I always enjoy, no matter what they write, I do an infinite crawl on that specific domain.
| Tijdreiziger wrote:
| Off-topic, but how do you like Synology? I'm familiar with one of their units for work, but I'm looking into a new NAS for my home, and I'm trying to decide between Synology or building my own and putting Nextcloud on it.
| justsomehnguy wrote:
| Greatly depends on what you are expecting from it.
|
| After $300 per unit, Synology has only two advantages:
|
| 1. Form factor: you can build a comparably small unit from OTC/OTS parts, but usually it costs at least $200 more
|
| 2. Basic functionality (i.e. file sharing, e.g. with SMB) just works, with a nice web GUI to configure it.
|
| If you need something more...
| Tijdreiziger wrote:
| Expectations: file/photo sync, media server, ad blocking (Pi-hole). I saw that Synology has first-party apps for most of this (Synology Drive, Moments, Video).
| rasulkireev wrote:
| Love it, have 0 complaints! I got DS220+
| chrisweekly wrote:
| Happy w my DS-220+ too
| wccrawford wrote:
| Also not OP. I've got a Synology 918+ that I've used for years, and as a file store, I'm quite pleased.
|
| I've tried running apps on it, and the ones that are available are decent, but I pretty quickly got to where I needed to SSH in to make certain things happen, and that felt weird for an appliance like this. I added Docker and ran a bunch of stuff on that, and that was kind of a pain. They don't make it easy to update the images, and the community's solution is to SSH in and install watchtower to do it.
|
| I'm now just using it for network file storage and running all those services on a Linux box instead.
|
| I thought about just putting the drives in the Linux box, but I did some network testing and the NAS was faster, and it provides a lot of storage-related niceties, so I'm keeping it in the mix. For instance, I recently decided to upgrade the drives to faster, larger ones, and it's been pretty easy.
| Tijdreiziger wrote:
| Thanks! So are you running the first-party Synology Drive, Moments, etc. for file/photo syncing, or do you run something like Nextcloud on your Linux box? Or do you not use software like that?
| usefulcat wrote:
| I used a small Synology NAS from 2012-2019, at which point I replaced it with a small Linux box because I wanted ZFS. Inability to support ZFS was really the only reason I replaced it; it was still working fine.
| Tijdreiziger wrote:
| What software are you running, and how much time do you spend on maintenance?
| usefulcat wrote:
| Vanilla Ubuntu 18.04 LTS. Every couple of months or so I update all the packages and reboot. That's really all the maintenance I've ever done on it, apart from initial setup. I ought to set it up so that it can email me if a ZFS scrub ever detects a problem, but I haven't done that yet.
| Tijdreiziger wrote:
| Thanks! That's a valuable data point for my comparison.
|
| By the way, do you run software like Nextcloud, or are you just using it as a storage tank?
| rpdillon wrote:
| Not OP, but I've been using a Synology NAS since 2013 and it's a great product. I bought a router from them as well, which is also superb. I think it's a fabulous investment.
| sciguy77 wrote:
| Has anyone tried LinkAce? I'd love to hear someone's thoughts on YaCy vs LinkAce.
|
| This is great timing. After looking at YaCy for my Synology NAS a few weeks ago, I looked at some alternatives. I like the look of LinkAce, though it seems to be less popular and I haven't found much on how a setup on a Synology NAS works.
|
| I'd love some advice. I have a massive number of bookmarks across dozens of folders. Something like this is exactly what I'm looking for.
| rasulkireev wrote:
| I did that a couple of months ago. Was planning to write something up in the next month or so.
| encryptluks2 wrote:
| They serve very different purposes. While a search engine can in turn archive sites, that isn't its only purpose. LinkAce is designed more for bookmarking and archiving sites, akin to a bookmark manager, not as a search engine.
| AndyMcConachie wrote:
| I have about 100,000 PDFs that I want indexed and searchable. They're on a website and I want people to be able to visit the website and search through the PDFs.
|
| Should I use YaCy or Apache Solr?
|
| All opinions and rants welcome.
| dang wrote:
| Related:
|
| _YaCy: Decentralized Web Search_ - https://news.ycombinator.com/item?id=22246732 - Feb 2020 (41 comments)
|
| _YaCy: a free distributed search engine_ - https://news.ycombinator.com/item?id=12433010 - Sept 2016 (24 comments)
|
| _YaCy - Peer to Peer Search Engine_ - https://news.ycombinator.com/item?id=11956268 - June 2016 (3 comments)
|
| _YaCy: Decentralized Web Search_ - https://news.ycombinator.com/item?id=8746883 - Dec 2014 (29 comments)
|
| _YaCy takes on Google with open source search engine_ - https://news.ycombinator.com/item?id=3288586 - Nov 2011 (17 comments)
| a5huynh wrote:
| Shameless self-plug: I've been building something similar that you can run locally as an app: https://github.com/a5huynh/spyglass
|
| You can define some basic rules and it'll go out and crawl those particular sites. Or use one that someone else has built. It can also sync with your Chrome/Firefox bookmarks. Would love feedback from folks who get a chance to use it!
| bobajeff wrote:
| I would like to use this. However, in the past when I've tried it I didn't like the results. It would be nice to hear about more competition in the P2P information retrieval (search engine) tech space. YaCy seems to be the only one I've consistently heard about over the years.
| pacifika wrote:
| Use this as a personal knowledge base. Indexed my blog. Indexed a bookmarks export. Indexed a knowledge base. Works well. It also convinced me of the value of a power-user UI.
| gavmor wrote:
| That sounds promising! How often do you export your bookmarks, and in what format do you keep your knowledge base?
| tecoholic wrote:
| Self plug - If you want to skip bookmarking and go straight to indexing, I have a Firefox extension for it - https://github.com/tecoholic/yacy-it
| ThinkingGuy wrote:
| I keep everything on my home server: photos, music, home videos, movies, downloaded webpages, ebooks, instruction manuals, etc., all shared out over HTTP.
| YaCy basically gives me a centralized, private search engine for my house. Example searches: "Frigidaire manual" "living room collection:Photos" "London Philharmonic Orchestra collection:Music"
|
| Of course, having things in an organized hierarchical file system, with good metadata, helps.
| pacifika wrote:
| Firefox export as HTML, then point YaCy to it. My knowledge base is a BookStack instance.
| mtlynch wrote:
| I love the idea of this, but I tried to spin up my own instance and was immediately overwhelmed by the million little knobs and settings for it.
|
| It seems like a lot of fun if you understand all the tuning, but I feel like the current state alienates most users who want to use it in simple scenarios.
| 6510 wrote:
| Default settings work well enough, but I agree 90% should be hidden behind an advanced settings check box. (I suspect the organization of features is more obvious in German.) There are also lots of other cool things one can do that are not in the interface but arguably should be.
|
| That said, for what it is, it is pretty epic already. As a proof of concept it's completely convincing.
| bityard wrote:
| There are lots of settings because it's very powerful software. I don't understand the part about being overwhelmed... surely the developers have chosen sane defaults for most things and you can just ignore the ones you don't understand?
| mtlynch wrote:
| That wasn't my experience. YaCy didn't do what I wanted out of the box, so I was just left with 100+ settings that I didn't know how to adjust to get to a desired state.
| bityard wrote:
| It's interesting that this uses a distributed P2P index. That's a very good idea and one of the things that has held me back from even thinking about trying to build my own tech-focused search engine.
|
| One thing I was hoping to see in the FAQ was how they prevent rogue nodes from inserting spam or other kinds of mischief into the public index.
| viraptor wrote:
| They don't really. You have to apply your own filtering.
| alxjsn wrote:
| If you haven't heard of Brave Goggles (https://github.com/brave/goggles-quickstart) I highly recommend checking it out. Just being able to create the search index is a massive task, so being able to apply rules server-side to their "expanded recall set" will give you what most people building search engines want, which is to control the algorithm. We weren't able to do that until now since applying rules client-side doesn't work well on a small search result set.
|
| Related: I created a tool to create Goggles using subreddits as a signal source for domains: https://github.com/forcesunseen/narwhalizer
| upupandup wrote:
| I see Brave. I close tab. I don't trust them or anybody that pushes their offerings, which are just crypto ponzi schemes.
| hunterb123 wrote:
| The crypto stuff is disabled by default, get a new talking point.
| upupandup wrote:
| a deliberate ponzi enabling mechanism shouldn't even be available
| hunterb123 wrote:
| k
| 867-5309 wrote:
| at least they put the safety on before throwing you the gun
| UberFly wrote:
| It's just a different revenue model than the usual ad garbage. You don't have to use it.
| metalliqaz wrote:
| I thought Brave was just a web browser with built-in adblock, but after your comment I decided to look it up on Wikipedia. Holey moley, what a nightmare.
| mimimi31 wrote:
| Kagi (https://kagi.com) has very similar tools with their "Lenses" and customizable prioritization of specific domains.
| rtev wrote:
| Kagi actually did it first, I think. Too bad everyone only knows about it via Brave. Kagi is an awesome search engine.
| scrollaway wrote:
| Seconding, Kagi is great. I hope they succeed...
| Entinel wrote:
| Kagi is a weird beast. I'd like to use it but I also don't understand how searches are private if I have to log in.
| Not understanding that is definitely on me, but I feel like it should be a frequent enough question that they try to make the answer obvious.
| skybrian wrote:
| Seems like you're burying the lede a bit, since your "Basic Usage" involves running some Docker instance for some reason, and you don't need to do that just to try it out?
|
| It looks like Goggles are just text files hosted on GitHub or GitLab, and you can try them out with Brave's search engine without installing anything. Some to try:
|
| https://search.brave.com/goggles/discover
|
| The netsec Goggle is here:
|
| https://search.brave.com/goggles?goggles_id=https://github.c...
| 10g1k wrote:
| Copernic used to be a great way to do this. Register every search engine you like in the local software, apply rules, search all the web search engines at once. Until they went 100% corporate, it was awesome.
___________________________________________________________________
(page generated 2022-08-25 23:00 UTC)