[HN Gopher] Distributed search engines using BitTorrent and SQLite
___________________________________________________________________
Distributed search engines using BitTorrent and SQLite

Author : tosh
Score : 109 points
Date : 2021-01-20 18:40 UTC (4 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| r32a_ wrote:
| I'd check out http://dazaar.com/ as well. Same ideas but built on
| Hypercore technology and with a payments module built in.
| asymptosis wrote:
| > This currently only works on Mac OS X.
|
| That is already a sign that this project is not going to go
| anywhere. Anyone who wants to build a server or make this part of
| their seedbox is going to use Linux or one of the BSDs.
| bigdict wrote:
| macOS is POSIX certified. Is Linux?
| bawolff wrote:
| I think that is fine for a proof of concept.
|
| The bigger reason is, unless I'm missing something, this is not
| distributed in the sense most people use the term "distributed"
| in the context of search engines, so it's not as interesting as
| everyone is making it out to be.
| lxe wrote:
| > Site users then start downloading the site torrent, but, rather
| than downloading pieces of the torrent in "rarest first" order,
| they download pieces based on the search query they performed.
|
| Interesting. How does the system know where the result of the
| query might appear in the file?
| frafra wrote:
| Interesting question. I looked at the source code to understand
| that.
|
| SQLite knows where to look when you open a SQLite database
| and you run a query, right? It just asks the underlying
| filesystem to provide N bytes starting from an offset using a C
| function, then it repeats the same operation on different
| portions of the file, it does its computation and everybody is
| happy.
|
| The software relies on sqltorrent, which is a custom VFS for
| SQLite. That means that the SQLite function that reads data from a
| file stored in the filesystem is replaced by a custom function.
| That custom code computes which torrent block(s) should have
| the highest priority, by dividing the offset and the number of
| bytes that SQLite wants to read by the size of the torrent
| blocks. It is just a division.
|
| See:
| https://github.com/bittorrent/sqltorrent/blob/master/sqltorr...
| miki123211 wrote:
| This is not as distributed as you might believe.
|
| The content itself is distributed, which creates privacy
| challenges of its own, but control over that content is
| centralized. If we want automatic updates of the index, we're
| still relying on a single party to provide them. That single
| party might respond to DMCAs, remove/censor content etc.
| jpereira wrote:
| For work in a similar vein, Mikeal Rogers has recently been
| working on IPSQL[0] based on peer-to-peer ordered search
| indexes[1] built on IPFS, which shares the content-addressed
| nature of BitTorrent.
|
| [0]: https://github.com/mikeal/IPSQL
|
| [1]: https://0fps.net/2020/12/19/peer-to-peer-ordered-search-inde...
| adkadskhj wrote:
| With respect to IPFS and Merkle Search Trees, can anyone "in
| the know" comment on how they're materially different than
| Probabilistic B-Trees as defined by Noms[1] and Dolt[2]? I've
| been playing a lot with the Noms variant (Prolly Trees) lately
| and have often wondered where they differ from IPFS-ish Merkle
| Search Trees. If at all.
|
| [1]: https://github.com/attic-labs/noms/blob/master/doc/intro.md#...
| [2]: https://www.dolthub.com/blog/2020-04-01-how-dolt-stores-tabl...
| bawolff wrote:
| I don't think this meets most people's definition of a
| distributed search engine.
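The "just a division" step that frafra describes above can be sketched in a few lines of Python. This is a minimal illustration, assuming a single-file torrent with a fixed piece size; the function name `pieces_for_read` and the 256 KiB piece size are hypothetical and not taken from sqltorrent, whose real code may differ.

```python
def pieces_for_read(offset: int, length: int, piece_size: int) -> range:
    """Return the indices of the torrent pieces that cover a byte range.

    Assumes a single-file torrent in which every piece is `piece_size`
    bytes long (only the final piece may be shorter).
    """
    if offset < 0 or length <= 0 or piece_size <= 0:
        raise ValueError("offset must be >= 0; length and piece_size must be > 0")
    first = offset // piece_size                # piece holding the first byte
    last = (offset + length - 1) // piece_size  # piece holding the last byte
    return range(first, last + 1)


# Hypothetical read: SQLite asks for 4096 bytes starting at offset 1_000_000
# from a torrent that uses 256 KiB pieces.
PIECE_SIZE = 256 * 1024
wanted = pieces_for_read(1_000_000, 4096, PIECE_SIZE)
print(list(wanted))
```

A client built this way would presumably raise the download priority of the returned pieces ahead of the usual "rarest first" order, which matches the behavior quoted in lxe's comment. A read that straddles a piece boundary simply yields two or more indices.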
| tanelpoder wrote:
| Since SQLite executing SQL locally on a remote peer machine is
| essentially computation push-down, one could think of building a
| planet-scale distributed analytics engine using such a pattern
| (perhaps using DuckDB and Parquet/Arrow files under the hood -
| but which exact SQL engine is behind the query pushdown API can
| be abstracted away too)
|
| edit: attracted->abstracted
___________________________________________________________________
(page generated 2021-01-20 23:00 UTC)