[HN Gopher] Distributed search engines using BitTorrent and SQLite
       ___________________________________________________________________
        
       Distributed search engines using BitTorrent and SQLite
        
       Author : tosh
       Score  : 109 points
       Date   : 2021-01-20 18:40 UTC (4 hours ago)
        
 (HTM) web link (github.com)
        
       | r32a_ wrote:
       | I'd check out http://dazaar.com/ as well. Same ideas, but built on
       | Hypercore technology and with a payments module built in.
        
       | asymptosis wrote:
       | > This currently only works on Mac OS X.
       | 
       | That is already a sign that this project is not going to go
       | anywhere. Anyone who wants to build a server or make this part of
       | their seedbox is going to use Linux or one of the BSDs.
        
         | bigdict wrote:
         | macOS is POSIX certified. Is Linux?
        
         | bawolff wrote:
         | I think that is fine for a proof of concept.
         | 
         | The bigger issue is that, unless I'm missing something, this is
         | not distributed in the sense most people mean by "distributed"
         | in the context of search engines, so it's not as interesting as
         | everyone is making it out to be.
        
       | lxe wrote:
       | > Site users then start downloading the site torrent, but, rather
       | than downloading pieces of the torrent in "rarest first" order,
       | they download pieces based on the search query they performed.
       | 
       | Interesting. How does the system know where the result of the
       | query might appear in the file?
        
         | frafra wrote:
         | Interesting question. I looked at the source code to understand
         | that.
         | 
         | SQLite knows where to look when you open a SQLite database
         | and run a query, right? It just asks the underlying
         | filesystem for N bytes starting at an offset using a C
         | function, then repeats the same operation on different
         | portions of the file, does its computation, and everybody is
         | happy.
         | 
         | The software relies on sqltorrent, which is a custom VFS for
         | SQLite. That means SQLite's function for reading data from a
         | file in the filesystem is replaced by a custom function.
         | That custom code computes which torrent block(s) should get
         | the highest priority by dividing the offset and the number of
         | bytes SQLite wants to read by the torrent block size. It is
         | just a division.
         | 
         | See:
         | https://github.com/bittorrent/sqltorrent/blob/master/sqltorr...
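A minimal Python sketch of the offset-to-piece arithmetic described above, assuming a fixed torrent piece length. The function names pieces_for_read and pieces_to_prioritize and the 16 KiB piece size are hypothetical illustrations, not sqltorrent's actual code or API. One detail worth noting: a single read can straddle a piece boundary, so the first piece comes from the start offset and the last from the final byte (offset + amount - 1), both by integer division.

```python
# Sketch only: maps SQLite-style (offset, amount) reads onto torrent piece indices.
# Assumptions: every piece has the same length; names are hypothetical, not sqltorrent's API.

def pieces_for_read(offset: int, amount: int, piece_length: int) -> range:
    """Return the indices of every piece overlapping bytes [offset, offset + amount)."""
    if amount <= 0:
        return range(0)
    first = offset // piece_length                 # piece holding the first requested byte
    last = (offset + amount - 1) // piece_length   # piece holding the last requested byte
    return range(first, last + 1)


def pieces_to_prioritize(reads, piece_length: int) -> list:
    """Collect the pieces needed by a sequence of reads, in first-requested order, no duplicates."""
    wanted = []
    for offset, amount in reads:
        for piece in pieces_for_read(offset, amount, piece_length):
            if piece not in wanted:
                wanted.append(piece)
    return wanted


if __name__ == "__main__":
    PIECE = 16 * 1024  # hypothetical piece length
    PAGE = 4096        # SQLite's default page size
    # Read page 1, then a two-page read of pages 4 and 5 (offset 3 * PAGE),
    # which crosses the boundary between piece 0 and piece 1.
    reads = [(0, PAGE), (3 * PAGE, 2 * PAGE)]
    print(pieces_to_prioritize(reads, PIECE))  # → [0, 1]
```

In a real client, the resulting indices would be handed to the BitTorrent library as high-priority pieces, so only the parts of the database file a query actually touches get downloaded first.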
        
       | miki123211 wrote:
       | This is not as distributed as you might believe.
       | 
       | The content itself is distributed, which creates privacy
       | challenges of its own, but control over that content is
       | centralized. If we want automatic updates of the index, we're
       | still relying on a single party to provide them. That single
       | party might respond to DMCA takedowns, or remove or censor content.
        
       | jpereira wrote:
       | For work in a similar vein, Mikeal Rogers has recently been
       | working on IPSQL[0], based on peer-to-peer ordered search
       | indexes[1] built on IPFS, which shares the content-addressed
       | nature of BitTorrent.
       | 
       | [0]: https://github.com/mikeal/IPSQL
       | 
       | [1]: https://0fps.net/2020/12/19/peer-to-peer-ordered-search-inde...
        
         | adkadskhj wrote:
         | With respect to IPFS and Merkle Search Trees, can anyone "in
         | the know" comment on how they're materially different from
         | Probabilistic B-Trees as defined by Noms[1] and Dolt[2]? I've
         | been playing a lot with the Noms variant (Prolly Trees) lately
         | and have often wondered how they differ from IPFS-style Merkle
         | Search Trees, if at all.
         | 
         | [1]: https://github.com/attic-labs/noms/blob/master/doc/intro.md#...
         | 
         | [2]: https://www.dolthub.com/blog/2020-04-01-how-dolt-stores-tabl...
        
       | bawolff wrote:
       | I don't think this meets most people's definition of a
       | distributed search engine.
        
       | tanelpoder wrote:
       | Since having SQLite execute SQL locally on a remote peer machine
       | is essentially computation push-down, one could imagine building
       | a planet-scale distributed analytics engine using this pattern
       | (perhaps with DuckDB and Parquet/Arrow files under the hood,
       | though the exact SQL engine behind the query push-down API could
       | be abstracted away too).
       | 
       | edit: attracted->abstracted
        
       ___________________________________________________________________
       (page generated 2021-01-20 23:00 UTC)