[HN Gopher] Hosting SQLite databases on GitHub Pages or any stat...
       ___________________________________________________________________
        
       Hosting SQLite databases on GitHub Pages or any static file hoster
        
       Author : phiresky
       Score  : 608 points
       Date   : 2021-05-02 16:43 UTC (6 hours ago)
        
 (HTM) web link (phiresky.github.io)
 (TXT) w3m dump (phiresky.github.io)
        
       | CyberRabbi wrote:
       | Over high latency links this would be virtually unusable. Why not
       | just download the entire database into memory over XHR on page
       | load? SQLite databases of pure data usually aren't over 10MB in
       | size.
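        | 
        | A rough sketch of that approach with sql.js in the browser
        | (the /data.sqlite3 URL and table name are placeholders):
        | 
        |   import initSqlJs from "sql.js";
        |   // download the whole file, then open it entirely in memory
        |   const [SQL, buf] = await Promise.all([
        |     initSqlJs(), // may need { locateFile } to find the wasm
        |     fetch("/data.sqlite3").then((r) => r.arrayBuffer()),
        |   ]);
        |   const db = new SQL.Database(new Uint8Array(buf));
        |   const rows = db.exec("select count(*) from some_table");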
        
         | petters wrote:
         | > SQLite databases of pure data usually aren't over 10MB in
         | size
         | 
         | Why do you think this?
        
           | [deleted]
        
         | mcculley wrote:
         | That is hilariously wrong for a lot of use cases. I will find
         | this very handy for some SQLite databases I have that are
         | several GBs in size. I am looking right now at using this
         | contribution.
        
           | CyberRabbi wrote:
            | It's not hilariously wrong, or wrong at all, that this
            | would be virtually unusable over high-latency links.
           | 
           | It's certainly possible that people are using SQLite
           | databases with sizes on the order of gigabytes but in my
           | experience those are the exception not the rule.
        
             | tehbeard wrote:
             | Over high latency links most anything interesting is
             | virtually unusable, so stop using it as a high horse to
             | stand upon.
        
         | simonw wrote:
         | The demo here uses a 670MB database file.
        
         | formerly_proven wrote:
          | This particular demo page actually makes queries against an
          | almost 700 MB (it would fill a CD!) SQLite database. Because
          | the amount of data read is almost negligible (a few hundred
          | kB), performance is limited by latency (as you say). However,
          | high-latency links also tend to be slower, so downloading the
          | entire database a priori would almost always be much slower.
         | 
         | For example, on a 1 megabit/s link with 300 ms RTT, one example
         | would take about 2 seconds for the data transfer itself while
         | spending another 3 seconds or so on waiting. Downloading the
         | entire file would take around _an hour and a half_.
         | 
         | For your 10 MB database, transferring it as a whole would take
         | 80 seconds. Assuming this solution instead needs to read e.g.
         | 250 kB (taking 2 seconds to transfer), it could still bounce
         | around 250 times to the database before those 10 MB are fully
         | downloaded. (This would be a really odd query, since it would
         | only read on average two pages per read request)
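          | 
          | As a back-of-the-envelope check of those numbers (plain
          | arithmetic for a 1 megabit/s link, nothing specific to this
          | project):
          | 
          |   const bitsPerSec = 1_000_000; // 1 megabit/s
          |   const secs = (bytes: number) => (bytes * 8) / bitsPerSec;
          |   secs(250_000);            // ~2 s for a few hundred kB
          |   secs(10_000_000);         // 80 s for a 10 MB database
          |   secs(700_000_000) / 3600; // ~1.6 h for the full 700 MB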
        
           | CyberRabbi wrote:
           | Right but that is an artificially created demo by the author
           | to justify the solution being presented (no offense). The
           | question is how common are ~GB large SQLite databases in the
           | real world relative to databases that are ~MB large?
           | 
           | In my experience SQLite databases of millions of rows of raw
           | tabular data tend to compress very well into dozens of
           | megabytes. Indeed SQLite is often touted as a file format for
           | applications.
        
             | dkarras wrote:
             | Even if you compress it, you need it in client memory which
             | can also be a dealbreaker. Some people might need it for
             | some projects, I know I did.
        
               | CyberRabbi wrote:
               | > you need it in client memory which can also be a
               | dealbreaker
               | 
                | Most workstations have GBs of available memory. If not,
                | you can dump it into IndexedDB as a raw data store.
               | 
               | I never disputed that it would be useful for some use
               | cases. I only said it would be unusable with high latency
               | links. If you have a low latency link and aren't running
               | very complex queries with lots of random seeks, then this
               | should work fine for you.
        
             | rakoo wrote:
             | I'm running magnetico
             | (https://github.com/boramalper/magnetico) on my VPS. I
             | currently have an index of 1.6M magnet links stored in a
             | 5GB database.
             | 
              | SQLite is most interesting not when the database is small,
              | but when there are very few writes and all you do is read.
              | You can also look at https://datasette.io/ and see how
              | SQLite is perfect for representing a lot of datasets and
              | querying them.
        
             | brandmeyer wrote:
             | My team has a few TB of data in SQLite files that are
             | themselves dozens of GB each.
             | 
             | We're using them as a replacement for leveldb's sstables,
             | but with the structure of full SQL. It is highly effective.
        
               | CyberRabbi wrote:
               | Do you think your team's usage of SQLite is
               | representative of the average SQLite user?
        
               | detaro wrote:
               | Where has it been suggested that this is the best
               | solution for "the average SQLite user", instead of a tool
               | you can use if it fits your requirements? To take your
               | 10MB number, the article _starts_ by mentioning you can
                | probably just download the entire thing if you aren't
               | above that exact same number.
        
         | detaro wrote:
         | And the article _starts_ by mentioning that you can download
          | the entire file if it's not too large. And then goes on to
         | present a solution for larger files. What more answer to "Why
         | not just download the entire database" do you expect?
        
       | bob1029 wrote:
       | Modeling the DOM in SQL... Further evidence that anything we can
       | imagine has some stable representation in third normal form. Is
       | it fast? Maybe not. But is it correct? Provably.
        
       | crazygringo wrote:
       | This is hilariously clever.
       | 
       | Using the "Range" HTTP header to _read_ chunks of the database
       | file absolutely works!
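        | 
        | A minimal example of such a ranged read (hypothetical URL; the
        | host has to answer with 206 Partial Content for it to work):
        | 
        |   const res = await fetch("https://example.com/db.sqlite3", {
        |     // first 4 KiB, i.e. the SQLite header page
        |     headers: { Range: "bytes=0-4095" },
        |   });
        |   if (res.status !== 206)
        |     throw new Error("server ignored the Range header");
        |   const page = new Uint8Array(await res.arrayBuffer());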
       | 
        | But to be clear, there's no _write_ equivalent, is there? You
        | can't use "Range" with a PUT request.
        
         | TimTheTinker wrote:
          | The article was clear - there's no way to write, since static
          | file hosting doesn't support dynamic writes to begin with.
         | 
         | However, I imagine a service to support your scenario could be
         | written in a standard back-end server language like Go or JS.
         | The challenges involved would be much greater, however -- how
         | to handle concurrency in particular. I suspect one would do
         | better to just run PostgreSQL behind a web API.
        
           | csomar wrote:
            | That's basically re-inventing the database, but on the
            | client side. We have come a long way, and we are getting
            | closer to having the server side act as just a data store.
        
       | ianlevesque wrote:
       | This is easily the most clever web programming hack I've seen
       | this year. Bravo. I had seen this used for video or audio of
       | course but it never occurred to me you could use it for
       | databases. There are probably a ton of other formats this is good
       | for too.
        
       | sroussey wrote:
        | The innovation here is getting sql.js to use HTTP range
        | requests for file access rather than keeping the whole
        | database in memory.
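        | 
        | A rough usage sketch based on the sql.js-httpvfs README (exact
        | option names may differ between versions; the worker and wasm
        | URLs are placeholders):
        | 
        |   import { createDbWorker } from "sql.js-httpvfs";
        |   const worker = await createDbWorker(
        |     [{ from: "inline",
        |        config: { serverMode: "full", url: "/wdi.sqlite3",
        |                  requestChunkSize: 1024 } }],
        |     "/sqlite.worker.js", "/sql-wasm.wasm");
        |   const rows = await worker.db.query(
        |     "select long_name from wdi_country limit 10");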
       | 
       | I wonder when people using next.js will start using this for
       | faster builds for larger static sites?
        
         | jhgb wrote:
         | Microsoft Access Cloud Edition, basically?
        
         | tyingq wrote:
         | Would also be great to add (efficient) search to a static blog.
        
           | edoceo wrote:
            | Yeah, SQLite FTS5 has been pretty amazing for quick search
            | solutions (but I only use English)
        
         | misterdata wrote:
         | See also https://github.com/bittorrent/sqltorrent, same trick
         | but using BitTorrent
        
           | phiresky wrote:
            | Yeah, that was one of the inspirations for this. That one
            | does not work in the browser though; it would be a good
            | project to do the same thing with SQLite in WASM,
            | integrated with WebTorrent instead of a native torrent
            | program.
           | 
           | I actually did also implement a similar thing fetching data
           | on demand from WebTorrent (and in turn helping to host the
           | data yourself by being on the website):
            | https://phiresky.github.io/tv-show-ratings/ That uses
            | protobufs split into a hashmap instead of SQLite, though.
        
             | westurner wrote:
             | This looks pretty efficient. Some chains can be interacted
             | with without e.g. web3.js? LevelDB indexes aren't SQLite.
             | 
             | Datasette is one application for views of _read-only_
             | SQLite dbs with out-of-band replication.
             | https://github.com/simonw/datasette
             | 
              | There are a bunch of *-to-sqlite utilities in the
              | corresponding Dogsheep project.
             | 
             | Arrow JS for 'paged' browser client access to DuckDB might
             | be possible and faster but without full SQLite SQL
             | compatibility and the SQLite test suite.
             | https://arrow.apache.org/docs/js/
             | 
             | https://duckdb.org/ :
             | 
             | > _Direct Parquet & CSV querying_
             | 
             | In-browser notebooks like Pyodide and Jyve have local
              | filesystem access with the new "File System Access API", but
             | downloading/copying all data to the browser for every run
             | of a browser-hosted notebook may not be necessary.
             | https://web.dev/file-system-access/
        
         | mvanaltvorst wrote:
         | I'm curious, in what manner could this method speed up Next.js
         | builds? That's all done locally, which negates the effect of
         | HTTP range requests, right?
        
           | city41 wrote:
           | I'm guessing they mean rather than build a static Next site
           | that generates 10k+ pages (or whatever large means in the
           | given context), it instead creates one page that just queries
           | the data from the client.
           | 
           | I have one Next static site that has about 20k pages and
           | takes about 20 minutes to build and deploy. I think that's an
           | acceptable build time. But I do know of other people around
           | the net who have mentioned having sites with 20k-ish pages
           | taking an hour+ to build. For them I could see the desire to
           | try this sqlite trick.
        
             | nindalf wrote:
             | You should write a post about this if you implement it. My
             | humble suggestion for blog post title - 'Speed up your
             | Next.js builds with this author's one weird trick! Vercel
             | hates him!'
        
       | modeless wrote:
       | Super cool. I wonder if an approach like this could ever work for
       | a writable database. I guess the issue is that you couldn't have
       | fine grained access control without a whole lot of server side
       | validation, at which point you might as well just run a regular
       | database server.
        
       | arafalov wrote:
        | Amazing, especially - for me - that the FTS5 full-text search
        | just works. Longer term, I wonder if it would be possible to
        | split the DB code into read and write parts and cross-compile
        | only the read part for delivery to the browser.
        
         | rzzzt wrote:
         | If you are interested in full-text search on the client, Lunr
         | is also an option: https://lunrjs.com/docs/index.html
        
           | tehbeard wrote:
           | Lunr needs the full index on client though, right?
           | 
            | Being able to use FTS5 without the penalty of having to pull
            | down the whole index makes it work much better at larger
            | scales, even with the penalty of additional network requests.
        
             | formerly_proven wrote:
             | This could be genuinely interesting for tools like e.g.
             | sphinx-doc, which currently has a client-side search that
             | does indeed ship the entire index to the client.
        
               | tehbeard wrote:
                | There's probably a dataset-size tradeoff with a small
                | enough number of documents, but even then this could
                | offer the option: if the db is < x MB total, fetch it
                | all in an async task and then use that.
        
       | cozzyd wrote:
       | If you just want static data, using jsroot or jsfive might be a
       | better option.
        
       | BMorearty wrote:
       | This is fantastically creative. And the author does a great job
       | of starting out by describing why this is useful.
       | 
       | And then using SQLite to insert and update DOM elements? Holy
       | cow, icing on the cake. Unlike the first part, there's no
       | explanation of why you'd want to do that. But by that point I was
       | so drawn in that I didn't care and was just enjoying the ride.
        
         | phiresky wrote:
         | Yeaah I felt like at that point the article was already long
         | enough so I didn't bother describing the DOM part too much -
         | even though I spent more time implementing that than I did
         | implementing the rest ;)
         | 
         | Basically SQLite has a virtual table mechanism [1] where you
         | have to define a few functions that figure out how to scan your
         | "fake" table / which indices to use and then how to read /
         | write the actual data. I hook into this mechanism and redirect
         | the request to DOM functions like querySelector() etc. Then
          | there's the issue that SQLite is fully synchronous, but I
          | have to run it in a WebWorker - and the WebWorker can't
          | actually access the DOM and can only communicate
         | asynchronously with the main thread... So I have to do some
         | weird stuff with SharedArrayBuffer and Atomics.wait to make
         | that work [2].
         | 
         | [1] https://www.sqlite.org/vtab.html [2]
         | https://github.com/phiresky/sql.js-httpvfs/blob/master/src/v...
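          | 
          | The blocking bridge is roughly this pattern (a sketch of the
          | general SharedArrayBuffer + Atomics technique, not the actual
          | sql.js-httpvfs code; it also needs cross-origin isolation):
          | 
          |   // worker side: ask the main thread for DOM data, then block
          |   const sab = new SharedArrayBuffer(4096);
          |   const flag = new Int32Array(sab, 0, 1);
          |   postMessage({ kind: "querySelector", sel: "#x", sab });
          |   Atomics.wait(flag, 0, 0); // sleep until flag leaves 0
          |   // ...read the reply the main thread wrote into sab...
          | 
          |   // main thread, in its onmessage handler: make an Int32Array
          |   // view over msg.sab, write the DOM result after it, then
          |   // Atomics.store(view, 0, 1); Atomics.notify(view, 0);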
        
           | BMorearty wrote:
           | MY GOD. You are my hero.
        
       | simonw wrote:
       | Pop open the network pane in the browser tools and try running
       | this SQL query for a demo of how clever this is:
        | select country_code, long_name from wdi_country
        | order by rowid desc limit 100
       | 
       | It fetches just 54.2KB of new data (across 49 small HTTP
       | requests) to return 100 results - from a statically hosted
       | database file that's 668.8MB!
       | 
       | I have an animated GIF demo of this here:
       | https://twitter.com/simonw/status/1388933800445452290
        
         | WJW wrote:
         | So assuming no country has a name longer than 98 characters and
         | that all country codes are 2 characters, that is over 500%
         | overhead? Are you missing a /s in your post?
        
           | phiresky wrote:
           | Since random accesses across the internet are really slow,
           | for this kind of fairly small table (where SQLite stores the
           | row data inline within the B-Tree of the table) it basically
           | fetches the whole content for each row - so even if you query
           | only the long_name and country_code column, it will in fact
           | fetch the data of all 29 columns in that table.
           | 
           | If you want it to fetch less data for querying a subset of
            | columns, you could create an index on those columns - then
            | SQLite will do a COVERING INDEX scan and thus read only
           | the necessary data (with the B-Tree itself and the start /
           | end page alignment being the only overhead).
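            | 
            | For example, an index covering just those two columns
            | (created ahead of time while preparing the database file,
            | not in the browser) could look like:
            | 
            |   create index idx_country_code_name
            |     on wdi_country(country_code, long_name);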
        
             | cookguyruffles wrote:
             | Nothing to add to the conversation, just wanted to say I
             | absolutely adore this. Years ago I used to think it'd be
             | nice if search engines simply exposed all their shards to
             | clients and let clients do all the results merging and
             | suchlike. Of course that's probably a terrible idea in
             | practice for performance, but this library is definitely
             | implementing the spirit in a very practical way!
             | 
             | It also reminded me of the inverse of this hack. In old
             | versions of Qemu (possibly it is still implemented, but I
             | have vague memories it got ripped out), you could point
             | Qemu at a directory on disk and it'd produce an emulated
              | floppy disk drive with a virtual FAT12 image containing the
             | directory contents. AFAIK it didn't keep the actual data in
             | memory, I guess all it needed was file sizes to know how to
             | build a virtual bytearray that contained the filesystem
             | metadata + proxied reads from the underlying files for data
             | sectors. I look forward to seeing your implementation of
             | this concept in a virtualized SQLite file <-> GraphQL proxy
             | ;)
        
           | simonw wrote:
           | The clever part here is only fetching 50KB out of 680MB.
        
           | ajfjrbfbf wrote:
           | > sql.js only allows you to create and read from databases
           | that are fully in memory though - so I implemented a virtual
           | file system that fetches chunks of the database with HTTP
           | Range requests when SQLite tries to read from the filesystem:
           | sql.js-httpvfs. From SQLite's perspective, it just looks like
           | it's living on a normal computer with an empty filesystem
           | except for a file called /wdi.sqlite3 that it can read from.
           | 
           | From this paragraph it should be pretty clear that it's
           | actually a great result. The database will obviously need to
           | read more data than it presents, so more is fetched.
        
           | watermelon0 wrote:
           | Have you actually read the article? SQLite is unmodified, and
           | thinks it runs on a virtual file system, which fetches file
           | chunks via HTTP range headers.
           | 
           | It's REALLY impressive that you only need to read 54 KB out
           | of 700 MB, to fetch the records.
        
             | OOPMan wrote:
             | It's impressive on one hand.
             | 
             | On the other it's still a lot of overhead.
        
             | throwawayboise wrote:
             | Do most static site hosters support range requests?
        
         | bbkane wrote:
         | Would it be possible to use a datasette frontend with this as a
         | "backend" and statically host the whole thing?
        
           | simonw wrote:
           | Not easily - Datasette is written in Python, so you could try
           | running it in WebAssembly (like Mozilla did for Jupyter with
           | https://hacks.mozilla.org/2019/03/iodide-an-experimental-
           | too... ) but it would be enough work that it might be easier
           | to reimplement a subset of Datasette directly in JavaScript.
        
             | bbkane wrote:
             | Thank you!
        
         | phiresky wrote:
         | Huh, that's actually kind of a worst case I didn't think about:
         | Since you're doing a reverse table scan my "sequential access"
         | detection doesn't kick in. If you do the same query but with a
         | forward scan it should fetch roughly the same amount of data
         | but only do like 5 HTTP requests since the request size doubles
         | for every sequential access.
         | 
         | e.g.:
         | 
         | select country_code, long_name from wdi_country where rowid >=
         | 164 order by rowid asc limit 100;
        
           | vladf wrote:
           | I solved a similar problem recently: given a stream of data,
           | how should you choose packet size in an online way to
           | minimize regret (a linear combination of spare capacity of
           | last packet and total packets used).
           | 
           | Turns out doubling isn't the best strategy. The optimal
           | solution is actually to add a constant increment to packet
           | size. How much depends on relative cost of the terms in the
           | regret function.
        
           | brandmeyer wrote:
           | > I've set the page size to 1 KiB for this database.
           | 
           | > That's because I implemented a pre-fetching system that
           | tries to detect access patterns through three separate
           | virtual read heads and exponentially increases the request
           | size for sequential reads.
           | 
           | > Since you're doing a reverse table scan my "sequential
           | access" detection doesn't kick in.
           | 
           | You know, starting off with the default 4kB page size
           | naturally adds some resistance to these kinds of failure
           | cases. If the VFS isn't issuing many requests in parallel, I
           | would think that setting up a page size near target_bandwidth
           | * round_trip_time would be a better initial guess. 1kB would
           | be appropriate for a pretty low latency-bandwidth product.
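            | 
            | For reference, the bandwidth-delay product of the 1 Mbit/s
            | / 300 ms link used as an example above works out to:
            | 
            |   // bytes "in flight" on a 1 Mbit/s link with 300 ms RTT
            |   const bdpBytes = (1_000_000 / 8) * 0.3; // = 37_500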
        
             | phiresky wrote:
             | That's true, but it also means that random access will
             | always use at least that amount of data even if it only has
             | to fetch a tiny amount. I did a few (non-scientific)
             | benchmarks on a few queries and 1kB seemed like an OK
             | compromise.
             | 
             | And note that the request chunk size is bound to the SQLite
             | page size, and to change that page size you have to rewrite
             | the whole DB. So it can't be set on the fly unless you have
             | multiple copies of the database.
        
               | pjc50 wrote:
               | 1kb fits in most IP MTU sizes, so that seems reasonable.
        
           | simonw wrote:
           | Hah, I thought "in reverse order by ID" might be a stress
           | test but I was still very impressed by how it performed!
        
       | Gehinnn wrote:
       | TL;DR: Compile SQLite to JS with emscripten, implement a virtual
       | fs in JS to stream chunks of a statically hosted (readonly) SQL
       | database.
       | 
       | If queries make use of indices, only a fraction of the database
       | needs to be downloaded.
       | 
       | Also, you can use SQLite to query the DOM.
        
         | [deleted]
        
         | pudmaidai wrote:
         | I can't figure out exactly how it knows which chunk to
         | download. Does it always download the whole index first? Or
         | does it include it in the built JS file itself?
        
           | fulafel wrote:
           | B-Tree indexes are designed to work like this, to require a
           | low number of IO operations. The index contains pointers to
           | other places in the index.
        
             | pudmaidai wrote:
             | So the answer is "yes, it has to download the index first"?
             | None of these comments answer my question.
        
               | rakoo wrote:
               | Everything in SQLite is stored in B-Trees. Data or
               | indexes. So you don't need to download the whole index
               | first; you only need to download the necessary pages of
               | the trees to access data, whether it's part of an index
               | or actual data
        
               | tehbeard wrote:
               | It just needs to download the pages (x KB chunks) to
               | traverse from the header to the particular index and
               | table schema, a fraction of the whole data.
        
               | detaro wrote:
               | At least the information describing what tables and
               | indices there are and where to find them - and then it
               | gets what it needs once a query is run. Just like sqlite
               | would if running from a local file on disk.
        
               | phiresky wrote:
               | The B-Tree is a tree that in this case is perfectly
               | balanced. So if you do a query with an index in a
                | database it will fetch a logarithmic amount of data from
               | the index and then a constant amount of data from the
               | table.
               | 
               | For the example the wdi_data table is 300MB and an index
               | on it is 100MB in size. This index has a tree depth of 4
               | - which means SQLite has to read exactly 4 pages (4KiB)
               | to get to the bottom of it and find the exact position of
               | the actual row data.
               | 
                | You can check the depth of the B-Trees with
               | `sqlite3_analyzer`.
        
           | formerly_proven wrote:
           | SQLite has runtime-pluggable VFS support, i.e. you give it a
           | struct with functions for opening a file, reading some bytes,
           | writing some bytes, synchronizing file contents, closing a
           | file. This project provides such a VFS module, that, because
           | it actually runs in the browser, performs HTTP requests to
           | read data. Emscripten provides a way to run a mix of C/C++
           | code in the same environment as some JavaScript code inside
           | the browser. The reason SQLite has this pluggable VFS support
           | is to properly support embedded systems, different locking
           | APIs, and things like database encryption.
           | 
           | https://www.sqlite.org/vfs.html
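            | 
            | Conceptually (a TypeScript-flavoured sketch of the idea, not
            | the real sqlite3_vfs C interface), the file object such a
            | VFS hands to SQLite boils down to something like:
            | 
            |   interface RemoteFile {
            |     // read `length` bytes starting at `offset`; here backed
            |     // by an HTTP Range request instead of a disk read
            |     read(offset: number, length: number): Uint8Array;
            |     fileSize(): number;
            |     close(): void;
            |   }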
        
           | cookguyruffles wrote:
           | Both the index and table data are btrees. These are trees -
           | the root node sits in some known location (offset) in the
           | file, referenced by the file header and metadata. As SQLite
           | traverses the tree, it encounters new descendents it would
           | like to visit, presumably identified by their byte offset in
            | the file, which is all that's needed for this VFS magic to
            | issue a suitable range request.
           | 
            | - SQLite opens the file and reads 4kb worth of header ->
            | range request for bytes 0-4096
           | 
           | - headers/metadata refers to index table with root node at
           | 8192kb
           | 
           | - user issues SELECT * from index WHERE name = 'foo'
           | 
           | - SQLite reads root node from the file (range request for
           | 8192kb..)
           | 
           | - Root node indicates left branch covers 'foo'. Left branch
           | node at address 12345kb
           | 
           | - Fetch left branch (range request for 12345kb)
           | 
           | - New node contains an index entry for 'foo', row 55 of data
           | page at 919191kb
           | 
            | - SQLite reads data page (range request for 919191kb..)
           | 
           | etc etc etc
        
           | mvanaltvorst wrote:
           | That's part of SQLite. It has been optimised to reduce disk
           | reads, because those can be slow on spinning hard drives.
           | Coincidentally, this translates well into an optimised
            | algorithm that minimises the number of HTTP range requests to
           | make.
        
       | diveanon wrote:
       | I am a huge fan of sqlite and will definitely be giving this a
       | shot in the near future.
       | 
       | Incredible work.
       | 
       | I see myself using this in conjunction with a conventionally
       | hosted pg db for dynamic content.
        
       | rjeli wrote:
       | Nice. Reminds me of Bellard's https://vfsync.org/ . It's file
       | system pages served over http, to back his jslinux browser VMs
        
         | [deleted]
        
       | bilater wrote:
       | This is awesome. Got my wheels turning :)
        
       | dreix wrote:
       | Over the last few months I tried to think of a clever way to set
       | up a legacy site for a dictionary that I serve on a VM just
        | because I also need to run SQLite. Since I want to make sure
        | it'll keep running for longer than I keep paying for the VM,
        | this is the best possible solution. At some point no more
        | updates will happen and it'll just be a static website. So
        | bundling it like this is
       | incredible. I can run multiple backups on different hosts with no
       | additional costs.
        
       | simonw wrote:
       | Using HTTP range requests like this is just SO clever.
        
       | laurencerowe wrote:
        | In the genomics world, Tabix indexes enable similar use cases.
       | An ordered TSV file is compressed in chunks (bgzip) and a Tabix
       | index created to allow range based access by mapping from the
       | index -> chunk. This allows a browser based genome browser zoomed
        | into a section of the genome to fetch information from a
        | multi-gigabyte file.
       | 
       | http://www.htslib.org/doc/tabix.html
        
         | Ultimatt wrote:
         | now if only tabix and most hokey bioinformatics formats would
         | die and just be replaced with a formal schema spec in SQLite...
        
           | laurencerowe wrote:
           | Honestly I think Tabix's bgzipped TSV is one of the less
           | hokey bioinformatics formats, at least compared to the
           | various custom binary formats floating around.
           | 
           | For web browser based genome browsers I suspect this (very
           | cool!) sqlite hack would require many more http requests.
        
       | spamalot159 wrote:
       | This is really cool! I wonder what the restrictions are and if we
       | would ever be able to write to a SQLite db like this in the
       | future. This could push more to the front end without needing to
        | write APIs.
        
         | phiresky wrote:
          | The main restriction is that the DB really needs well-fitting
         | indexes, otherwise querying is really slow and fetches a lot of
         | data.
         | 
         | Regarding writing:
         | 
         | You could of course implement a writing API with POST requests
         | for changing pages of the database - but then you would lose
         | most of the benefits of this (not requiring any special kind of
         | server).
         | 
         | I also thought about implementing a kind of overlay filesystem,
         | where chunks that are written to the file are stored in a local
         | storage so the modified data is available locally while still
         | reading everything else from the remote database.
         | 
         | Interestingly in SQLite that's already exactly what the WAL
         | mode does: It's a second file next to the database that's just
         | a set of pages that are overlaid over the main file when read
         | queries happen - which allows concurrent readers and writers
         | since the database itself isn't in an undefined state even when
         | write transactions are happening.
         | 
         | So you could enable WAL mode and disable WAL auto
         | checkpointing, then you get a downloadable WAL file that can be
         | read by normal SQLite and written back to the main file. It
         | would be neat, but I'm not sure what the actual use case would
         | be ;)
        
           | WrtCdEvrydy wrote:
           | Since you can do static hosting from a git repo, I wonder if
           | you could directly push your changes to your git repo and
           | have your CI/CD solution just deploy it instead?
           | 
           | There has to be a git.js implementation out there and you
            | could move the DB to its own repo and create an HTTPS access
            | token (for GitHub)... the issue there is that someone could
           | use that token to commit whatever to your database repo.
        
             | wh33zle wrote:
              | Maybe forcing the user to log in with GitHub would be an
             | option? And the changes to the database could be modelled
             | as pull requests?
        
           | stagas wrote:
           | > I also thought about implementing a kind of overlay
           | filesystem, where chunks that are written to the file are
           | stored in a local storage so the modified data is available
           | locally while still reading everything else from the remote
           | database.
           | 
           | Perhaps adding IPFS to the mix for persisting data would be
           | interesting, I'm sure there are use cases in peer to peer
           | applications. Anyway, amazing innovation thank you for
           | writing this :)
        
       | BlueTemplar wrote:
       | This seems cool, but doesn't Fossil (from the SQLite author)
        | already do it "out of the box"?
       | 
       | https://www.fossil-scm.org/home/doc/trunk/www/index.wiki
        
         | detaro wrote:
         | Fossil runs a server application.
        
       | wheybags wrote:
        | This is a really awesome idea! I have a plan for a static-ish
       | site (updated daily) that I was going to use sqlite for anyway,
       | but server side. I will definitely look into this approach
       | instead!
        
       | drtournier wrote:
        | This solution works really well for databases that will not be
        | updated frequently, like a standalone site.
       | 
       | Although one should be aware of one very important git behavior -
       | git does not diff binary files (like SQLite dbs). That means 2
       | things:
       | 
       | 1. Each db update will generate a new file in git, maintaining
       | the whole old file in history, instead of the diff in bytes. This
       | will accumulate a lot of clutter in the repo
       | 
       | 2. As git does not diff binaries, there is a very small risk of
       | corruption (especially if you work in multiple OSs, because of
       | CRLF)
       | 
       | Ref - https://robinwinslow.uk/dont-ever-commit-binary-files-to-
       | git
        
         | phiresky wrote:
         | Git does actually diff binaries and stores them very
         | efficiently :) If you do a single small UPDATE in your db file
         | git will only store the changed information. It's just kinda
         | slow, and for most binary files the effort git spends to try
         | and compute deltas for binary files is useless - which is why
         | it has a bad reputation for binary files.
         | 
         | Note that the diffs that git shows you are completely unrelated
          | to the deltas it uses to compress its database - which are
         | always "binary deltas" and not line-based diffs.
         | 
          | Also, I'm not sure why you think that the possibility of db
          | corruption has something to do with whether or not it stores
          | diffs?
        
         | laszlokorte wrote:
         | Minor nitpick: Git does not store diffs for any file format but
         | always the full file for each version. So it does not really
          | matter that it's binary (except for not being able to VIEW the
         | diff, but I guess you could even implement a plugin for that)
         | but just that it's a big file. Even a huge text file would be
         | fully stored per version.
         | 
          | /edit: The sibling comment mentions that git can in fact
          | delta-compress older commits for storage efficiency. But my
          | point was that git commits are not deltas but full snapshots.
        
         | tehbeard wrote:
          | You are only gonna encounter corruption if you either a) messed
         | up the gitconfig for line endings or b) named the database
         | mydbfile.txt
        
         | xucheng wrote:
          | You can just store the database in a text format (e.g. CSV) in
          | git and turn it into a SQLite db when building the website.
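          | 
          | One way to do that at build time (a sketch using sql.js in
          | Node; the file names, table and columns are made up, and the
          | naive split(",") ignores quoted fields):
          | 
          |   import initSqlJs from "sql.js";
          |   import { readFileSync, writeFileSync } from "fs";
          |   const SQL = await initSqlJs();
          |   const db = new SQL.Database();
          |   db.run("create table rows(code text, name text)");
          |   const ins = db.prepare("insert into rows values (?, ?)");
          |   const csv = readFileSync("rows.csv", "utf8");
          |   for (const line of csv.trim().split("\n"))
          |     ins.run(line.split(","));
          |   ins.free();
          |   // index the column the site will query on
          |   db.run("create index idx_rows_name on rows(name)");
          |   writeFileSync("data.sqlite3", Buffer.from(db.export()));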
        
       | claytongulick wrote:
       | Incredibly clever use of paging and http range queries! Well
       | done.
        
       | banana_giraffe wrote:
       | In a similar vein, I've mentioned this before, but if you're
       | doing Python stuff, you can use the apsw package (not the one in
       | PyPi, though) to write a VFS layer that SQLite will use to read
       | the database.
       | 
       | I've used this for the same basic idea as this article, only
       | letting me store SQLite databases in AWS's S3 that I can access
       | with AWS APIs so they don't need to be public. It works well,
        | though it's absolutely not for every use case; the overhead is
        | considerable.
       | 
       | I even used it once to read SQLite database files in a zip file
       | stored in S3 without having any local storage to use. Not one of
       | my prouder moments, but hey, I coded my way out of the corner
       | someone else designed for me.
        
       | tlb wrote:
       | The question I had is answered by this line of code:
       | xhr.setRequestHeader("Range", "bytes=" + from + "-" + to);
       | 
       | I am a little surprised you can just do that. In
       | https://github.com/phiresky/sql.js-httpvfs/blob/master/src/l...
        
         | yread wrote:
         | I've learnt about this by using https://www.biodalliance.org
         | 
          | It's an embedded genome viewer; you can just point it at
          | multi-gigabyte reference files and .seg files and it loads
          | super quickly.
        
           | punnerud wrote:
            | Here is a direct link to GitHub with the usage: https://github.
           | com/dasmoth/dalliance/search?q=bytes%3D%27&ty...
        
         | crazygringo wrote:
         | Not all webservers support/enable it, so YMMV.
         | 
         | But as long as you're dealing with a known server that does,
         | then gravy!
        
         | dathinab wrote:
          | Range headers are a pretty standard tool to e.g. continue
          | interrupted downloads and similar.
         | 
          | Any well-designed system, especially if it has static sources
          | and is server-cached, _should_ support it.
         | 
         | Surprisingly many web-frameworks don't support it out of the
         | box, or don't support it well.
         | 
          | Either way, GitHub Pages is static content, probably with
          | some server-side regional caches, so I'm not surprised it
          | works.
        
       | villasv wrote:
        | Wicked stuff, loved it. Not sure if this will ever be
        | "production ready" or merged into sql.js-like projects, but
        | it's a cool proof of concept.
        
       | ponytech wrote:
       | This is truly awesome. Very clever.
        
       ___________________________________________________________________
       (page generated 2021-05-02 23:00 UTC)