[HN Gopher] Cron-based backup for SQLite
       ___________________________________________________________________
        
       Cron-based backup for SQLite
        
       Author : davnicwil
       Score  : 329 points
       Date   : 2022-05-15 09:14 UTC (13 hours ago)
        
 (HTM) web link (litestream.io)
 (TXT) w3m dump (litestream.io)
        
       | benbjohnson wrote:
        | Litestream author here. The motivation for this docs page was
        | two-fold. First, laying out the different options with their
        | trade-offs helps people understand what Litestream is trying to
        | be, which also helps keep my focus narrow. For example, if
        | someone is looking for very high durability guarantees then they
        | can see that rqlite might be a better fit and they don't try to
        | fit Litestream into that mold.
       | 
       | Second, Litestream doesn't interfere with other backup methods so
       | you can run it alongside a cron-based backup. I typically run
       | both because I'm overly paranoid and because it's cheap.
        
         | mattl wrote:
         | crontab isn't a tool, cron is the tool. It reads a little
         | strange because of that.
        
           | benbjohnson wrote:
           | Good call. I updated the docs repo and the site should update
           | in a few minutes.
        
             | mattl wrote:
             | Awesome!
        
         | wheels wrote:
         | I'm not saying that's the logic here, but this is a common
         | pattern because SEO. It's a useful way to end up on potential
         | customers' radar.
        
         | ctur wrote:
          | Nice post -- one suggestion would be to use zstd instead of
          | gzip, though. Out of the box it is much faster and produces
          | smaller files, and it can be tuned to fit whatever memory and
          | CPU budget a given use case prefers.
         | 
         | Friends don't let friends use gzip :)
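          | 
          | As a drop-in for the article's gzip step, something like this
          | should work (a sketch; paths are placeholders, and -T0 assumes
          | a zstd build with multithreading enabled):
          | 
          |     zstd -T0 --rm /path/to/backup    # writes /path/to/backup.zst
          |     zstd -d /path/to/backup.zst      # to decompress on restore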
        
           | otterley wrote:
           | Storage is so cheap nowadays that usability and compression
           | speed are more important for most people.
        
             | usefulcat wrote:
             | For a given compression ratio, zstd compression is faster
             | than gzip by a good margin (even without using multiple
             | threads). And zstd decompression is pretty much always
             | faster than gzip too.
        
           | benbjohnson wrote:
           | Thanks, I thought about using a different compression tool
           | but everybody knows gzip so I figured I'd keep it simple.
           | Litestream uses LZ4 internally (which is great) but it's not
           | something a lot of developers come across too often.
        
       | nikeee wrote:
       | There is also VACUUM INTO:
       | https://www.sqlite.org/lang_vacuum.html#vacuuminto
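        | 
        | For example (a sketch; paths are placeholders):
        | 
        |     sqlite3 /path/to/db.sqlite "VACUUM INTO '/path/to/backup.sqlite'"
        | 
        | The output is an ordinary database file -- a consistent snapshot
        | of the original, with free pages removed. Note that it fails if
        | the target file already exists, so remove any stale copy first.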
        
         | nbevans wrote:
         | Indeed and in my experience this has been more
         | reliable/predictable than the official backup API. The official
         | backup API can get "stuck" on a very large and hot database
         | because it, essentially, has to start from scratch if a page is
         | modified. VACUUM INTO doesn't have this problem.
        
       | charcircuit wrote:
        | > Do not use `cp` to back up SQLite databases. It is not
        | transactionally safe.
       | 
       | I've read that this is true, but it has always confused me
       | because I would expect that using cp would be equivalent to if
       | the application had crashed / you lost power.
        
         | fmajid wrote:
          | No, because cp takes a non-zero amount of time, so by the time
          | it has completed, the first blocks may have been rewritten and
          | the backup is inconsistent. If you take something like a ZFS
          | snapshot, then, yes, that would be a consistent backup.
        
         | amcvitty wrote:
         | Not an expert but the transaction log is in another file and
         | you'd need both to be in line with each other and backed up, so
         | perhaps that's a factor
        
           | benbjohnson wrote:
           | I wrote up a bit longer explanation of the failure modes
           | (which depend on your journaling mode) in this comment:
           | https://news.ycombinator.com/item?id=31387447
        
         | kijin wrote:
         | `cp` is not instantaneous. Unless you lock the entire file, one
         | block can change while you're reading another block. You end up
         | with a Frankenstein of a file, compared with a consistent
         | snapshot if you had simply lost power.
         | 
         | If your filesystem supports snapshotting, it would be safe to
         | create a snapshot and `cp` the database off of that snapshot.
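          | 
          | On ZFS, for instance, roughly this (a sketch; the dataset name
          | and paths are placeholders, and the glob is there to pick up
          | any -wal/-journal files sitting alongside the database):
          | 
          |     zfs snapshot pool/data@dbbackup
          |     cp /pool/data/.zfs/snapshot/dbbackup/db.sqlite* /backups/
          |     zfs destroy pool/data@dbbackup
          | 
          | The snapshot is atomic, so the copy looks to SQLite like a
          | database after a power loss, which it is designed to recover
          | from.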
        
       | kijin wrote:
       | Another option is to use the `.dump` command that produces a
       | series of SQL statements to recreate the database and populate it
       | with data, just like good ol' mysqldump.
       | 
       | I'm not sure whether this is faster or safer than `.backup` which
       | produces an actual database file with indexes and whatnot, but a
       | benefit of the plain text output is that it's very flexible. The
       | output can be piped to gzip or any other program on the fly, even
       | across the network, without waiting for an intermediate file to
       | be written in full. It can also be imported to other types of
       | RDBMS with minimal changes.
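        | 
        | For example, streaming a compressed dump to another machine
        | without any intermediate file (a sketch; the host and paths are
        | placeholders):
        | 
        |     sqlite3 /path/to/db.sqlite .dump | gzip | ssh backuphost 'cat > db.sql.gz'
        | 
        | Restoring is the reverse: gunzip -c db.sql.gz | sqlite3 restored.db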
        
         | benbjohnson wrote:
         | The .dump command is a great option for smaller databases. The
         | main downsides are that it produces a larger output, it'll take
         | longer to compress that output, and the restore time is much
         | longer since all the commands have to be replayed.
         | 
          | The benefit to using the backup API is that the file is a
          | ready-to-go database and can be started up immediately.
        
       | evoxmusic wrote:
       | Fun to see that we share some very similar ideas with Replibyte -
       | an open source tool to seed a dev database with real data.
       | https://github.com/Qovery/replibyte
        
       | cpach wrote:
       | Nitpick: On a modern Linux system, I would probably use a systemd
       | timer instead of cron.
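        | 
        | Something like this pair of units (a sketch; names, paths, and
        | the schedule are placeholders):
        | 
        |     # /etc/systemd/system/sqlite-backup.service
        |     [Unit]
        |     Description=SQLite backup
        |     [Service]
        |     Type=oneshot
        |     ExecStart=/usr/local/bin/backup-sqlite.sh
        | 
        |     # /etc/systemd/system/sqlite-backup.timer
        |     [Unit]
        |     Description=Hourly SQLite backup
        |     [Timer]
        |     OnCalendar=hourly
        |     Persistent=true
        |     [Install]
        |     WantedBy=timers.target
        | 
        | Then enable it with: systemctl enable --now sqlite-backup.timer
        | Persistent=true also runs a missed backup after downtime, which
        | plain cron won't do.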
        
       | manjana wrote:
       | > Do not use `cp` to back up SQLite databases. It is not
       | transactionally safe.
       | 
        | Anyone know if committing the sqlite db file to git is unsafe
        | too?
        
         | benbjohnson wrote:
         | Anything that only uses the file system copying mechanics is
         | not safe (which includes a git check-in). I wrote a more
         | detailed comment here:
         | https://news.ycombinator.com/item?id=31387447
        
         | Moto7451 wrote:
         | If you're not actively writing to the database you can cp or
         | commit it safely. SQLite uses a journal or a write ahead log
         | (WAL) file to process transactions.
         | 
          | If you cp a database mid-transaction you will lose the
          | transaction, as it is processed in the separate journal or WAL
          | file until commit. If you copy mid-commit then you will have
          | incomplete data in the main database file and possibly data
          | corruption.
        
       | CraigJPerry wrote:
        | Do the cron entries work as-is? I'm sitting here trying to decide
        | if cron will break here:
        | 
        |     aws s3 cp /path/to/backup.gz s3://mybucket/backup-`date +%H`.gz
        | 
        | The % is going to be interpreted as a newline, so it needs
        | escaping, as in date +\%H
       | 
       | But it's in backticks (subshell) so now I'm doubting myself and
       | thinking maybe the subshell saves this but my brain is screaming
       | NO NO BUG! :-)
       | 
       | I'm going to have to spin up a VM to test it and seek inner peace
       | here :-)
        
         | CraigJPerry wrote:
          | Yeah, this isn't used in the cron line itself -- it's wrapped
          | in a script further down the article, and the script is run
          | from cron. So it makes sense, and there's no bug.
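          | 
          | For reference, if the command did sit in a crontab line
          | directly, the % would need escaping (a sketch; the schedule is
          | a placeholder), since cron turns an unescaped % into a newline
          | and feeds everything after it to the command as stdin:
          | 
          |     0 * * * * aws s3 cp /path/to/backup.gz s3://mybucket/backup-`date +\%H`.gz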
        
       | gorgoiler wrote:
        | I've always assumed sqlite database files are resilient to being
        | copied mid-write. Sure: maybe that last transaction will be
        | corrupted, but everything up to that point will be intact,
        | right? The next backup will capture whatever was in the half-
        | complete transaction.
       | 
       | Am I deluded in this?
        
         | benbjohnson wrote:
         | Litestream author here. Your backup can be corrupt if you
         | simply use "cp" but the details depend on your journaling mode.
         | 
         | If you're using the default rollback journal mode, it works by
         | copying old pages to a "-journal" file and then updating the
         | main database file with new pages. Your transaction finally
         | commits when you delete the journal file. Copying the main
         | database file during a write transaction can give you either a
         | subset of transaction pages or half-written pages.
         | 
         | If you're using the WAL journaling mode, it works by writing
         | new pages to the "-wal" file and then periodically copying
         | those pages back to the main database file in a process called
         | "checkpointing". If you only "cp" the main database file then
         | you'll be missing all transactions that are in the WAL. There's
         | no time bound on the WAL so you could lose a lot of
         | transactions. Also, you can still corrupt your database backup
         | if you "cp" the main database file during the checkpoint.
         | 
         | You could "cp" the main database file and the WAL file and
         | _probably_ not corrupt it but there 's still a race condition
         | where you could copy the main file during checkpointing and
         | then not reach the WAL file before it's deleted by SQLite.
         | 
         | tl;dr is to just use the backup API and not worry about it. :)
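          | 
          | From the command line that's just (a sketch; paths are
          | placeholders):
          | 
          |     sqlite3 /path/to/db.sqlite ".backup '/path/to/backup.sqlite'"
          | 
          | which coordinates with the journal/WAL for you.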
        
         | creatonez wrote:
          | If you make an atomic copy -- something equivalent to a power
          | failure -- then it's transactionally safe. One way this can
          | be done is with a snapshot-capable filesystem like ZFS, btrfs,
          | or XFS.
          | 
          | But a traditional `cp` reads through the file from start to
          | end; it doesn't capture the contents all at once. This is
          | explicitly documented as a thing that will break SQLite:
          | https://www.sqlite.org/howtocorrupt.html
        
       | jchw wrote:
       | Question:
       | 
       | > Do not use `cp` to back up SQLite databases. It is not
       | transactionally safe.
       | 
        | I'm curious what happens. Does this mean that side effects from
        | uncommitted transactions may become visible?
       | 
       | I guess if you could snapshot the database file and WAL and etc.
       | simultaneously this isn't an issue, right? Because otherwise it
       | would be a problem for a database any time your program or
       | machine crashed.
        
       | simonw wrote:
       | In terms of restoring from backups: "Be sure to stop your
       | application before copying the database into place"
       | 
       | This intuitively feels right to me, but I'd be interested to
       | understand the mechanics behind it. What can go wrong if you
       | ignore this advice and use "mv" to atomically switch out the
       | SQLite file from underneath your application?
       | 
       | My hunch is that this relates to journal and WAL files - I
       | imagine bad things can happen if those no longer match the main
       | database file.
       | 
       | But how about if your database file is opened in read-only or
       | immutable mode and doesn't have an accompanying journal/WAL?
        
         | [deleted]
        
         | zekica wrote:
         | On Linux at least, if you switch the file, the old file (even
         | if deleted) will still be used until the file descriptor is
         | closed.
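          | 
          | A quick way to see this from a shell (sketch):
          | 
          |     echo old > db; exec 3< db      # hold an open descriptor
          |     mv db db.bak; echo new > db    # swap the file out
          |     cat <&3                        # still prints "old"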
        
       | ancientsofmumu wrote:
        | The example uses Dead Man's Snitch; Healthchecks.io is my
        | preference, and their plans offer more for less. Both offer code
        | on GitHub for examination.
       | 
       | https://deadmanssnitch.com/plans vs.
       | https://healthchecks.io/pricing/
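        | 
        | With either service, the integration is just a ping at the end
        | of a successful run, e.g. for Healthchecks (a sketch; the UUID
        | is a placeholder):
        | 
        |     curl -fsS --retry 3 https://hc-ping.com/your-uuid-here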
        
         | WrtCdEvrydy wrote:
         | Is there anything self hosted in this space?
        
           | cuu508 wrote:
           | Yes, Healthchecks is open-source and can be self-hosted
           | https://github.com/healthchecks/healthchecks
        
           | jethro_tell wrote:
           | Depending on your environment, i.e. if you have a metrics
           | system in place already, I generally just post a success
           | metric and alert on lack of success metrics.
        
       | spencerbrian wrote:
        | I do something similar: it's a relatively small but important
        | SQLite database, so every five minutes I take a backup using that
        | method and check the generated SQL files into remote git
        | repositories.
       | 
        | One is on GitHub, and is just a snapshot: a single commit in an
        | otherwise empty repo, force-pushed. This is for recovery
        | purposes; I don't need the history, and would probably run afoul
        | of their service limits if I kept it.
       | 
       | And the other on Azure DevOps which has the entire commit history
       | for the past few years. This one is a bit trickier because the
       | pack files end up exhausting disk space if not cleaned up, and
       | garbage collection interrupts the backups. So it clones just the
       | latest commit (grafted), pushes the next commit, and wipes the
       | local repo. No idea how this looks on the remote backend but it's
       | still working without any size complaints, and it's good to know
       | there's an entire snapshotted history there if needed. As well as
       | being able to clone the most recent for recovery if GitHub fails.
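        | 
        | The shallow-clone dance is roughly this (a sketch; the remote
        | and paths are placeholders):
        | 
        |     git clone --depth 1 git@ssh.dev.azure.com:v3/org/project/backups repo
        |     cp /path/to/backup.sql repo/
        |     cd repo && git add -A && git commit -m "backup $(date -u +%FT%H%M)" && git push
        |     cd .. && rm -rf repo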
        
       | Scaevolus wrote:
       | I rolled my own terrible sqlite replication system once. I used
       | execution tracing hooks
       | (https://rogerbinns.github.io/apsw/execution.html#tracing) to
       | record all mutation queries and their arguments sent to the
       | database, then periodically copied that log to a replica system
       | and executed the identical statements.
       | 
       | It's rather primitive, but could be considered an intermediate
       | step between this blog's full copying and litestream's WAL-based
       | replication.
        
       | encoderer wrote:
        | If you use cron for anything important you should use Cronitor -
        | there's a free-for-life plan that can cover your SQLite backups.
        | 
        | The article mentions calling a "dead man" service, and that is
        | fine, but Cronitor (built by 3 people, no VC dollars) is a proper
        | monitoring solution for jobs, not just a "dead man" alert. Happy
        | monitoring!
        
       | zrail wrote:
        | If Ben is reading the thread: there's a tiny but important bug
        | in the example script. It should have set -e at the top to
        | ensure that a failed step will actually fail the script.
        | Otherwise the snitch will always get called.
       | 
       | Edit: filed an issue on the docs repo:
       | https://github.com/benbjohnson/litestream.io/issues/54
        
         | creativemonkeys wrote:
          | If I'm using "-e", I also like to add "set -o pipefail", which
          | makes the script fail if any command in a pipeline fails too.
          | In x | y | z, if 'y' fails but 'z' keeps processing the output
          | and returns 0, -e alone won't catch that (the example script
          | doesn't use pipes, though).
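          | 
          | A quick illustration (sketch):
          | 
          |     set -e
          |     false | cat    # exit status is cat's (0), script keeps going
          |     set -o pipefail
          |     false | cat    # pipeline now exits 1, so -e aborts the script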
        
           | zrail wrote:
            | Yep, my standard bash header is set -eo pipefail, adding -x
            | if I need to see what's going on. This particular script
            | didn't appear to be using pipes, as you say.
        
         | benbjohnson wrote:
         | Thanks for catching that! I fixed it up in a PR and merged in:
         | https://github.com/benbjohnson/litestream.io/pull/55
        
           | davnicwil wrote:
            | This is so great - open source in action - I can't think of a
            | more apt thing to come out of this post than an on-the-fly
            | fix to the docs!
           | 
           | Pun absolutely intended.
        
       | TekMol wrote:
        | > Do not use `cp` to back up SQLite databases. It is not
        | transactionally safe.
       | 
       | What if you disable the journal like this:
       | 
       | PRAGMA journal_mode=OFF;
       | 
        | Can cp be used to back up the DB then?
       | 
       | And do you still have to send "COMMIT" after each query or will
       | each query be executed immediately then?
        
         | ralferoo wrote:
          | The main reason is that the backing file is random access, and
          | transactional correctness is ensured using various locking
          | mechanisms. SQLite generally relies on fsync() to ensure that
          | data is flushed to disk at certain points. Conceptually, data
          | is written to an unused portion of the file, then fsync() is
          | done, then various structures are updated so that this data is
          | now live (leaving some unused space in the middle of the file),
          | and this is fsync()'d again. Later that unused space might be
          | reclaimed, but only when all current readers and writers have
          | finished.
         | 
         | If you cp the file, you might end up with different chunks from
         | different points in the transaction history that don't make
         | sense combined together. If you use ".backup", it guarantees
         | that the data in the copy will be consistent with the data as
         | it existed at one of the fsync() calls.
         | 
         | Turning off the journalling will likely increase the chance
         | that your copy will be inconsistent, as there will be more
         | churn in the main data file.
        
       | [deleted]
        
       | KingOfCoders wrote:
        | I was amazed by the Litestream post on fly.io - it wants to
        | become a distributed database. This is huge.
        
       | mwint wrote:
       | > B-tree databases like SQLite compress well
       | 
       | Curious, why is this?
        
         | tptacek wrote:
         | This question nerd-sniped me. I don't know either but my first
         | guess, supported by hex dumping a medium-sized database, is
         | that sqlite B-tree pages are collections of 32-bit page offsets
         | that the database is designed to cluster (also, just lots and
         | lots of zeroes). Also, the database I looked at was mostly
         | UTF-8 text, which is obviously compressible, but I don't think
         | that's what Ben meant.
        
         | benbjohnson wrote:
          | Litestream author here. When a page gets full, it splits in
          | two so each new node has empty space for inserts and updates.
          | That empty space compresses really well. Also, OLTP data can
          | have duplication when you have a categorical column that you
          | haven't moved into a separate table (e.g. a "type" column that
          | only has a couple of unique values).
        
       | efrecon wrote:
        | May I chime in with a project around more or less the same idea
        | and use cases: https://github.com/efrecon/sqlite-backup. I am
        | pretty sure there are dozens of similar-but-not-quite projects on
        | GitHub/GitLab/etc.
        
       | davnicwil wrote:
        | Following the announcement of Fly.io acquiring the Litestream
        | project and hiring its creator Ben Johnson, I've been looking
        | into it (and am very impressed by what I see!)
       | 
       | Reading through the docs I was delighted to find this full set of
       | instructions including script commands for setting up a simpler
       | regular snapshot backup using cron - for situations where
       | Litestream is (quote) "overkill" - ie where the database is
       | smaller and your durability requirements are lower.
       | 
       | In fact, the docs have a whole alternatives _section_ for
       | different requirements. I think this kind of thing is great, and
       | something more projects should do! Wanted to share it because of
        | this meta thing, but also because it's just pretty useful _per
        | se_.
        
         | dgb23 wrote:
          | I agree. There are a bunch of benefits for everyone here. Users
          | get a sharper view of the tool's intended use case, which
          | benefits both sides, and they learn more related bits along the
          | way. It also builds trust.
         | 
         | The best marketing for devs is no marketing. Just being honest,
         | clear and helpful is what works best.
        
         | kitd wrote:
         | Agreed. In my mind, the fact that the developers understand
         | limitations in their own tools means that future enhancements
         | are likely to be more focussed and effective.
        
         | mritchie712 wrote:
         | if you run a saas, this is a good move for SEO too. People may
         | be searching "backup sqlite cron" or something like it and
         | realize they'd rather use litestream.
        
         | swah wrote:
          | I felt the same on finding that section: by not pushing their
          | solution too hard, they actually increase the chances that I'll
          | use it someday.
        
         | mattgreenrocks wrote:
          | This is what a mature project looks like: it knows what it is,
          | who it is for, and what it isn't.
         | 
         | Contrast with so many open source projects that accumulate
         | features and promise the world, so long as you give them a
         | star.
        
           | vanviegen wrote:
           | > Contrast with so many open source projects that accumulate
           | features and promise the world, so long as you give them a
           | star.
           | 
           | What a cynical take.
        
       | pierregillesl wrote:
        | Don't forget to stop writing to the database while doing the
        | backup, otherwise you can run into an infinite loop if you write
        | faster than sqlite3 .backup can copy :D
       | 
       | Learned that the hard way when implementing sqlite3 backups on
       | Gladys Assistant ( open-source home automation platform
       | https://github.com/GladysAssistant/Gladys )
        
         | davnicwil wrote:
         | to be fair, this sounds like a situation where Litestream _is_
         | the more appropriate solution!
        
         | simonw wrote:
         | This surprises me. I was under the impression that SQLite
         | backups run inside a dedicated transaction in order to avoid
         | this.
        
           | simonw wrote:
           | This comment cleared that up for me: VACUUM INTO backups are
           | transactional, but the .backup mechanism is not:
           | https://news.ycombinator.com/item?id=31387556
           | 
            | Ben, I suggest updating that cron backups documentation page
            | to recommend VACUUM INTO instead!
        
             | ignoramous wrote:
             | Could start an issue:
             | https://github.com/benbjohnson/litestream.io/issues
        
               | benbjohnson wrote:
               | Good catch. I went ahead and added an issue on the docs.
               | https://github.com/benbjohnson/litestream.io/issues/56
        
           | oefrha wrote:
           | See the SQLite online backup API documentation [1].
           | 
           | You can finish the backup in one step, but a read-lock would
           | be held during the entire duration, preventing writes. If you
           | do the backup several pages at a time, then
           | 
           | > If another thread or process writes to the source database
           | while this function is sleeping, then SQLite detects this and
           | usually restarts the backup process when
           | sqlite3_backup_step() is next called. ...
           | 
           | > Whether or not the backup process is restarted as a result
           | of writes to the source database mid-backup, the user can be
           | sure that when the backup operation is completed the backup
           | database contains a consistent and up-to-date snapshot of the
           | original. However: ...
           | 
           | > If the backup process is restarted frequently enough it may
           | never run to completion and the backupDb() function may never
           | return.
           | 
            | The CLI .backup command does a non-blocking backup IIRC, so
            | it is subject to restarts.
           | 
           | [1] https://www.sqlite.org/backup.html
        
         | littlecranky67 wrote:
          | Can you elaborate more? I think the cron solution will be
          | unable to synchronize with your application code to determine
          | when to stop writing, so more background on your solution would
          | be of interest.
          | 
          | As I understand it, while you do the backup, other writes
          | should go to the WAL log and only get committed after the
          | backup?
        
           | prirun wrote:
           | I did some reading, and there are 2 ways to use the SQLite
           | backup API:
           | 
           | 1. call backup_init, backup_step with a step size of -1, then
           | backup_finish. This will lock the db the whole time the
           | backup is taking place and backup the entire db.
           | 
           | 2. call backup_init, backup_step in a loop until it returns
           | SQLITE_DONE with a positive step size indicating how many
           | pages to copy, then backup_finish.
           | 
           | With method 2, no db lock is held between backup_step calls.
           | If a write occurs between backup_step calls, the backup API
           | automagically detects this and restarts. I don't know if it
           | looks at the commit count and restarts the backup from the
           | beginning or is smart enough to know the first changed page
           | and restarts from there. Because the lock is released, a
           | continuous stream of writes could prevent the backup from
           | completing.
           | 
           | I looked in the sqlite3 shell command source, and it uses
           | method 2. So if using the .backup command with continuous
           | concurrent writes, you have to take a read lock on the db
           | before .backup to ensure it finishes. It would be nice if the
           | .backup command took a -step option. That would enable the -1
           | step size feature of method 1. The sqlite3 shell uses a step
           | size of 100.
           | 
           | Another option would be to check backup_remaining() and
           | backup_pagecount() after each step, and if the backup isn't
           | making progress, increase the step size. Once the step size
           | is equal to backup_pagecount() it will succeed, though it may
           | have to lock out concurrent writes for a long time on a large
           | db. There's really no other choice unless you get into
           | managing db logs.
        
           | pierregillesl wrote:
            | That's not what I've experienced!
            | 
            | In my experience, as soon as some new data comes into the DB,
            | the .backup command keeps going, and if the writes don't
            | stop, the backup never finishes either :D
            | 
            | In Gladys's case, we put a blocking transaction in the
            | application logic to lock writes during the backup. I haven't
            | found any other way to avoid infinite backups on write-heavy
            | databases.
        
             | 1958325146 wrote:
              | I ran into the case of needing to back up a write-heavy
              | database without blocking anything, and came up with a
              | solution: writing a VFS (https://www.sqlite.org/vfs.html)
              | that makes the application move the SQLite journal files
              | to a directory for processing instead of deleting them.
              | Another process reads them to see which pages were touched
              | and can very quickly get the changes needed to update the
              | backup.
        
             | yawaramin wrote:
             | I'm using VACUUM INTO, which does basically that:
             | https://sqlite.org/lang_vacuum.html#vacuuminto
             | 
             | > The VACUUM command with an INTO clause is an alternative
             | to the backup API for generating backup copies of a live
             | database....The VACUUM INTO command is transactional in the
             | sense that the generated output database is a consistent
             | snapshot of the original database.
             | 
             | EDIT: Litestream docs will also recommend that:
             | https://github.com/benbjohnson/litestream.io/issues/56
        
       | Multicomp wrote:
       | A service that says 'here is an alternative to our service'?
       | Right on their own website? Even though cron backups are
       | basically 'competition' to litestream?
       | 
        | That's it. I'm giving fly.io a try on the next hobby app I
        | start.
        | 
        | I'm so impressed with their business decisions right now. I was
        | afraid for litestream, but then I read that its creator has been
        | hired to work on it full time. What a splendid development!
        
         | tptacek wrote:
         | Don't give us that much credit! Litestream is a FOSS project.
         | Fly.io isn't competitive about it. We just bought it because
         | its existence, like, in the universe makes Fly.io more
         | valuable, and we wanted to make sure Ben could keep working on
         | it. We actually already have offerings that compete with it
         | (Fly.io Postgres). :)
         | 
         | (Ben might feel competitive about Litestream though; it's his
         | baby. So maybe give him some credit.)
        
         | jjice wrote:
         | > A service that says 'here is an alternative to our service'?
         | 
          | I actually see this quite a bit. I think the reason for it is
          | that it's a good way to show the complexity and moving parts
          | that can break in a process, whereas the marketed service is
          | just drop-in and start. We can see the SQLite backup process
          | here and it's not too bad, but we can also see there are a few
          | things that can go wrong that I'm sure litestream takes care of
          | and would allow us to avoid.
         | 
          | For example, backing up the DB onto the local disk before
          | copying it _could_ fail due to lack of disk space, and then we
          | have to deal with the notification and fix for that. I'm sure
          | litestream is a 10-minute setup that handles a lot of
          | intricacies like that.
        
           | er4hn wrote:
            | It's also in the same spirit as SQLite's own page on whether
            | SQLite is a good fit: https://www.sqlite.org/whentouse.html
           | 
           | There is something very powerful about knowing your strength,
           | leaning into it, and building on it.
        
       ___________________________________________________________________
       (page generated 2022-05-15 23:00 UTC)