[HN Gopher] Cron-based backup for SQLite ___________________________________________________________________ Cron-based backup for SQLite Author : davnicwil Score : 329 points Date : 2022-05-15 09:14 UTC (13 hours ago) web link: litestream.io | benbjohnson wrote: | Litestream author here. The motivation for this docs page was | two-fold. First, laying out the different options with their | trade-offs helps people understand what Litestream is trying to | be, so it helps keep my focus narrow. For example, if someone is | looking for very high durability guarantees then they can see | that rqlite might be a better fit and they don't try to fit | Litestream into that mold. | | Second, Litestream doesn't interfere with other backup methods, so | you can run it alongside a cron-based backup. I typically run | both because I'm overly paranoid and because it's cheap. | mattl wrote: | crontab isn't a tool, cron is the tool. It reads a little | strange because of that. | benbjohnson wrote: | Good call. I updated the docs repo and the site should update | in a few minutes. | mattl wrote: | Awesome! | wheels wrote: | I'm not saying that's the logic here, but this is a common | pattern because of SEO. It's a useful way to end up on potential | customers' radar. | ctur wrote: | Nice post -- one suggestion would be to use zstd instead of | gzip though. Out of the box it is much faster and will result | in smaller files, and it allows tuning to fit inside the memory | and CPU thresholds different use cases might prefer. | | Friends don't let friends use gzip :) | otterley wrote: | Storage is so cheap nowadays that usability and compression | speed are more important for most people. | usefulcat wrote: | For a given compression ratio, zstd compression is faster | than gzip by a good margin (even without using multiple | threads). And zstd decompression is pretty much always | faster than gzip too.
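The compression step being debated above is easy to script either way. As a minimal stdlib sketch of the article's gzip step (the function name is my own; zstd would require the third-party `zstandard` package, which is assumed unavailable here):

```python
import gzip
import shutil

def compress_backup(src_path: str, dst_path: str) -> None:
    # Stream-compress a backup file, mirroring the article's
    # `gzip backup.db` step, without loading it all into memory.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

Swapping in zstd is a near one-line change once a zstd binding is installed; the streaming structure stays the same.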
| benbjohnson wrote: | Thanks, I thought about using a different compression tool | but everybody knows gzip so I figured I'd keep it simple. | Litestream uses LZ4 internally (which is great) but it's not | something a lot of developers come across very often. | nikeee wrote: | There is also VACUUM INTO: | https://www.sqlite.org/lang_vacuum.html#vacuuminto | nbevans wrote: | Indeed, and in my experience this has been more | reliable/predictable than the official backup API. The official | backup API can get "stuck" on a very large and hot database | because it, essentially, has to start from scratch if a page is | modified. VACUUM INTO doesn't have this problem. | charcircuit wrote: | >Do not use `cp` to back up SQLite databases. It is not | transactionally safe. | | I've read that this is true, but it has always confused me | because I would expect that using cp would be equivalent to the | application crashing or losing power. | fmajid wrote: | No, because cp takes a non-zero amount of time, and so by the | time it has completed, the first blocks may have been rewritten, | thus the backup is inconsistent. If you take something like a | ZFS snapshot, then, yes, that would be a consistent backup. | amcvitty wrote: | Not an expert, but the transaction log is in another file and | you'd need both to be in line with each other and backed up, so | perhaps that's a factor | benbjohnson wrote: | I wrote up a bit longer explanation of the failure modes | (which depend on your journaling mode) in this comment: | https://news.ycombinator.com/item?id=31387447 | kijin wrote: | `cp` is not instantaneous. Unless you lock the entire file, one | block can change while you're reading another block. You end up | with a Frankenstein of a file, compared with a consistent | snapshot if you had simply lost power. | | If your filesystem supports snapshotting, it would be safe to | create a snapshot and `cp` the database off of that snapshot.
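The safe alternative the comments point toward can be sketched with Python's stdlib `sqlite3` module, whose `Connection.backup()` wraps the same online backup API as the CLI's `.backup` command (the helper name here is my own):

```python
import sqlite3

def safe_backup(src_path: str, dst_path: str) -> None:
    # Uses SQLite's online backup API: unlike `cp`, the destination is a
    # transactionally consistent snapshot even if another connection
    # writes to the source while the copy is in progress.
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    try:
        src.backup(dst)  # default copies all pages in one pass
    finally:
        dst.close()
        src.close()
```

Passing `pages=N` to `backup()` instead copies incrementally, releasing the lock between steps, which is the restartable mode discussed further down the thread.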
| kijin wrote: | Another option is to use the `.dump` command that produces a | series of SQL statements to recreate the database and populate it | with data, just like good ol' mysqldump. | | I'm not sure whether this is faster or safer than `.backup` which | produces an actual database file with indexes and whatnot, but a | benefit of the plain text output is that it's very flexible. The | output can be piped to gzip or any other program on the fly, even | across the network, without waiting for an intermediate file to | be written in full. It can also be imported to other types of | RDBMS with minimal changes. | benbjohnson wrote: | The .dump command is a great option for smaller databases. The | main downsides are that it produces a larger output, it'll take | longer to compress that output, and the restore time is much | longer since all the commands have to be replayed. | | The benefit to using the backup API is that the file is | ready-to-go as the database and can be started up immediately. | evoxmusic wrote: | Fun to see that we share some very similar ideas with Replibyte - | an open source tool to seed a dev database with real data. | https://github.com/Qovery/replibyte | cpach wrote: | Nitpick: On a modern Linux system, I would probably use a systemd | timer instead of cron. | manjana wrote: | > Do not use `cp` to back up SQLite databases. It is not | transactionally safe. | | Anyone know if committing the sqlite db file to git is not safe | either? | benbjohnson wrote: | Anything that only uses the file system copying mechanics is | not safe (which includes a git check-in). I wrote a more | detailed comment here: | https://news.ycombinator.com/item?id=31387447 | Moto7451 wrote: | If you're not actively writing to the database you can cp or | commit it safely. SQLite uses a journal or a write ahead log | (WAL) file to process transactions.
| | If you cp a database mid-transaction you will lose the | transaction, as it is processed in the separate journal or WAL | file until commit. If you copy mid-commit then you will have | incomplete data in the main database file and possibly data | corruption. | CraigJPerry wrote: | Do the cron entries work as-is? I'm sitting here trying to decide | if cron will break here: aws s3 cp | /path/to/backup.gz s3://mybucket/backup-`date +%H`.gz | | The % is going to be interpreted as a newline; it should need | escaping, as in date +\%H | | But it's in backticks (subshell) so now I'm doubting myself and | thinking maybe the subshell saves this but my brain is screaming | NO NO BUG! :-) | | I'm going to have to spin up a VM to test it and seek inner peace | here :-) | CraigJPerry wrote: | Yeah, this isn't used in the cron line itself; this is wrapped | in a script further down the article and the script is run from | cron, which makes sense, and no bug. | gorgoiler wrote: | I've always assumed sqlite database files are resilient to being | copied mid-write. Sure: maybe that last transaction will be | corrupted, but everything up to that point will be intact, | right? The next backup will capture whatever was in the | half-complete transaction. | | Am I deluded in this? | benbjohnson wrote: | Litestream author here. Your backup can be corrupt if you | simply use "cp" but the details depend on your journaling mode. | | If you're using the default rollback journal mode, it works by | copying old pages to a "-journal" file and then updating the | main database file with new pages. Your transaction finally | commits when you delete the journal file. Copying the main | database file during a write transaction can give you either a | subset of transaction pages or half-written pages.
| | If you're using the WAL journaling mode, it works by writing | new pages to the "-wal" file and then periodically copying | those pages back to the main database file in a process called | "checkpointing". If you only "cp" the main database file then | you'll be missing all transactions that are in the WAL. There's | no time bound on the WAL so you could lose a lot of | transactions. Also, you can still corrupt your database backup | if you "cp" the main database file during the checkpoint. | | You could "cp" the main database file and the WAL file and | _probably_ not corrupt it but there's still a race condition | where you could copy the main file during checkpointing and | then not reach the WAL file before it's deleted by SQLite. | | tl;dr is to just use the backup API and not worry about it. :) | creatonez wrote: | If you make an atomic copy -- like something equivalent to a | power failure, then it's transactionally safe. One way this can | be done is with a snapshot-capable filesystem like ZFS, btrfs, | or XFS. | | But a traditional `cp` will go from left to right in the file; | it won't take the contents all at once. Which is explicitly | documented as a thing that will break SQLite - | https://www.sqlite.org/howtocorrupt.html | jchw wrote: | Question: | | > Do not use `cp` to back up SQLite databases. It is not | transactionally safe. | | I'm curious what happens. Does this mean that it may cause side | effects from transactions that are not committed to become | visible? | | I guess if you could snapshot the database file, WAL, etc. | simultaneously this isn't an issue, right? Because otherwise it | would be a problem for a database any time your program or | machine crashed. | simonw wrote: | In terms of restoring from backups: "Be sure to stop your | application before copying the database into place" | | This intuitively feels right to me, but I'd be interested to | understand the mechanics behind it.
What can go wrong if you | ignore this advice and use "mv" to atomically switch out the | SQLite file from underneath your application? | | My hunch is that this relates to journal and WAL files - I | imagine bad things can happen if those no longer match the main | database file. | | But how about if your database file is opened in read-only or | immutable mode and doesn't have an accompanying journal/WAL? | [deleted] | zekica wrote: | On Linux at least, if you switch the file, the old file (even | if deleted) will still be used until the file descriptor is | closed. | ancientsofmumu wrote: | The example uses Dead Man's Snitch; Healthchecks.io is my | preference and their plans offer more for less. Both offer code | on Github for examination. | | https://deadmanssnitch.com/plans vs. | https://healthchecks.io/pricing/ | WrtCdEvrydy wrote: | Is there anything self-hosted in this space? | cuu508 wrote: | Yes, Healthchecks is open-source and can be self-hosted: | https://github.com/healthchecks/healthchecks | jethro_tell wrote: | Depending on your environment, i.e. if you have a metrics | system in place already, I generally just post a success | metric and alert on lack of success metrics. | spencerbrian wrote: | I do something similar: it's a relatively small but important SQLite | database, so every five minutes I take a backup using that method | and check the generated SQL files into remote git repositories. | | One on GitHub which is just a snapshot, a single commit in an | otherwise empty repo, force pushed. This is for recovery | purposes; I don't need the history and would probably run afoul | of their service limits if I did. | | And the other on Azure DevOps which has the entire commit history | for the past few years. This one is a bit trickier because the | pack files end up exhausting disk space if not cleaned up, and | garbage collection interrupts the backups. So it clones just the | latest commit (grafted), pushes the next commit, and wipes the | local repo.
No idea how this looks on the remote backend but it's | still working without any size complaints, and it's good to know | there's an entire snapshotted history there if needed. As well as | being able to clone the most recent for recovery if GitHub fails. | Scaevolus wrote: | I rolled my own terrible sqlite replication system once. I used | execution tracing hooks | (https://rogerbinns.github.io/apsw/execution.html#tracing) to | record all mutation queries and their arguments sent to the | database, then periodically copied that log to a replica system | and executed the identical statements. | | It's rather primitive, but could be considered an intermediate | step between this blog's full copying and litestream's WAL-based | replication. | encoderer wrote: | If you use cron for anything important you should use Cronitor - | there's a free-for-life plan that can cover your SQLite backups. | | The article mentions calling a "dead man" service and that is | fine, but Cronitor (built by 3 people, no VC dollars) is a proper | monitoring solution for jobs, not just a "dead man" alert. Happy | monitoring | zrail wrote: | If Ben is reading the thread, tiny but important bug in the | example script. It should have `set -e` at the top to ensure that a | failed step will actually fail the script. Otherwise the snitch | will always get called. | | Edit: filed an issue on the docs repo: | https://github.com/benbjohnson/litestream.io/issues/54 | creativemonkeys wrote: | If I'm using "-e", I also like to use "set -o pipefail" which | will fail the script if any command in a pipeline fails too: x | y | z, which is useful if 'y' fails but 'z' continues to | process the output and returns 0; -e won't catch that (the | example script doesn't use pipes though). | zrail wrote: | Yep, my standard bash header is set -eo pipefail, with -x if I | need to see what's going on. This particular script didn't | appear to be using pipes, as you say. | benbjohnson wrote: | Thanks for catching that!
I fixed it up in a PR and merged in: | https://github.com/benbjohnson/litestream.io/pull/55 | davnicwil wrote: | This is so great - open source in action - I can't think of a | more apt thing to come out of this post than an on the fly | fix to the docs! | | Pun absolutely intended. | TekMol wrote: | Do not use `cp` to back up SQLite databases. It is not | transactionally safe. | | What if you disable the journal like this: | | PRAGMA journal_mode=OFF; | | Can cp be used to back up the DB then? | | And do you still have to send "COMMIT" after each query or will | each query be executed immediately then? | ralferoo wrote: | The main reason is that the backing file is random access, and | transactional correctness is ensured using various locking | mechanisms. SQLite generally relies on fsync() ensuring that | data is flushed to disk at certain points, so conceptually, | data is written to an unused portion of the file, then fsync() | is done, then various structures are updated so that this data | is now live, leaving some unused space in the middle of the | file, and this is fsync()'d again. Later that unused space | might be reclaimed, but only when all current readers and | writers have finished. | | If you cp the file, you might end up with different chunks from | different points in the transaction history that don't make | sense combined together. If you use ".backup", it guarantees | that the data in the copy will be consistent with the data as | it existed at one of the fsync() calls. | | Turning off the journalling will likely increase the chance | that your copy will be inconsistent, as there will be more | churn in the main data file. | [deleted] | KingOfCoders wrote: | I was amazed by the Litestream post on fly.io - it wants to | become a distributed database. This is huge. | mwint wrote: | > B-tree databases like SQLite compress well | | Curious, why is this? | tptacek wrote: | This question nerd-sniped me.
I don't know either but my first | guess, supported by hex dumping a medium-sized database, is | that sqlite B-tree pages are collections of 32-bit page offsets | that the database is designed to cluster (also, just lots and | lots of zeroes). Also, the database I looked at was mostly | UTF-8 text, which is obviously compressible, but I don't think | that's what Ben meant. | benbjohnson wrote: | Litestream author here. When a page gets full then it splits in | two so each new node has empty space for inserts and updates. | That empty space compresses really well. Also, OLTP data can | have duplication when you have a categorical column that you | haven't moved into a separate table (eg a "type" column that | has a couple unique values). | efrecon wrote: | May I chime in with a project around more or less the same idea | and use cases. https://github.com/efrecon/sqlite-backup. I am | pretty sure there are dozens of similar-but-not-quite projects on | GitHub/gitlab/etc. | davnicwil wrote: | Following the announcement of Fly.io acquiring the Litestream | project and hiring its creator Ben Johnson I've been looking into | it (and am very impressed by what I see!) | | Reading through the docs I was delighted to find this full set of | instructions including script commands for setting up a simpler | regular snapshot backup using cron - for situations where | Litestream is (quote) "overkill" - ie where the database is | smaller and your durability requirements are lower. | | In fact, the docs have a whole alternatives _section_ for | different requirements. I think this kind of thing is great, and | something more projects should do! Wanted to share it because of | this meta thing, but also because it's just pretty useful _per | se_. | dgb23 wrote: | I agree. There are a bunch of benefits for everyone here. Users | get a sharper view of the intended use case of the | tool, which benefits both sides, and they learn more related bits | along the way.
It also builds trust. | | The best marketing for devs is no marketing. Just being honest, | clear and helpful is what works best. | kitd wrote: | Agreed. In my mind, the fact that the developers understand | limitations in their own tools means that future enhancements | are likely to be more focussed and effective. | mritchie712 wrote: | if you run a saas, this is a good move for SEO too. People may | be searching "backup sqlite cron" or something like it and | realize they'd rather use litestream. | swah wrote: | I felt the same finding that section: by not "pushing their | solution too much" they actually increase the chances that I'll | use it someday. | mattgreenrocks wrote: | This is what a mature project looks like: it knows what it is, who | it is for, and what it isn't. | | Contrast with so many open source projects that accumulate | features and promise the world, so long as you give them a | star. | vanviegen wrote: | > Contrast with so many open source projects that accumulate | features and promise the world, so long as you give them a | star. | | What a cynical take. | pierregillesl wrote: | Don't forget to stop writing to the database while doing the | backup, otherwise you can run into an infinite loop if you write | faster than sqlite3 .backup is doing the backup :D | | Learned that the hard way when implementing sqlite3 backups on | Gladys Assistant ( open-source home automation platform | https://github.com/GladysAssistant/Gladys ) | davnicwil wrote: | to be fair, this sounds like a situation where Litestream _is_ | the more appropriate solution! | simonw wrote: | This surprises me. I was under the impression that SQLite | backups run inside a dedicated transaction in order to avoid | this.
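The transactional alternative raised elsewhere in the thread, VACUUM INTO (SQLite 3.27+), is a one-liner from Python's stdlib `sqlite3`; a minimal sketch (the helper name is my own, and the target file must not already exist):

```python
import sqlite3

def vacuum_into(src_path: str, dst_path: str) -> None:
    # VACUUM INTO writes a consistent snapshot of the live database in a
    # single transaction, so concurrent writers cannot force it to restart
    # the way sqlite3_backup_step() can be restarted.
    con = sqlite3.connect(src_path)
    try:
        con.execute("VACUUM INTO ?", (dst_path,))
    finally:
        con.close()
```

A side effect worth noting: the output is also vacuumed, so the snapshot has no free pages and is often smaller than the source file.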
| simonw wrote: | This comment cleared that up for me: VACUUM INTO backups are | transactional, but the .backup mechanism is not: | https://news.ycombinator.com/item?id=31387556 | | Ben, I suggest updating that cron backups documentation page | to recommend VACUUM INTO instead! | ignoramous wrote: | Could start an issue: | https://github.com/benbjohnson/litestream.io/issues | benbjohnson wrote: | Good catch. I went ahead and added an issue on the docs. | https://github.com/benbjohnson/litestream.io/issues/56 | oefrha wrote: | See the SQLite online backup API documentation [1]. | | You can finish the backup in one step, but a read-lock would | be held during the entire duration, preventing writes. If you | do the backup several pages at a time, then | | > If another thread or process writes to the source database | while this function is sleeping, then SQLite detects this and | usually restarts the backup process when | sqlite3_backup_step() is next called. ... | | > Whether or not the backup process is restarted as a result | of writes to the source database mid-backup, the user can be | sure that when the backup operation is completed the backup | database contains a consistent and up-to-date snapshot of the | original. However: ... | | > If the backup process is restarted frequently enough it may | never run to completion and the backupDb() function may never | return. | | The CLI .backup command does non-blocking backup IIRC so is | subject to restarts. | | [1] https://www.sqlite.org/backup.html | littlecranky67 wrote: | Can you elaborate more? I think the cron-solution will be | unable to synchronize with your application code to determine | when to stop writing, so more background and your solution | would be of interest. | | As I understand it, while you do the backup, other writes | should go to the WAL log and only get committed after the | backup? | prirun wrote: | I did some reading, and there are 2 ways to use the SQLite | backup API: | | 1.
call backup_init, backup_step with a step size of -1, then | backup_finish. This will lock the db the whole time the | backup is taking place and backup the entire db. | | 2. call backup_init, backup_step in a loop until it returns | SQLITE_DONE with a positive step size indicating how many | pages to copy, then backup_finish. | | With method 2, no db lock is held between backup_step calls. | If a write occurs between backup_step calls, the backup API | automagically detects this and restarts. I don't know if it | looks at the commit count and restarts the backup from the | beginning or is smart enough to know the first changed page | and restarts from there. Because the lock is released, a | continuous stream of writes could prevent the backup from | completing. | | I looked in the sqlite3 shell command source, and it uses | method 2. So if using the .backup command with continuous | concurrent writes, you have to take a read lock on the db | before .backup to ensure it finishes. It would be nice if the | .backup command took a -step option. That would enable the -1 | step size feature of method 1. The sqlite3 shell uses a step | size of 100. | | Another option would be to check backup_remaining() and | backup_pagecount() after each step, and if the backup isn't | making progress, increase the step size. Once the step size | is equal to backup_pagecount() it will succeed, though it may | have to lock out concurrent writes for a long time on a large | db. There's really no other choice unless you get into | managing db logs. | pierregillesl wrote: | It's not what I've experienced! | | In my experience, as soon as there is some new data coming in | the DB, the .backup command will continue, and if the writes | are not stopping, the backup will never stop as well :D | | In Gladys case, we put in the application logic a blocking | transaction to lock writes during the backup. 
I haven't found | any other way to avoid infinite backups in case of | write-heavy databases | 1958325146 wrote: | I ran into the case of needing to back up a write-heavy | database without blocking anything, and came up with a | solution: Writing a VFS ( https://www.sqlite.org/vfs.html ) | that makes the application move the SQLite journal files to | a directory for processing instead of deleting them. | Another process reads them to see what pages were touched | and can very quickly get the changes to update the backup. | yawaramin wrote: | I'm using VACUUM INTO, which does basically that: | https://sqlite.org/lang_vacuum.html#vacuuminto | | > The VACUUM command with an INTO clause is an alternative | to the backup API for generating backup copies of a live | database. ... The VACUUM INTO command is transactional in the | sense that the generated output database is a consistent | snapshot of the original database. | | EDIT: Litestream docs will also recommend that: | https://github.com/benbjohnson/litestream.io/issues/56 | Multicomp wrote: | A service that says 'here is an alternative to our service'? | Right on their own website? Even though cron backups are | basically 'competition' to litestream? | | That's it. I'm giving fly.io a try for the next hobby app I start. | | I'm so impressed with their business decisions right now. I was | afraid for litestream, but then I read how the creator is just | hired to work on it full time. What a splendid development! | tptacek wrote: | Don't give us that much credit! Litestream is a FOSS project. | Fly.io isn't competitive about it. We just bought it because | its existence, like, in the universe makes Fly.io more | valuable, and we wanted to make sure Ben could keep working on | it. We actually already have offerings that compete with it | (Fly.io Postgres). :) | | (Ben might feel competitive about Litestream though; it's his | baby. So maybe give him some credit.)
| jjice wrote: | > A service that says 'here is an alternative to our service'? | | I actually see this quite a bit. I think the reason for it is | that it's a good way to show complexity or moving parts that | can break in a process, where the service being marketed is just | drop-in and start. We can see the SQLite backup process here | and it's not too bad, but we can also see there are a few | things that can go wrong that I'm sure litestream takes care of | and would allow us to avoid any issues. | | For example, backing up the DB onto the local disk before | copying it _could_ fail due to lack of disk space, and then we | have to deal with the notification and fix for that. I'm sure | litestream is a 10 minute setup that handles a lot of | intricacies like that. | er4hn wrote: | It's also in the same spirit as SQLite's own page on whether | SQLite is a good fit: https://www.sqlite.org/whentouse.html | | There is something very powerful about knowing your strength, | leaning into it, and building on it. ___________________________________________________________________ (page generated 2022-05-15 23:00 UTC)