[HN Gopher] How to Corrupt a SQLite Database File
       ___________________________________________________________________
        
       How to Corrupt a SQLite Database File
        
       Author : pcr910303
       Score  : 278 points
       Date   : 2020-01-20 14:02 UTC (8 hours ago)
        
 (HTM) web link (www.sqlite.org)
 (TXT) w3m dump (www.sqlite.org)
        
       | laurent123456 wrote:
       | 2009 (at least) + previous discussions:
       | https://hn.algolia.com/?q=How+to+Corrupt+an+SQLite+Database+...
        
       | logicallee wrote:
       | This section was particularly interesting (I add my own thoughts
       | below it):
       | 
       | >Unfortunately, most consumer-grade mass storage devices lie
       | about syncing. Disk drives will report that content is safely on
       | persistent media as soon as it reaches the track buffer and
       | before actually being written to oxide. This makes the disk
       | drives seem to operate faster (which is vitally important to the
       | manufacturer so that they can show good benchmark numbers in
       | trade magazines). And in fairness, the lie normally causes no
       | harm, as long as there is no power loss or hard reset prior to
       | the track buffer actually being written to oxide. But if a power
       | loss or hard reset does occur, and if that results in content
       | that was written after a sync reaching oxide while content
       | written before the sync is still in a track buffer, then database
       | corruption can occur.
       | 
       | >USB flash memory sticks seem to be especially pernicious liars
       | regarding sync requests. One can easily see this by committing a
       | large transaction to an SQLite database on a USB memory stick.
       | The COMMIT command will return relatively quickly, indicating
       | that the memory stick has told the operating system and the
       | operating system has told SQLite that all content is safely in
       | persistent storage, and yet the LED on the end of the memory
       | stick will continue flashing for several more seconds. Pulling
       | out the memory stick while the LED is still flashing will
       | frequently result in database corruption.
       | 
       | -----
       | 
        | If you were a human rather than sqlite, you wouldn't believe
        | that a sync has occurred the instant the drive says so; you'd
        | trust it a little after a few seconds, and if it's been a
        | minute since you yourself last touched the file, you'd think
        | "surely it has been written by now".
        | 
        | So it seems that by tracking wall time, the library could do
        | better than having to rely on the drive, and could be more on
        | guard against lying drives.
        
         | magila wrote:
          | The bit about disk drives lying about syncing writes is a
         | persistent myth. There have probably been some HDDs at some
         | point which had notable bugs in their flush command, but in
         | general drives do actually write data to disk before reporting
         | complete status for a flush.
         | 
         | I spent several years working for a major HDD manufacturer and
         | their firmware QA process included a test rig which would
         | repeatedly pull power to the drive while sending write and
         | flush commands. The drive would then be powered up and the data
         | on the drive examined to verify that all data which should have
         | been flushed actually made it to disk.
         | 
          | I think this myth persists mostly because it can be
          | surprisingly difficult to coax an OS into sending a flush
          | command for much
         | the same reason disk drives get accused of lying about it:
         | performance.
         | 
          | These days the most reliable solution is to reconfigure the
         | drive with write caching disabled. This way the drive will
         | ensure every write hits the disk before reporting completion to
         | the host. Write caching remains enabled by default on SATA
         | drives mostly for historical reasons. Back before command
         | queuing was commonly supported write caching gave a large
         | speedup. These days everything supports NCQ so disabling write
         | caching only has a small performance impact.
        
           | xenadu02 wrote:
           | This is absolutely not a myth. There have been drives that
           | behaved this way, including from "name brand" manufacturers.
           | Maybe those were firmware bugs, maybe they stopped doing it,
           | but it is not something made up.
           | 
           | Of course there have also been controller bugs, file system
           | bugs, and so on over the years. A decade ago the behavior of
           | most file systems was quite poor in the face of underlying
           | hardware failures but a lot of work has gone into improving
           | the situation.
           | 
            | SSDs are slowly improving too. Most of them have claimed to
           | go into read-only mode when enough blocks wear out but in
           | reality most of them would crash, fail to respond, or just
           | lose data. It turns out manufacturers tend to be a bit loose
           | in their claims until independent third-parties put their
           | claims to the test. More recent models seem to be a bit
           | better behaved in failure states.
        
         | fencepost wrote:
         | I'd also wonder about how old this part is and whether it's
         | partially based on Windows. USB behavior on Windows used to be
         | optimized for speed and would do caching (hence the need to
         | eject flash drives before removal) but in one of the Win10
         | updates it was changed to safe removal which basically disables
         | write caching on removable devices. So, at least on Windows,
          | there was absolutely a time when the OS would report
         | completion before things were actually written.
        
         | kardos wrote:
         | > So it seems by tracking wall time, the library could do
         | better than have to rely on the drive, and could be more on
         | guard against lying drives.
         | 
         | Can you elaborate about tracking wall time --- how would you
          | determine the real sync timing? Picking an arbitrarily large
          | delay (a minute) would trash the write performance.
        
           | logicallee wrote:
           | Why would it trash the write performance to pick an
           | arbitrarily large delay, such as one minute? Can you give an
            | example? I don't know sqlite internals, but this document is
            | about "corrupting" a database (not just losing some data).
           | 
           | So I imagine it could be set up so that any power loss that
           | keeps data from being written within a specified arbitrarily
           | large delay (such as one minute) won't actually corrupt the
           | database - just lose data.
        
       | tgsovlerkhgsel wrote:
       | > Backup or restore while a transaction is active
       | 
        | Is there any information on whether taking a consistent
        | snapshot (that contains both the DB file and journals) is safe?
        
         | egdod wrote:
         | Seems like that would have the same problems if you do it while
         | transactions are ongoing.
        
           | tgsovlerkhgsel wrote:
           | Given that the DB is designed to survive power outages (even
           | during transactions), I don't really understand how that
           | would be the case.
        
           | dunham wrote:
           | I believe they are subtly different.
           | 
           | If you copy while transactions are ongoing, you're reading
           | different parts of a file at different points in time. For
            | example, the beginning of the sqlite3 file has already been
           | copied, but by the time you get to the log file, blocks have
           | been committed back into the beginning of the sqlite file.
           | 
           | If you take a consistent snapshot of the filesystem first, I
           | would think that would be the same as a crash in the middle
           | of a transaction. Things should be in a consistent state that
           | the recovery code at startup can handle.
        
         | pstuart wrote:
          | It's recommended to use the backup API vs a filesystem copy:
         | https://www.sqlite.org/backup.html
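          | 
          | A minimal sketch of driving it from Python (assuming Python
          | 3.7+, whose sqlite3.Connection.backup wraps the C-level
          | backup functions):
          | 
          |     import sqlite3
          | 
          |     src = sqlite3.connect("app.db")
          |     dst = sqlite3.connect("app-backup.db")
          |     with dst:
          |         # Copies the database page by page; src can keep
          |         # serving readers and writers while the copy runs.
          |         src.backup(dst)
          |     dst.close()
          |     src.close()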
        
         | bob1029 wrote:
         | If you are using WAL, I feel volume-level snapshots are an
         | acceptable approach.
        
         | [deleted]
        
       | Const-me wrote:
       | > Backup or restore while a transaction is active
       | 
        | I wonder, does that apply to Windows?
       | 
        | Well-designed Windows backup software uses the NTFS shadow copy
        | feature. A shadow copy creates a snapshot of the complete volume
       | when backup starts. All writes happening after that won't be
       | included in the shadow copy, so the backup software should never
       | read some old and some new content.
       | 
       | Does it mean when using VSS on Windows, SQLite databases will be
       | backed up just fine? Or am I missing something?
        
         | DenisM wrote:
         | You're probably right. Taking a VSS snapshot is identical to
          | pulling the power plug and imaging the drive. If SQLite can
          | handle the latter, it should be able to handle the former.
         | 
         | Another thing that should work well is an AWS EC2 EBS snapshot
         | as Amazon promises a consistent point in time copy.
        
       | cpuguy83 wrote:
       | In another life (2007) I worked for a company that does digital
       | signage. We had about 8000 systems deployed across the US. The
       | software on them relied heavily on SQLite. At some point we
       | started seeing random corrupted SQLite databases.
       | 
        | Trying to figure out what was causing this, we realized our PC
        | manufacturer was using the same SKU but the hardware inside
        | could vary... in particular the hard drive. Every system with a
        | corrupt DB had one particular brand of hard drive (WD); the
        | ones that didn't fail had another (Seagate).
       | 
       | We never really spent much more time on this other than to lock
       | down the hardware config and replace all the hard drives in the
       | failed systems with the "good" brand.
       | 
        | Super interesting to see this. I'm assuming it was "Disk drives
        | that do not honor sync requests".
       | 
       | We were of course using consumer grade equipment, because the
       | company was pretty cheap, but also we had so many to deploy. I
       | wonder how much this cost the company to fully resolve. Between
       | systems not playing, having to send technicians, shipping systems
       | overnight, time troubleshooting, etc.
        
         | magila wrote:
         | Contrary to popular belief, even consumer drives will honor
         | flush commands. The problem is operating systems generally do
         | not send flush commands, even when you might expect it (e.g.
         | last I checked fsync() on Linux does not send a flush).
         | 
         | Disabling write caching resolves these problems. Unfortunately
         | SATA drives typically ship with write caching enabled by
         | default.
        
           | zaarn wrote:
           | Some cheaper HDDs and third-rate drives won't honor flushes.
           | Same for SSDs, a few very cheap ones ignore flushes in favor
           | of cache performance.
        
           | tobias3 wrote:
           | Of course fsync() sends a flush to the disk (unless you use
           | non-standard mount options or "eatmydata"). They may not
           | honor the FUA bit, though, but hopefully nobody relies on
           | this if they expect it to work with consumer disks (See e.g.
           | here https://devblogs.microsoft.com/oldnewthing/20170510-00/?
           | p=95... ).
           | 
            | What I have heard is that some disks have problems if they
            | are on an error recovery path. E.g. they have a corrupt
            | sector or a sector that needs reallocation (pending sector),
            | and then they have a power failure while the sector is being
            | reallocated.
        
           | networkimprov wrote:
           | References?
        
             | aidos wrote:
             | Not quite the same thing but it turns out there are some
             | other gotchas with fsync on Linux
             | https://lwn.net/Articles/752063/
        
         | wongarsu wrote:
         | > because the company was pretty cheap, but also we had so many
         | to deploy
         | 
         | Isn't "deploy many systems in hard to reach places" the prime
         | target for enterprise-grade hardware? Fixing problems on one
         | bespoke system is a lot easier than fixing the same problem
         | 8000 times, and of course with a large sample size the number
         | of problems experienced goes up.
        
           | michaelt wrote:
           | I always thought "people willing to pay five to ten times
           | market rate per gigabyte" was the target of enterprise-grade
           | storage...
        
           | cpuguy83 wrote:
           | Yep. This was a big problem there. Before they were
           | delivering content to PC's they distributed content via DVD
           | once per month. DVD player failure rate was pretty massive,
           | especially since they just started recycling refurbed
           | players.
        
         | FpUser wrote:
          | In my other life (2000-2008 ish, I think) I also dealt with
          | large (tens of thousands) deployments, all consumer grade as
          | in your case. No SQLite though. Obviously we had some hard
          | drive failures. What is very interesting is that the results
          | were exactly opposite: WD drives lived and Seagate drives
          | failed. I am wondering if such a discrepancy was caused by
          | the usage pattern.
        
           | winrid wrote:
           | The drives weren't actually failing, the setup failed due to
           | how the WD drives work. If you reformatted them they'd be
           | fine.
        
             | FpUser wrote:
             | Oops, sorry was not careful reading your message
        
             | cpuguy83 wrote:
             | Yep, exactly this. The drives were fine, only the database
             | was getting corrupted.
        
         | pilsetnieks wrote:
          | With how widespread both SQLite and WD are, wouldn't half the
          | world have apps crashing left and right if just the hard
          | drive brand were the cause of the problem?
        
           | wolf550e wrote:
           | The case of the disks not honoring fsync is famous. I learned
           | about it from Brad Fitzpatrick, author of LiveJournal (later
           | on the Go team at Google) writing about it in 2005:
           | https://brad.livejournal.com/2116715.html
        
           | inetknght wrote:
           | See also: counterfeit SD cards.
        
           | nothrabannosir wrote:
           | I had consistent problems with WD in that same era. Switched
           | to Seagate as well, and my problems disappeared. Just another
           | datapoint.
        
           | fpgaminer wrote:
           | GP mentioned digital signage, so probably some displays at a
           | business. Unlike a computer which would mostly either be left
           | on (asleep) or shut down properly, I'm sure users had no
           | qualms about killing power to these displays as they closed
           | up every night. Write flushing problems would be exacerbated
           | in that environment.
        
           | jmull wrote:
           | Yes, I think it's possible (likely, even) that there was at
           | least one more factor.
        
           | adrianN wrote:
           | Apps are in fact crashing left and right.
        
             | brewdad wrote:
             | My Materialistic app crashed about 30 seconds prior to
             | opening this thread. I probably average about 5 "noisy" app
             | crashes a week on my phone. No idea how many may be failing
             | silently.
        
               | ShamelessC wrote:
               | I had to give up on Materialistic because of the
               | frequency of crashes. It's a shame because the hn website
               | is really difficult to navigate on mobile (collapsing
               | threads with that tiny + is a nightmare). Do you know of
               | any better alternatives that let you log in and post from
               | the app?
        
           | tyingq wrote:
           | It's somewhat implied it was one specific WD consumer drive
           | model.
        
           | cpuguy83 wrote:
           | I doubt most consumer applications write to the database
           | nearly as often as we were. Also very likely a particular
           | model of the drive and also sheer luck that one worked and
           | one did not.
        
       | kbumsik wrote:
        | Can they be applied to other DBMSs? I mean, I don't think other
        | DBMSs treat database files too differently from SQLite, do
        | they?
        
         | pjc50 wrote:
          | Most DBMSs don't let you at the files directly - you speak to
          | the DB over a socket. Because SQLite is in-process, bugs in
          | the application that write to random file descriptors can
          | scribble over the SQLite files.
         | 
         | (This is more ammunition for the idea that the real software
         | isolation boundary on a desktop computer should not be "user",
         | but "software author"!)
        
         | Xophmeister wrote:
          | SQLite is serverless; server-based DBMSs are free to use
          | locking mechanisms other than filesystem primitives to
          | achieve concurrency.
        
         | simias wrote:
         | Most of them would be possible in some variation, although
         | typically with different results.
         | 
         | For instance typically databases are accessed through a socket
         | interface which uses the same pool of file descriptors as
         | open(). So it's probably possible for a database connection to
         | end up with fd 2 and then the write(2, ...) would also send
         | garbage to the database.
         | 
          | Although in this case it probably won't do much unless you're
          | very unlucky: the message will almost certainly not be valid
          | for the DB, so it'll either ignore it, return an error, or
          | maybe drop the connection without corruption. So it would
          | still be a nasty thing to debug, but not quite as critical.
         | 
          | More generally, there's typically much better isolation
          | between database and process with traditional databases,
          | which makes it difficult for a misbehaving program to mess
          | directly with the DB. And having a server program processing
          | the transactions means that you don't have to rely so heavily
          | on things like filesystem locking to achieve atomicity, for
          | instance. That being said, buggy or limited filesystems can
          | be a problem for any database, for instance to guarantee that
          | a transaction has really been committed to disk in case of a
          | power outage etc... See this for instance:
         | 
         | https://wiki.postgresql.org/wiki/Fsync_Errors
         | 
         | https://www.postgresql.org/message-id/flat/CAMsr%2BYHh%2B5Oq...
        
       | arminiusreturns wrote:
       | I wonder if dqlite (https://github.com/canonical/dqlite) or
       | rqlite (https://github.com/rqlite/rqlite) are enough to help
       | prevent this.
        
         | samatman wrote:
         | No, but they should be able to _recover_ from it, which is
         | almost as good.
        
       | Lex-2008 wrote:
        | Very interesting read! Although some points might sound too
        | strange to include in this list (like "memory corruption" - in
        | case of faulty memory, everything is in danger, not only the
        | database), others are worth keeping in mind (multiple hardlinks
        | to the file, moving a single DB file around without the
        | accompanying *-wal file, etc.)
        
         | matsemann wrote:
         | > strange to include in this list (like "memory corruption" -
         | in case of faulty memory, everything is in danger, not only
         | database)
         | 
          | But theoretically one could build something that handles up to
          | X random bit flips, so it's maybe worth mentioning that sqlite
          | doesn't handle this (even though probably no one else does
          | either)
        
           | Franciscouzo wrote:
            | Could you? You would also have to take into account that the
            | instructions that make up the program could also be randomly
            | flipped.
        
             | blattimwind wrote:
             | In most cases the size of the code is much smaller than the
             | data handled, so for random errors (perfectly working DRAM
             | has a bit error rate somewhere in the general vicinity of
            | 10^-14/bit*hr) the distribution of errors is in accordance
            | with that proportion.
        
         | mytailorisrich wrote:
         | They are not describing cases of faulty memory but explicitly,
         | and very aptly, warn you that since SQLite is a C library that
         | runs in the same address space as the application code, if your
         | application code corrupts memory (by way of buffer overrun,
          | heap corruption, etc.) it can impact SQLite's internals and in
          | turn corrupt the SQLite database file.
        
       | andreareina wrote:
       | Is open() on something other than /dev/stderr returning 2 a
       | thing? Or is there something screwy that was causing it to get
       | associated with the open database file?
        
         | emilfihlman wrote:
         | The kernel gives you the lowest numbered fd available.
         | 
         | If you close stderr, 2 becomes available and can be handed out.
         | 
         | Also it's not exactly guaranteed that 0, 1 and 2 are the fd
         | numbers for stdin, out and err on program start, but they are
         | hard coded and standard.
         | 
          | Edit: It's implementation defined, i.e. not guaranteed. Your
          | platform may make guarantees about it, but it is not an
          | actual standard.
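          | 
          | A throwaway sketch that shows the lowest-free-descriptor rule
          | (don't do this in real code):
          | 
          |     import os
          | 
          |     os.close(2)  # close stderr
          |     fd = os.open("some.db", os.O_RDWR | os.O_CREAT)
          |     print(fd)    # prints 2 on POSIX: the freed slot is
          |                  # handed right back out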
        
           | Hello71 wrote:
           | It is not only guaranteed by the POSIX standard, as pointed
           | out by the siblings, it is necessary for standard shell
           | scripting idioms to work, e.g. "command1 2>&1 | command2".
        
           | matheusmoreira wrote:
           | http://man7.org/linux/man-pages/man3/stdin.3.html
           | 
           | > On program startup, the integer file descriptors associated
           | with the streams stdin, stdout, and stderr are 0, 1, and 2,
           | respectively.
        
             | quindecagon wrote:
             | > On program startup
             | 
              | Then, after program startup, if it is closed, fd 2 could
              | be assigned to something else.
        
             | emilfihlman wrote:
             | You must differentiate between a guarantee by an actual
             | standard and platform specific documentation.
        
               | matheusmoreira wrote:
               | https://pubs.opengroup.org/onlinepubs/9699919799/function
               | s/s...
               | 
                | > The following symbolic values in <unistd.h> define
                | the file descriptors that shall be associated with the
                | C-language _stdin_, _stdout_, and _stderr_ when the
                | application is started:
               | 
               | > STDIN_FILENO Standard input value, _stdin_. Its value
               | is 0.
               | 
               | > STDOUT_FILENO Standard output value, _stdout_. Its
               | value is 1.
               | 
               | > STDERR_FILENO Standard error value, _stderr_. Its value
               | is 2.
        
               | [deleted]
        
           | blattimwind wrote:
           | > Also it's not exactly guaranteed that 0, 1 and 2 are the fd
           | numbers for stdin, out and err on program start
           | 
           | Yes, it is.
           | 
           | https://pubs.opengroup.org/onlinepubs/9699919799/functions/s.
           | ..
        
             | emilfihlman wrote:
             | You must differentiate between a guarantee by an actual
             | standard and platform specific documentation.
        
               | endorphone wrote:
               | He linked an actual standard.
        
               | blattimwind wrote:
               | Can you tell me what the "std" in "IEEE Std 1003.1-2017"
               | stands for?
        
               | makomk wrote:
               | That is a guarantee by an actual standard. More
               | specifically, it's one of the things that POSIX.1-2017
               | guarantees but the ISO C standard does not. In practical
               | terms this means that all Unix-like systems use the
               | traditional fd numbers for stdin, stdout and stderr but
               | it's possible that non-Unix systems might not.
        
             | trasz wrote:
             | You're right, but it's more than that: stderr is defined to
              | be fd 2. Whatever is assigned to that fd is the stderr. If
             | you close it and open something else, that something
             | becomes your stderr.
        
       | justinator wrote:
       | I must be getting old - an article with such a title would used
       | to also be appended with, "For Fun, and Profit!"
        
         | OJFord wrote:
         | These days that's not enough; most come with 6 Reasons, and
         | often one of them is particularly Unbelievable.
        
         | anon73044 wrote:
         | I would have expected that if this were posted on milw0rm or a
         | personal blog, but not on the actual SQLite info page.
        
           | justinator wrote:
            | I honestly feel it's more a sea change type of thing:
           | 
           | https://mitpress.mit.edu/books/fun-and-profit
        
       | dicytea wrote:
       | Any good introduction to sqlite3 for a Python user?
        
       | umvi wrote:
       | Somewhat unrelated, I feel like the stars aligned to prevent my
       | company from using SQLite.
       | 
       | We use a lot of CSV files in our embedded devices. I personally
        | hate CSV and advocated for SQLite. However, our CSVs are 1
        | writer/many readers, and the only way to do that in SQLite is to
        | switch to WAL mode. Unfortunately, WAL apparently doesn't work
        | on JFFS2 filesystems :(...
       | 
       | I'm going to give it another shot on our next-gen product which
       | uses ubifs
        
       | wolco wrote:
        | The most common by far:
       | 
       | I/O error while obtaining a lock leads to corruption. Probably
       | speaks to my drive/os choices.
        
       | tracker1 wrote:
        | Using a mounted volume in Docker for Windows/Mac desktop and
        | editing the same database from inside the container.
        | 
        | Don't do this btw - the FS sync in Docker for Windows/Mac is
        | really slow and prone to causing corruption and errors if you
        | do.
        
       | bob1029 wrote:
       | I have found that corruption of SQLite files in practice is a
       | very difficult thing to achieve. On Windows, the worst I've ever
       | seen is a wall of "Database File is Locked" which was resolved
       | with a rewrite of the DB access logic. After much
       | experimentation, I have found that the happy path for SQLite is
       | as follows:
       | 
       | 1. Use a SQLite distribution that is compiled or otherwise
       | configured for serialized threading mode (this is the default in
       | most cases).
       | 
       | 2. Use a single connection from a single process to access the
       | SQLite database file. This connection can be safely shared
       | between multiple threads with optional locking as required (due
       | to the guarantees offered by #1 above).
       | 
       | 3. Transactions are handled with locking in the application code
       | (i.e. on the SQLiteConnection instance), not in the database.
       | This is predominantly required for inserts with PK generation,
       | but also applicable in cases where you don't want 2 different
       | threads to interleave updates during a certain multi-step
       | operation (e.g. debiting one account and crediting another).
       | 
       | So far, we have found this to be the most performant and simple
       | way to get business data in and out of a SQLite database file.
       | The "one connection per operation" approach is NOT ideal for
       | SQLite because there is an actual database file that needs a
        | handle created/closed each time. One connection per operation
        | makes sense when you are using a hosted database designed for
        | this access model.
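        | 
        | A sketch of points 2 and 3 in Python (hypothetical accounts
        | table; check_same_thread=False leans on the serialized
        | threading mode from point 1):
        | 
        |     import sqlite3, threading
        | 
        |     conn = sqlite3.connect("app.db", check_same_thread=False)
        |     lock = threading.Lock()
        | 
        |     def transfer(src, dst, amount):
        |         # The lock keeps the two UPDATEs from interleaving with
        |         # another thread's multi-step operation; the connection
        |         # context manager commits on success, rolls back on error.
        |         with lock, conn:
        |             conn.execute("UPDATE accounts SET balance = balance - ?"
        |                          " WHERE id = ?", (amount, src))
        |             conn.execute("UPDATE accounts SET balance = balance + ?"
        |                          " WHERE id = ?", (amount, dst))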
        
         | xienze wrote:
         | > I have found that corruption of SQLite files in practice is a
         | very difficult thing to achieve.
         | 
         | Not in the slightest. Stick your Plex DB on NFS and it'll get
         | messed up in very short order.
        
           | justinclift wrote:
            | Wonder if changing some of the NFS mount options (caching,
            | sync, etc) would make it better?
            | 
            | And/or documenting the options known to cause corruption for
            | this Plex DB usage pattern. eg: "Don't use options foo, bar,
            | bazz with PlexDB over NFS, as corruption _will_ occur".
        
           | ta999999171 wrote:
           | People who know what NFS is will be on Emby or similar before
           | I finish writing this comment.
        
         | catalogia wrote:
         | I've found WAL mode with one writer thread and as many reader
         | threads as you want works pretty well.
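          | 
          | For reference, it's a one-line, persistent switch (Python
          | sketch; the mode is stored in the database file itself):
          | 
          |     import sqlite3
          | 
          |     conn = sqlite3.connect("app.db")
          |     conn.execute("PRAGMA journal_mode=WAL")  # sticks across
          |                                              # connections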
        
           | bob1029 wrote:
           | Yes, absolutely use WAL mode whenever feasible. Does wonders
           | for performance.
        
         | harikb wrote:
         | If you have to handle (write code for) 2 and 3, why isn't
         | something like Postgres better?
        
           | cheez wrote:
           | It's a good question. For me, lack of yet-another-service.
        
             | bob1029 wrote:
             | This is my primary answer as well. I have no problem with
             | Postgres as a solution, but I will only invite additional
             | complexity into my life when absolutely required by the
             | business constraints.
             | 
             | SQLite is dead simple compared to Postgres (or any other
             | hosted solution). I can copy a SQLite .db file from a
             | production machine to my local box and be reviewing
             | rows/running queries within seconds. Also, sharing SQLite
             | databases with other users & developers or even using them
             | as a contract between systems is a very compelling set of
              | use cases. Finally, going from a blank Windows machine to a
             | working development (or production) environment is very
             | quick if your application exclusively uses SQLite to store
             | its data. You simply stop thinking about databases after a
             | while because everything just works.
        
               | jbverschoor wrote:
                | For me, the single file is the nr 1 reason. It just
                | works; the file is the data... very clear.
        
           | barrkel wrote:
           | Something like Firebird Embedded would be better than Sqlite
           | if you have concurrency and performance requirements
           | alongside an embedded packaging constraint.
           | 
           | Sqlite is more like a structured persistent file format with
           | a SQL interface than a database.
        
           | mceachen wrote:
           | I'd love to use Postgres with PhotoStructure, but it would
           | triple the installation size (!!), make my code more
           | complicated (I'd basically need to automate common dba work),
           | and Windows support is shaky (last time I checked, at least).
           | 
           | I picked SQLite to reduce moving parts.
        
           | ebg13 wrote:
           | I can answer this. Adding a queue thread to your executable
           | is infinitely easier to redistribute and run than something
           | that requires the user to install a persistent postgres
           | daemon. There are premade libraries for several languages
           | that provide thread safe query queueing for sqlite
           | transparently as drop in imports so you don't even really
           | need to think about it. See e.g.
           | https://github.com/dashawn888/sqlite3worker (not mine)
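            | 
            | A hand-rolled version of the same idea, as a minimal sketch
            | (hypothetical events table; not sqlite3worker's actual API):
            | 
            |     import queue, sqlite3, threading
            | 
            |     writes = queue.Queue()
            | 
            |     def writer():
            |         # The only thread that ever writes; readers can
            |         # open their own connections.
            |         conn = sqlite3.connect("app.db")
            |         while True:
            |             sql, params = writes.get()
            |             with conn:
            |                 conn.execute(sql, params)
            |             writes.task_done()
            | 
            |     threading.Thread(target=writer, daemon=True).start()
            |     writes.put(("INSERT INTO events(msg) VALUES (?)",
            |                 ("hello",)))
            |     writes.join()  # block until queued writes are applied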
        
         | blattimwind wrote:
         | In WAL mode you can write from multiple threads without any
         | problem, but you need to keep serializability in mind.
         | Specifically, no concurrent write transactions are allowed. In
         | other words, you can't have a transaction start at X, have
         | another transaction start at X+1, issue DML and commit, and
          | expect your original transaction from X to be able to commit -
          | it never will be able to.
         | 
         | Some SQLite wrappers (notably, Python's) mess with the start
         | and rollbacks of transactions and make this appear buggy.
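          | 
          | A sketch of that failure mode with Python's sqlite3 (a WAL
          | database, and isolation_level=None so the wrapper doesn't
          | inject its own BEGIN/COMMIT):
          | 
          |     import sqlite3
          | 
          |     a = sqlite3.connect("app.db", isolation_level=None)
          |     b = sqlite3.connect("app.db", isolation_level=None)
          |     a.execute("PRAGMA journal_mode=WAL")
          |     a.execute("CREATE TABLE IF NOT EXISTS t(x)")
          | 
          |     a.execute("BEGIN")
          |     a.execute("SELECT count(*) FROM t")    # a pins a snapshot at X
          |     b.execute("INSERT INTO t VALUES (1)")  # b commits at X+1
          |     try:
          |         a.execute("INSERT INTO t VALUES (2)")  # a tries to write
          |     except sqlite3.OperationalError as e:
          |         print(e)  # "database is locked": a's snapshot is
          |                   # stale, so its transaction can never commit
          |         a.execute("ROLLBACK")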
        
           | platz wrote:
           | Even in WAL mode, you can still get a SQLITE_BUSY error even
           | when just trying to open a new connection. The application I
           | worked on could not tolerate these busy exceptions so we had
           | to acquire an exclusive global mutex before attempting to
            | open new connections to sqlite, and make sure to release it
            | as soon as the operation was done.
        
           | stephen82 wrote:
           | They are working on it; please check for 'begin-concurrent'
           | and 'wal2' branch names here
           | https://www.sqlite.org/cgi/src/brlist
        
           | oefrha wrote:
           | > Some SQLite wrappers (notably, Python's) mess with the
           | start and rollbacks of transactions and make this appear
           | buggy.
           | 
           | @coleifer (author of peewee) has a pretty informative blog
           | post exploring -- in part -- transactional semantics, which
           | might be of value to people who want to know more about this.
           | 
           | https://charlesleifer.com/blog/going-fast-with-sqlite-and-
           | py...
        
             | mox1 wrote:
             | +1 for @coleifer, peewee is an awesome ORM, he knows his DB
             | stuff very well.
        
           | kccqzy wrote:
           | Python's wrapper is one of the worst. It attempts to parse
           | the SQL you are trying to execute and automatically send
           | BEGIN or COMMIT to the database. Setting the
           | "isolation_level" property (badly named) to None makes things
           | better.
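            | 
            | A short sketch of what that looks like (hypothetical table
            | t; nothing implicit happens behind your back):
            | 
            |     import sqlite3
            | 
            |     conn = sqlite3.connect("app.db", isolation_level=None)
            |     conn.execute("BEGIN")  # issued only when you say so
            |     conn.execute("INSERT INTO t VALUES (1)")
            |     conn.execute("COMMIT")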
        
         | tracker1 wrote:
         | Posted in another thread... but I discovered how the FS in
         | Docker for (Windows/Mac) worked the hard way. If you use a
         | mounted volume in Win/Mac it does a syncing, not a direct
         | mount/share. This lags and isn't a great option for live
         | databases and if you're accessing the same SQLite db outside
         | the container and inside, you will see corruption happen.
         | 
         | In any case, it was definitely an interesting discovery...
         | since then, I only use shared volumes with the host as
         | backup/restore targets, never for shared anything.
        
           | zeeZ wrote:
           | I run Plex on my Windows desktop as a container, with
           | everything I want to persist mounted in (this includes the
           | database). If I don't stop the container before hibernating
           | my PC, I'm pretty much guaranteed to have some kind of
           | database corruption on the next start.
        
             | tracker1 wrote:
             | I'd consider a volume container for your database, the
             | content is probably safe enough, since it's not a lot of
             | heavy write. Then have a timed event that copies the DB out
             | of the volume container for backups regularly.
        
           | tracker1 wrote:
            | It's also worth noting that using a DB in a shared volume,
            | even if you don't touch it outside the container, is _MUCH_
            | slower... I was doing this for mssql containers, and when I
           | stopped, my spinup for a fresh instance went from nearly 90
           | seconds to about 15.
        
       | adrianmonk wrote:
       | > _Systems that run automatic backups in the background might try
       | to make a backup copy of an SQLite database file while it is in
       | the middle of a transaction. The backup copy then might contain
       | some old and some new content, and thus be corrupt._
       | 
       | I am surprised by this! Is it really not entirely safe to take a
       | backup while some software is running, has a SQLite database
       | open, and might be writing to it?
       | 
       | That limitation is fine for a big database that has someone
       | knowledgeable administering it who will take care to follow any
       | required backup procedures. But SQLite is often included in
       | software that nontechnical end users will use without even
       | knowing that there's a SQLite file. For example, Firefox and
       | Google Chrome use it.
       | 
       | This seems to mean that if a regular end user is browsing the web
       | while backups run on their computer, it's possible they could be
       | unlucky and get a bad backup that would corrupt the database if
       | they had to restore it.
       | 
       | Am I interpreting that right? If so, it suggests that many/most
       | software projects that include SQLite are doing it in a slightly
       | unsafe way.
       | 
        | (It seems possible that the *-journal and *-wal files could
       | protect against that, but this SQLite documentation only mentions
       | that they protect you when transactions fail, which is a
       | different case.)
        
         | btilly wrote:
         | Let's turn this around. How would you avoid this?
         | 
         | Suppose that you are copying the file front to back. While you
         | are doing this, an update comes in that changes something in
         | the front and back which refer to each other, while your copy
         | is somewhere in the middle.
         | 
         | The result is that you get a backup with a data structure in
         | the back of the file referring to stuff in the front of the
         | file that isn't there. Your backup copy is therefore corrupted.
         | 
         | This problem is almost impossible to avoid. You just do the
         | best that you can.
        
         | GlitchMr wrote:
         | Yes. However, keep in mind it's not just SQLite, but pretty
         | much every program that has any kind of state.
         | 
         | For instance, a program may save its state into a file every
         | time it changes by truncating the file and then writing new
          | contents. If a user were to copy the file after truncation,
          | but before the write, they would end up with a useless backup.
         | 
         | That said, this is rather unlikely to happen in practice.
         | Programs usually do stuff (including saving files) when they
         | are interacted with, so a backup copied with File Explorer has
         | a very high probability of working, even when the application
         | is running.
         | 
          | Programs can deal with the issue by performing atomic full
          | writes via renames, but most programs don't bother to. SQLite
          | cannot do that, because it cannot afford full rewrites, as
          | databases tend to store a lot of data.
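          | 
          | The rename trick as a Python sketch (for whole-file state;
          | as noted above, not something a database can afford):
          | 
          |     import os, tempfile
          | 
          |     def atomic_write(path, data):
          |         # Write a sibling temp file, flush it to disk, then
          |         # rename it over the original: readers see the old or
          |         # the new contents, never a mix.
          |         fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
          |         try:
          |             os.write(fd, data)
          |             os.fsync(fd)
          |         finally:
          |             os.close(fd)
          |         os.replace(tmp, path)  # atomic on POSIX and Windows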
        
           | Someone wrote:
           | The better way of 'overwriting' is to write the new file
           | under a new name, and then atomically move it to where the
           | old file was. If you have to overwrite (e.g. because of lack
           | of disk space), the program that rewrites that file should
           | open it in exclusive mode.
           | 
           | A back-up tool should open files to backup in a mode that
           | disallows others from writing to it, and retry that a few
           | times, each time after waiting a few seconds.
           | 
           | If that keeps failing, there's a problem. Do you skip backing
            | up the file, or retry with an "it's fine if others are
            | reading" mode, at the risk of getting a hybrid of multiple
            | versions?
           | There is no fully correct answer there.
           | 
           | And it gets worse. Not all programs write files in exclusive
           | mode. Some might even open a file multiple times while
            | rewriting it. Also, consistency may matter across files, for
            | example with Mac OS packages.
           | 
           | And then, of course, there's anti-virus tools. They often
           | work below the file system, reading and writing files without
           | the OS being aware of it.
           | 
            | The reason that backup tools work relatively reliably and
            | speedily at all is mostly that most programs that write to
           | files will either regularly close them, or just append to
           | them.
           | 
           | Databases are the exception. They tend to keep files
           | permanently open and regularly write to them. If you use
            | SQLite in a 'normal' application for user data, you should
            | flush data after (about) every write. (If you use it for
            | caching, you can be a bit more loose, as it's relatively
            | fine to discard the database when it gets corrupted.)
        
             | justinclift wrote:
             | > A back-up tool should open files to backup in a mode that
             | disallows others from writing to it, and retry that a few
             | times, each time after waiting a few seconds.
             | 
             | Generally the preferred approach for backup programs is to
             | trigger a file system snapshot - after quiescing the disks
             | - and back that up. Sometimes (eg databases), the backup
             | program is able to run application specific logic that
             | works in with this (eg flush all database buffers before
             | the snapshot).
             | 
             | For the filesystems that don't offer snapshots, some still
             | have the ability to freeze the file system to allow a
             | consistent backup.
        
             | GlitchMr wrote:
             | As I said in a comment you responded to.
             | 
             | > Programs can deal with the issue by performing atomic
             | full writes by performing renames, but most programs don't
             | bother to. SQLite cannot do that, because it cannot afford
             | doing full writes as databases tend to store a lot of data.
             | 
              | I'm aware you can do that, but most programs don't. If you
              | pick any random program, chances are it won't do that.
        
           | adrianmonk wrote:
           | Writing a copy and renaming works, but is it the only way, or
           | is it just an approach that a lot of software uses because
           | it's simple and it works? In other words, are there ways that
           | might be more work to implement but also more efficient that
           | would be suitable for a database?
           | 
           | I agree the risk is pretty small in practice. Though copying
           | with File Explorer isn't the use case I had in mind. It was
           | automatic scheduled backups. Both Windows and macOS make it
           | pretty easy to set these up with an external hard drive. And
           | those could start running while you're using the computer
           | and/or you could start using the computer while backups are
           | running.
        
             | GlitchMr wrote:
             | Back-up software can use NTFS/ReFS Shadow Copy in Windows.
             | This shouldn't cause issues with SQLite.
             | 
             | Copying files is only an issue for manual back-ups done
             | without special software.
        
             | lolc wrote:
             | An append-only format should work fine with most backup
             | software that copies files. A vacuum phase would still have
             | to do an atomic rename.
        
         | gwbas1c wrote:
         | >> Systems that run automatic backups in the background might
         | try to make a backup copy of an SQLite database file while it
         | is in the middle of a transaction. The backup copy then might
         | contain some old and some new content, and thus be corrupt.
         | 
         | > I am surprised by this! Is it really not entirely safe to
         | take a backup while some software is running, has a SQLite
         | database open, and might be writing to it?
         | 
         | I've written backup software: You only know if another process
         | has an open file handle if you write your own driver. (Or
         | integrate with 3rd party drivers, like ELDOS.)
         | 
         | In most cases, writing a driver is overkill, so you have to
         | accept that you might read a file mid-write. In general, the
         | other process should lock the file, but there's no way to force
         | every programmer in the world to write their software to be
         | convenient for you.
        
         | lolc wrote:
          | Yes, it is unsafe, yet everybody does it. As long as you accept
         | that some backups may be corrupted, it's fine. You'll still
         | have older backups to go to. Of course, you have to notice the
         | corruption in the first place!
         | 
         | On a snapshotting system such as ZFS it should be fine because
         | there won't be a race between updates and copying the WAL file
         | and the DB file. (Haven't investigated it though.)
        
         | wongarsu wrote:
         | That depends on your backup solution. Naively copying or
         | rsyncing files can lead to corruption for a large number of
         | applications, including anything using SQLite. With OS or file
         | system support you can make a snapshot of a point in time and
         | copy that. Windows backup for example uses VSS (Volume Shadow
        | Copies [1]) which should not suffer from this problem. On
         | linux you can make ZFS clones or BTRFS snapshots, backup those
         | and then delete them locally.
         | 
         | 1: https://en.wikipedia.org/wiki/Shadow_Copy
        
       | jhatemyjob wrote:
       | > an SQLite
       | 
       | Does this settle it, then? This implies it's pronounced ess-
       | queue-lite. I personally prefer the sequel-lite pronunciation,
       | but if the developers use the former, then I guess I'll do that.
        
         | threatripper wrote:
         | Are you from the 70s and still use SEQUEL? Then you pronounce
         | it sequel.
         | 
         | If instead you are a modern hu-man who uses SQL, you pronounce
         | it ess-qu-ell.
        
       | silvestrov wrote:
       | > Continuing to use a file descriptor after it has been closed
       | 
       | It would be nice if file descriptors on 64-bit platforms used
       | tagging so the full value could never be reused.
       | 
        | E.g. the lower 32 bits are the normal fd and the upper 32 bits
        | are a serial number (or random value). When using the fd, the
        | kernel can use the lower 32 bits as an index into the table and
        | check that the upper 32 bits are correct.
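        | 
        | The encoding itself is trivial; an illustrative sketch of the
        | idea (table entries assumed to carry their serial; not any real
        | kernel API):
        | 
        |     FD_BITS = 32
        |     FD_MASK = (1 << FD_BITS) - 1
        | 
        |     def pack(index, serial):
        |         # upper 32 bits: per-slot serial, bumped each time the
        |         # slot is reused; lower 32 bits: index into the fd table
        |         return (serial << FD_BITS) | index
        | 
        |     def lookup(tagged, table, serials):
        |         index = tagged & FD_MASK
        |         if serials[index] != tagged >> FD_BITS:
        |             raise OSError("stale descriptor: slot was reused")
        |         return table[index]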
        
         | fanf2 wrote:
         | Sadly that isn't allowed by POSIX
         | 
         | https://pubs.opengroup.org/onlinepubs/9699919799.2018edition...
         | 
         | > _2.14. File Descriptor Allocation_
         | 
         | > _All functions that open one or more file descriptors shall,
         | unless specified otherwise, atomically allocate the lowest
         | numbered available file descriptor_
        
           | codetrotter wrote:
           | Would badness ensue if we allowed ourselves to define all
           | previously used fds as "unavailable", and thus conforming to
           | the rules as stated when we increment our serial number
           | before handing out the next fd? If yes, what sorts of badness
           | and why?
        
             | samatman wrote:
             | It strikes me that you'd use up STDIN, STDOUT, and STDERR
             | pretty fast that way.
             | 
             | One could make an unprincipled exception for them, of
             | course. And SQLITE would still need the special-casing to
             | prevent using low-numbered fds. But this would prevent
             | other problems and is worth considering.
        
               | mras0 wrote:
               | Didn't see your reply before commenting, but it gave me a
               | thought: We really only need a "local" solution. If there
               | were a way of requesting fd's that weren't required to be
               | the lowest available ones (perhaps even 64-bit ones with
                | a process-unique upper part), that'd probably be good
               | enough here. Of course we're still sharing address space
               | with buggy code, so maybe it's not worth it.
        
               | codetrotter wrote:
               | I'll reply here since then both you and parent can see it
               | and comment further.
               | 
               | In the case that you linked to in the sibling comment
               | that you wrote in reply to me, the dup2 syscall is used.
               | 
               | So the question then that I have is, in the case of
                | foo 2>&1
               | 
               | Does bash then:
               | 
               | a) First close fd 2, and then call dup(1), or does it
               | 
               | b) Call dup2(1, 2), or
               | 
               | c) Do something else entirely, or something more complex
               | or convoluted?
               | 
               | In case the answer is b) then it seems to me, though I
               | may be overlooking something, that we could allow calls
                | like b) and, inside the dup2 syscall, keep the id of the
                | fd that we are using as newfd, since it is unambiguously
                | requested that this is intended, while at the same time
                | never reusing fd ids in the dup syscall.
               | 
               | In this comment I am referring to the dup and dup2
               | syscalls as defined in the Linux man pages:
               | 
               | dup() uses the lowest-numbered unused descriptor for the
               | new descriptor.
               | 
               | dup2() makes newfd be the copy of oldfd, closing newfd
               | first if necessary, but note the following: [...]
               | 
               | https://linux.die.net/man/2/dup2
               | 
               | With the modification that "lowest-numbered unused
               | descriptor" would be unique as previously stated in the
               | comment that the two of you are replying to.
               | 
                | More generally, the question is, does any widespread
               | piece of code manually close an fd and then call the dup
               | syscall, and expect the fd id to be exactly the same as
               | the one they themselves closed, instead of using dup2?
               | 
               | And furthermore, likewise, does any widespread piece of
               | code use the dup2 syscall to specify the newfd where
               | actually it should not have cared about the id of the
               | newfd and should have used dup instead and let the system
               | decide?
        
               | mras0 wrote:
                | I don't think it matters what bash actually does; the
                | issue (as I see it) is that when a program calls write(N,
                | ...) it sometimes _wants_ to refer to the "stale" file
                | descriptor N. Consider this contrived example:
                | 
                |     write(STDERR_FILENO, "test", 4); // OK, write to normal stderr
                |     close(STDERR_FILENO);
                |     open(....);                      // Open some log file
                |     // later
                |     write(STDERR_FILENO, "x", 1);    // OK, write to log file
               | 
               | Even though fd 2 refers to different files I think the
               | above is required to work for POSIX compliance.
        
             | mras0 wrote:
             | I think quite a bit of software actually relies on closing
             | and re-opening file descriptors, especially
             | stdin/stdout/stderr just before/after forking. This was the
             | first example I found via a quick web search:
              | https://unix.stackexchange.com/a/148410 but it suggests
              | that e.g. bash would break :)
             | 
             | It might be possible with some hacks/heuristics to catch
             | many errors (perhaps one could create a valgrind
             | tool/santizier), but I suspect it's not possible in general
             | since there's no way of knowing if the call to
             | write(2,....) meant the old stderr or the new one.
        
       | drderidder wrote:
       | It doesn't get more than a passing mention, but section 2.1 is
       | worth noting. Don't try to use sqlite files on an NFS mounted
       | drive.
        
       | endorphone wrote:
       | Related question - does SQLite checksum blocks (pages) of data?
       | Of course the filesystem usually does, and the physical device
       | itself does, but in the scenario where the OS somehow delivered
       | corrupted data would SQLite at a minimum detect it?
        
         | Liquid_Fire wrote:
         | > Of course the filesystem usually does
         | 
         | Does it really? Certainly a few filesystems like ZFS have
         | features like that, but AFAIK none of the "mainstream" ones do
         | (the things actually running on billions of devices, like ext4,
         | NTFS, etc)
        
         | blattimwind wrote:
         | > does SQLite checksum blocks (pages) of data?
         | 
            | No. The journal does, though. (Postgres has an off-by-default
            | option for this.)
         | 
         | > Of course the filesystem usually does
         | 
            | Nope. Data checksumming is usually only employed by CoW file
            | systems, because on a regular file system it requires data
            | journaling, which generally isn't used, or isn't supported
            | at all.
        
       ___________________________________________________________________
       (page generated 2020-01-20 23:00 UTC)