[HN Gopher] How to Corrupt a SQLite Database File ___________________________________________________________________ How to Corrupt a SQLite Database File Author : pcr910303 Score : 278 points Date : 2020-01-20 14:02 UTC (8 hours ago) (HTM) web link (www.sqlite.org) (TXT) w3m dump (www.sqlite.org) | laurent123456 wrote: | 2009 (at least) + previous discussions: | https://hn.algolia.com/?q=How+to+Corrupt+an+SQLite+Database+... | logicallee wrote: | This section was particularly interesting (I add my own thoughts | below it): | | >Unfortunately, most consumer-grade mass storage devices lie | about syncing. Disk drives will report that content is safely on | persistent media as soon as it reaches the track buffer and | before actually being written to oxide. This makes the disk | drives seem to operate faster (which is vitally important to the | manufacturer so that they can show good benchmark numbers in | trade magazines). And in fairness, the lie normally causes no | harm, as long as there is no power loss or hard reset prior to | the track buffer actually being written to oxide. But if a power | loss or hard reset does occur, and if that results in content | that was written after a sync reaching oxide while content | written before the sync is still in a track buffer, then database | corruption can occur. | | >USB flash memory sticks seem to be especially pernicious liars | regarding sync requests. One can easily see this by committing a | large transaction to an SQLite database on a USB memory stick. | The COMMIT command will return relatively quickly, indicating | that the memory stick has told the operating system and the | operating system has told SQLite that all content is safely in | persistent storage, and yet the LED on the end of the memory | stick will continue flashing for several more seconds. Pulling | out the memory stick while the LED is still flashing will | frequently result in database corruption. 
| | ----- | | If you were a human rather than sqlite you would stop believing | that a sync has occurred immediately, trust it a little bit after | a few seconds, and if it's been a minute since you yourself last | touched the file, you would think "surely it has been written by | now". | | So it seems by tracking wall time, the library could do better | than having to rely on the drive, and could be more on guard | against lying drives. | magila wrote: | The bit about disk drives lying about syncing writes is a | persistent myth. There have probably been some HDDs at some | point which had notable bugs in their flush command, but in | general drives do actually write data to disk before reporting | complete status for a flush. | | I spent several years working for a major HDD manufacturer and | their firmware QA process included a test rig which would | repeatedly pull power to the drive while sending write and | flush commands. The drive would then be powered up and the data | on the drive examined to verify that all data which should have | been flushed actually made it to disk. | | I think this myth persists mostly because it can be surprisingly | difficult to coax an OS into sending a flush command for much | the same reason disk drives get accused of lying about it: | performance. | | These days the most reliable solution is to reconfigure the | drive with write caching disabled. This way the drive will | ensure every write hits the disk before reporting completion to | the host. Write caching remains enabled by default on SATA | drives mostly for historical reasons. Back before command | queuing was commonly supported, write caching gave a large | speedup. These days everything supports NCQ, so disabling write | caching only has a small performance impact. | xenadu02 wrote: | This is absolutely not a myth. There have been drives that | behaved this way, including from "name brand" manufacturers.
| Maybe those were firmware bugs, maybe they stopped doing it, | but it is not something made up. | | Of course there have also been controller bugs, file system | bugs, and so on over the years. A decade ago the behavior of | most file systems was quite poor in the face of underlying | hardware failures but a lot of work has gone into improving | the situation. | | SSDs are slowly improving too. Most of them have claimed to | go into read-only mode when enough blocks wear out but in | reality most of them would crash, fail to respond, or just | lose data. It turns out manufacturers tend to be a bit loose | in their claims until independent third-parties put their | claims to the test. More recent models seem to be a bit | better behaved in failure states. | fencepost wrote: | I'd also wonder about how old this part is and whether it's | partially based on Windows. USB behavior on Windows used to be | optimized for speed and would do caching (hence the need to | eject flash drives before removal) but in one of the Win10 | updates it was changed to safe removal, which basically disables | write caching on removable devices. So, at least on Windows, | there was absolutely a time when the OS would report | completion before things were actually written. | kardos wrote: | > So it seems by tracking wall time, the library could do | better than having to rely on the drive, and could be more on | guard against lying drives. | | Can you elaborate about tracking wall time --- how would you | determine the real sync timing? Picking an arbitrarily large | delay (a minute) would trash the write performance. | logicallee wrote: | Why would it trash the write performance to pick an | arbitrarily large delay, such as one minute? Can you give an | example? I don't know sqlite internals but this document is | about "corrupting" a database (not just losing some data.)
| | So I imagine it could be set up so that any power loss that | keeps data from being written within a specified arbitrarily | large delay (such as one minute) won't actually corrupt the | database - just lose data. | tgsovlerkhgsel wrote: | > Backup or restore while a transaction is active | | Is there any information whether taking a consistent snapshot | (that contains both the DB file and journals) is safe? | egdod wrote: | Seems like that would have the same problems if you do it while | transactions are ongoing. | tgsovlerkhgsel wrote: | Given that the DB is designed to survive power outages (even | during transactions), I don't really understand how that | would be the case. | dunham wrote: | I believe they are subtly different. | | If you copy while transactions are ongoing, you're reading | different parts of a file at different points in time. For | example, the beginning of the sqlite3 file has already been | copied, but by the time you get to the log file, blocks have | been committed back into the beginning of the sqlite file. | | If you take a consistent snapshot of the filesystem first, I | would think that would be the same as a crash in the middle | of a transaction. Things should be in a consistent state that | the recovery code at startup can handle. | pstuart wrote: | It's recommended to use the backup api vs a filesystem copy: | https://www.sqlite.org/backup.html | bob1029 wrote: | If you are using WAL, I feel volume-level snapshots are an | acceptable approach. | [deleted] | Const-me wrote: | > Backup or restore while a transaction is active | | I wonder whether that applies to Windows? | | Well-designed Windows backup software uses the NTFS shadow copy | feature. Shadow copy creates a snapshot of the complete volume | when backup starts. All writes happening after that won't be | included in the shadow copy, so the backup software should never | read some old and some new content.
| | Does it mean that when using VSS on Windows, SQLite databases will | be backed up just fine? Or am I missing something? | DenisM wrote: | You're probably right. Taking a VSS snapshot is identical to | pulling the power plug and imaging the drive. If SQLite can | handle the latter it should be able to handle the former. | | Another thing that should work well is an AWS EC2 EBS snapshot, | as Amazon promises a consistent point-in-time copy. | cpuguy83 wrote: | In another life (2007) I worked for a company that does digital | signage. We had about 8000 systems deployed across the US. The | software on them relied heavily on SQLite. At some point we | started seeing random corrupted SQLite databases. | | Trying to figure out what was causing this, we realized our pc | manufacturer was using the same sku but hardware inside could | vary... and in particular the hard drive. Every system with a | corrupt DB had a particular brand hard drive (WD); the ones that | didn't fail had another (Seagate). | | We never really spent much more time on this other than to lock | down the hardware config and replace all the hard drives in the | failed systems with the "good" brand. | | Super interesting to see this. I'm assuming it was: Disk drives | that do not honor sync requests | | We were of course using consumer grade equipment, because the | company was pretty cheap, but also we had so many to deploy. I | wonder how much this cost the company to fully resolve. Between | systems not playing, having to send technicians, shipping systems | overnight, time troubleshooting, etc. | magila wrote: | Contrary to popular belief, even consumer drives will honor | flush commands. The problem is operating systems generally do | not send flush commands, even when you might expect it (e.g. | last I checked fsync() on Linux does not send a flush). | | Disabling write caching resolves these problems. Unfortunately | SATA drives typically ship with write caching enabled by | default.
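Several comments above recommend SQLite's online backup API over a plain filesystem copy. A minimal sketch with Python's sqlite3 bindings, whose Connection.backup() wraps that API (the file names here are illustrative):

```python
import sqlite3

# Copy a live database through SQLite's online backup API instead of a
# filesystem copy: pages are copied under SQLite's own locking, so the
# result stays consistent even if another connection commits while the
# backup is running.
src = sqlite3.connect("app.db")         # illustrative source database
dst = sqlite3.connect("app-backup.db")  # destination file
with dst:
    src.backup(dst)  # default pages=-1 copies the whole database at once
dst.close()
src.close()
```

Unlike copying the file byte by byte, this never captures a half-written mix of old and new pages, which is exactly the failure mode the thread is discussing.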
| zaarn wrote: | Some cheaper HDDs and third-rate drives won't honor flushes. | Same for SSDs, a few very cheap ones ignore flushes in favor | of cache performance. | tobias3 wrote: | Of course fsync() sends a flush to the disk (unless you use | non-standard mount options or "eatmydata"). They may not | honor the FUA bit, though hopefully nobody relies on | this if they expect it to work with consumer disks (See e.g. | here https://devblogs.microsoft.com/oldnewthing/20170510-00/? | p=95... ). | | What I have heard is that some disks have problems if they | are on an error recovery path. E.g. they have a corrupt sector | or a sector that needs reallocation (pending sector), and | then they have a power failure while the sector is | reallocated. | networkimprov wrote: | References? | aidos wrote: | Not quite the same thing, but it turns out there are some | other gotchas with fsync on Linux: | https://lwn.net/Articles/752063/ | wongarsu wrote: | > because the company was pretty cheap, but also we had so many | to deploy | | Isn't "deploy many systems in hard to reach places" the prime | target for enterprise-grade hardware? Fixing problems on one | bespoke system is a lot easier than fixing the same problem | 8000 times, and of course with a large sample size the number | of problems experienced goes up. | michaelt wrote: | I always thought "people willing to pay five to ten times | market rate per gigabyte" was the target of enterprise-grade | storage... | cpuguy83 wrote: | Yep. This was a big problem there. Before they were | delivering content to PC's they distributed content via DVD | once per month. DVD player failure rate was pretty massive, | especially since they just started recycling refurbed | players. | FpUser wrote: | In my other life (2000-2008 ish I think) I've also dealt with | large (tens of thousands) deployments, all consumer grade as | in your case. No SQLite though. Obviously we've had some hard | drive failures.
What is very interesting is that the results | were exactly opposite. WD drives lived and Seagate drives | failed. I am wondering if such a discrepancy was caused by the | usage pattern. | winrid wrote: | The drives weren't actually failing; the setup failed due to | how the WD drives work. If you reformatted them they'd be | fine. | FpUser wrote: | Oops, sorry, I was not careful reading your message | cpuguy83 wrote: | Yep, exactly this. The drives were fine, only the database | was getting corrupted. | pilsetnieks wrote: | With how widespread both SQLite and WD are, wouldn't half the | world have apps crashing left and right, if just the hard drive | brand was the cause of the problem? | wolf550e wrote: | The case of the disks not honoring fsync is famous. I learned | about it from Brad Fitzpatrick, author of LiveJournal (later | on the Go team at Google), writing about it in 2005: | https://brad.livejournal.com/2116715.html | inetknght wrote: | See also: counterfeit SD cards. | nothrabannosir wrote: | I had consistent problems with WD in that same era. Switched | to Seagate as well, and my problems disappeared. Just another | datapoint. | fpgaminer wrote: | GP mentioned digital signage, so probably some displays at a | business. Unlike a computer, which would mostly either be left | on (asleep) or shut down properly, I'm sure users had no | qualms about killing power to these displays as they closed | up every night. Write flushing problems would be exacerbated | in that environment. | jmull wrote: | Yes, I think it's possible (likely, even) that there was at | least one more factor. | adrianN wrote: | Apps are in fact crashing left and right. | brewdad wrote: | My Materialistic app crashed about 30 seconds prior to | opening this thread. I probably average about 5 "noisy" app | crashes a week on my phone. No idea how many may be failing | silently. | ShamelessC wrote: | I had to give up on Materialistic because of the | frequency of crashes.
It's a shame because the hn website | is really difficult to navigate on mobile (collapsing | threads with that tiny + is a nightmare). Do you know of | any better alternatives that let you log in and post from | the app? | tyingq wrote: | It's somewhat implied it was one specific WD consumer drive | model. | cpuguy83 wrote: | I doubt most consumer applications write to the database | nearly as often as we were. Also very likely a particular | model of the drive and also sheer luck that one worked and | one did not. | kbumsik wrote: | Can they be applied to other DBMSs? I mean, I don't think other | DBMSs treat database files too differently from SQLite, do | they? | pjc50 wrote: | Most DBMSs don't let you at the files directly - you speak to | the DB over a socket. Because SQLite is in-process, bugs in the | process that access random file descriptors can write over the | SQLite files. | | (This is more ammunition for the idea that the real software | isolation boundary on a desktop computer should not be "user", | but "software author"!) | Xophmeister wrote: | SQLite is serverless; server-based DBMSs are free to use | locking mechanisms other than filesystem | primitives to achieve concurrency. | simias wrote: | Most of them would be possible in some variation, although | typically with different results. | | For instance, typically databases are accessed through a socket | interface which uses the same pool of file descriptors as | open(). So it's probably possible for a database connection to | end up with fd 2, and then the write(2, ...) would also send | garbage to the database. | | Although in this case it probably won't do much unless you're | very unlucky, because the message will almost certainly not end | up being valid for the DB, so it'll either ignore it, return an | error or maybe drop the connection without corruption. So it | would still be a nasty thing to debug but not quite as | critical.
| | More generally, there's typically much better isolation | between database and process with traditional database servers, | which makes it difficult for a misbehaving program to mess directly | with the DB. And having a server program processing the | transactions means that you don't have to rely so heavily on | things like filesystem locking to achieve atomicity, for | instance. That being said, buggy or limited filesystems can be a | problem for any database, for instance to guarantee that a | transaction has been really committed to disk in case of a power | outage etc... See this for instance: | | https://wiki.postgresql.org/wiki/Fsync_Errors | | https://www.postgresql.org/message-id/flat/CAMsr%2BYHh%2B5Oq... | arminiusreturns wrote: | I wonder if dqlite (https://github.com/canonical/dqlite) or | rqlite (https://github.com/rqlite/rqlite) are enough to help | prevent this. | samatman wrote: | No, but they should be able to _recover_ from it, which is | almost as good. | Lex-2008 wrote: | Very interesting read! Although some points might sound too | strange to include in this list (like "memory corruption" - in | case of faulty memory, everything is in danger, not only the | database) - some are worth keeping in mind (multiple hardlinks to | the file, moving a single DB file around without the accompanying | *-wal file, etc) | matsemann wrote: | > strange to include in this list (like "memory corruption" - | in case of faulty memory, everything is in danger, not only | database) | | But theoretically one could build something that handles up to | X random bit flips, so it's maybe worth mentioning that sqlite | doesn't handle this (even though probably no one else does | either) | Franciscouzo wrote: | Could you?
You would also have to take into account that the | instructions that make up the program could also be randomly | flipped | blattimwind wrote: | In most cases the size of the code is much smaller than the | data handled, so for random errors (perfectly working DRAM | has a bit error rate somewhere in the general vicinity of | 10^-14/bit*hr) the distribution of errors is in accordance | with that proportion. | mytailorisrich wrote: | They are not describing cases of faulty memory but are explicitly, | and very aptly, warning you that since SQLite is a C library that | runs in the same address space as the application code, if your | application code corrupts memory (by way of buffer overrun, | heap corruption, etc) it can impact SQLite's internals and in | turn corrupt the SQLite database file. | andreareina wrote: | Is open() on something other than /dev/stderr returning 2 a | thing? Or is there something screwy that was causing it to get | associated with the open database file? | emilfihlman wrote: | The kernel gives you the lowest numbered fd available. | | If you close stderr, 2 becomes available and can be handed out. | | Also it's not exactly guaranteed that 0, 1 and 2 are the fd | numbers for stdin, out and err on program start, but they are | hard coded and standard. | | E: It's implementation defined, i.e. not guaranteed. Your | platform may make guarantees about it, but it is not an actual | standard. | Hello71 wrote: | It is not only guaranteed by the POSIX standard, as pointed | out by the siblings, it is necessary for standard shell | scripting idioms to work, e.g. "command1 2>&1 | command2". | matheusmoreira wrote: | http://man7.org/linux/man-pages/man3/stdin.3.html | | > On program startup, the integer file descriptors associated | with the streams stdin, stdout, and stderr are 0, 1, and 2, | respectively. | quindecagon wrote: | > On program startup | | Then, after program startup, if it is closed, then fd 2 | could be assigned to something else.
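quindecagon's scenario is easy to reproduce. A small POSIX-flavored Python sketch (assuming a normal startup where descriptors 0-2 are open; the temp file stands in for a database file):

```python
import os
import tempfile

# Once fd 2 (stderr) is closed, it is the lowest free descriptor, so the
# very next open() returns 2. Any stray write(2, ...) issued afterwards
# by logging or error-handling code would then scribble into this file,
# which is how an innocent open() can hand a database file to "stderr".
_tmp_fd, path = tempfile.mkstemp()
os.close(_tmp_fd)

saved = os.dup(2)   # keep the real stderr alive so it can be restored
os.close(2)         # fd 2 is now free
fd = os.open(path, os.O_RDWR)
reused = fd         # POSIX: open() returns the lowest unused descriptor
os.dup2(saved, 2)   # restore stderr (this closes the file at fd 2)
os.close(saved)
os.remove(path)
assert reused == 2
```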
| emilfihlman wrote: | You must differentiate between a guarantee by an actual | standard and platform-specific documentation. | matheusmoreira wrote: | https://pubs.opengroup.org/onlinepubs/9699919799/functions/s... | | > The following symbolic values in _<unistd.h>_ define | the file descriptors that shall be associated with the | C-language _stdin_, _stdout_, and _stderr_ when the | application is started: | | > STDIN_FILENO Standard input value, _stdin_. Its value | is 0. | | > STDOUT_FILENO Standard output value, _stdout_. Its | value is 1. | | > STDERR_FILENO Standard error value, _stderr_. Its value | is 2. | [deleted] | blattimwind wrote: | > Also it's not exactly guaranteed that 0, 1 and 2 are the fd | numbers for stdin, out and err on program start | | Yes, it is. | | https://pubs.opengroup.org/onlinepubs/9699919799/functions/s... | emilfihlman wrote: | You must differentiate between a guarantee by an actual | standard and platform-specific documentation. | endorphone wrote: | He linked an actual standard. | blattimwind wrote: | Can you tell me what the "std" in "IEEE Std 1003.1-2017" | stands for? | makomk wrote: | That is a guarantee by an actual standard. More | specifically, it's one of the things that POSIX.1-2017 | guarantees but the ISO C standard does not. In practical | terms this means that all Unix-like systems use the | traditional fd numbers for stdin, stdout and stderr, but | it's possible that non-Unix systems might not. | trasz wrote: | You're right, but it's more than that: stderr is defined to | be fd 2. Whatever is assigned to that fd is the stderr. If | you close it and open something else, that something | becomes your stderr. | justinator wrote: | I must be getting old - an article with such a title used | to also be appended with "For Fun, and Profit!" | OJFord wrote: | These days that's not enough; most come with 6 Reasons, and | often one of them is particularly Unbelievable.
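Hello71's point above about shell idioms can be checked directly; a quick sketch with Python's subprocess module (assumes a POSIX sh is available):

```python
import subprocess

# "2>&1" duplicates fd 2 (stderr) onto fd 1 (stdout); the idiom only
# works because those descriptor numbers are fixed by the standard.
result = subprocess.run(
    ["sh", "-c", "echo oops 1>&2"],  # the child writes only to stderr
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,        # merge stderr into stdout, i.e. 2>&1
)
assert result.stdout == b"oops\n"
```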
| anon73044 wrote: | I would have expected that if this were posted on milw0rm or a | personal blog, but not on the actual SQLite info page. | justinator wrote: | I honestly feel it's more a sea change type of thing, | | https://mitpress.mit.edu/books/fun-and-profit | dicytea wrote: | Any good introduction to sqlite3 for a Python user? | umvi wrote: | Somewhat unrelated, I feel like the stars aligned to prevent my | company from using SQLite. | | We use a lot of CSV files in our embedded devices. I personally | hate CSV and advocated for SQLite. However, our CSVs are 1 | writer/many readers, and the only way to do that in SQLite is to | change the mode to WAL. However, apparently WAL doesn't work on | JFFS2 filesystems :(... | | I'm going to give it another shot on our next-gen product which | uses ubifs | wolco wrote: | The most common by far. | | I/O error while obtaining a lock leads to corruption. Probably | speaks to my drive/os choices. | tracker1 wrote: | Using a mounted volume in docker for windows/mac desktop and | inside container to edit the same database. | | Don't do this btw, the FS sync in docker for windows/mac is | really slow and prone to causing corruption and errors if you do. | bob1029 wrote: | I have found that corruption of SQLite files in practice is a | very difficult thing to achieve. On Windows, the worst I've ever | seen is a wall of "Database File is Locked" which was resolved | with a rewrite of the DB access logic. After much | experimentation, I have found that the happy path for SQLite is | as follows: | | 1. Use a SQLite distribution that is compiled or otherwise | configured for serialized threading mode (this is the default in | most cases). | | 2. Use a single connection from a single process to access the | SQLite database file. This connection can be safely shared | between multiple threads with optional locking as required (due | to the guarantees offered by #1 above). | | 3. 
Transactions are handled with locking in the application code | (i.e. on the SQLiteConnection instance), not in the database. | This is predominantly required for inserts with PK generation, | but also applicable in cases where you don't want 2 different | threads to interleave updates during a certain multi-step | operation (e.g. debiting one account and crediting another). | | So far, we have found this to be the simplest and most performant | way to get business data in and out of a SQLite database file. | The "one connection per operation" approach is NOT ideal for | SQLite because there is an actual database file that needs a | handle created/closed each time. One connection per operation | makes sense when you are using a hosted database designed for this | access model. | xienze wrote: | > I have found that corruption of SQLite files in practice is a | very difficult thing to achieve. | | Not in the slightest. Stick your Plex DB on NFS and it'll get | messed up in very short order. | justinclift wrote: | Wonder if changing some of the NFS mount options (caching, | sync, etc) would make it better? | | And/or options known to cause corruption for this Plex DB | usage pattern. eg: "Don't use options foo, bar, bazz with | PlexDB over NFS, as corruption _will_ occur". | ta999999171 wrote: | People who know what NFS is will be on Emby or similar before | I finish writing this comment. | catalogia wrote: | I've found WAL mode with one writer thread and as many reader | threads as you want works pretty well. | bob1029 wrote: | Yes, absolutely use WAL mode whenever feasible. Does wonders | for performance. | harikb wrote: | If you have to handle (write code for) 2 and 3, why isn't | something like Postgres better? | cheez wrote: | It's a good question. For me, lack of yet-another-service. | bob1029 wrote: | This is my primary answer as well.
I have no problem with | Postgres as a solution, but I will only invite additional | complexity into my life when absolutely required by the | business constraints. | | SQLite is dead simple compared to Postgres (or any other | hosted solution). I can copy a SQLite .db file from a | production machine to my local box and be reviewing | rows/running queries within seconds. Also, sharing SQLite | databases with other users & developers or even using them | as a contract between systems is a very compelling set of | use cases. Finally, going from a blank Windows machine to a | working development (or production) environment is very | quick if your application exclusively uses SQLite to store | its data. You simply stop thinking about databases after a | while because everything just works. | jbverschoor wrote: | For me, the single file is the nr 1 reason. It just | works, the file is the data... very clear. | barrkel wrote: | Something like Firebird Embedded would be better than Sqlite | if you have concurrency and performance requirements | alongside an embedded packaging constraint. | | Sqlite is more like a structured persistent file format with | a SQL interface than a database. | mceachen wrote: | I'd love to use Postgres with PhotoStructure, but it would | triple the installation size (!!), make my code more | complicated (I'd basically need to automate common dba work), | and Windows support is shaky (last time I checked, at least). | | I picked SQLite to reduce moving parts. | ebg13 wrote: | I can answer this. Adding a queue thread to your executable | is infinitely easier to redistribute and run than something | that requires the user to install a persistent postgres | daemon. There are premade libraries for several languages | that provide thread-safe query queueing for sqlite | transparently as drop-in imports so you don't even really | need to think about it. See e.g.
| https://github.com/dashawn888/sqlite3worker (not mine) | blattimwind wrote: | In WAL mode you can write from multiple threads without any | problem, but you need to keep serializability in mind. | Specifically, no concurrent write transactions are allowed. In | other words, you can't have a transaction start at X, have | another transaction start at X+1, issue DML and commit, and | expect your original transaction from X to be able to commit - | it will never be able to. | | Some SQLite wrappers (notably, Python's) mess with the start | and rollbacks of transactions and make this appear buggy. | platz wrote: | Even in WAL mode, you can still get a SQLITE_BUSY error even | when just trying to open a new connection. The application I | worked on could not tolerate these busy exceptions, so we had | to acquire an exclusive global mutex before attempting to | open new connections to sqlite and make sure to release it as | soon as the operation was done. | stephen82 wrote: | They are working on it; please check for 'begin-concurrent' | and 'wal2' branch names here: | https://www.sqlite.org/cgi/src/brlist | oefrha wrote: | > Some SQLite wrappers (notably, Python's) mess with the | start and rollbacks of transactions and make this appear | buggy. | | @coleifer (author of peewee) has a pretty informative blog | post exploring -- in part -- transactional semantics, which | might be of value to people who want to know more about this. | | https://charlesleifer.com/blog/going-fast-with-sqlite-and-py... | mox1 wrote: | +1 for @coleifer, peewee is an awesome ORM, he knows his DB | stuff very well. | kccqzy wrote: | Python's wrapper is one of the worst. It attempts to parse | the SQL you are trying to execute and automatically send | BEGIN or COMMIT to the database. Setting the | "isolation_level" property (badly named) to None makes things | better. | tracker1 wrote: | Posted in another thread... but I discovered how the FS in | Docker for (Windows/Mac) worked the hard way.
If you use a | mounted volume in Win/Mac it does syncing, not a direct | mount/share. This lags and isn't a great option for live | databases, and if you're accessing the same SQLite db outside | the container and inside, you will see corruption happen. | | In any case, it was definitely an interesting discovery... | since then, I only use shared volumes with the host as | backup/restore targets, never for shared anything. | zeeZ wrote: | I run Plex on my Windows desktop as a container, with | everything I want to persist mounted in (this includes the | database). If I don't stop the container before hibernating | my PC, I'm pretty much guaranteed to have some kind of | database corruption on the next start. | tracker1 wrote: | I'd consider a volume container for your database, the | content is probably safe enough, since it's not a lot of | heavy write. Then have a timed event that copies the DB out | of the volume container for backups regularly. | tracker1 wrote: | It's also worth noting that using a DB in a shared volume, | even if you don't touch it outside the container, is _MUCH_ | slower... I was doing this for mssql containers and when I | stopped, my spinup for a fresh instance went from nearly 90 | seconds to about 15. | adrianmonk wrote: | > _Systems that run automatic backups in the background might try | to make a backup copy of an SQLite database file while it is in | the middle of a transaction. The backup copy then might contain | some old and some new content, and thus be corrupt._ | | I am surprised by this! Is it really not entirely safe to take a | backup while some software is running, has a SQLite database | open, and might be writing to it? | | That limitation is fine for a big database that has someone | knowledgeable administering it who will take care to follow any | required backup procedures. But SQLite is often included in | software that nontechnical end users will use without even | knowing that there's a SQLite file.
For example, Firefox and | Google Chrome use it. | | This seems to mean that if a regular end user is browsing the web | while backups run on their computer, it's possible they could be | unlucky and get a bad backup that would corrupt the database if | they had to restore it. | | Am I interpreting that right? If so, it suggests that many/most | software projects that include SQLite are doing it in a slightly | unsafe way. | | (It seems possible that the *-journal and *-wal files could | protect against that, but this SQLite documentation only mentions | that they protect you when transactions fail, which is a | different case.) | btilly wrote: | Let's turn this around. How would you avoid this? | | Suppose that you are copying the file front to back. While you | are doing this, an update comes in that changes something in | the front and back which refer to each other, while your copy | is somewhere in the middle. | | The result is that you get a backup with a data structure in | the back of the file referring to stuff in the front of the | file that isn't there. Your backup copy is therefore corrupted. | | This problem is almost impossible to avoid. You just do the | best that you can. | GlitchMr wrote: | Yes. However, keep in mind it's not just SQLite, but pretty | much every program that has any kind of state. | | For instance, a program may save its state into a file every | time it changes by truncating the file and then writing new | contents. If a user were to copy the file after truncation, | but before the write, the user would end up with a useless backup. | | That said, this is rather unlikely to happen in practice. | Programs usually do stuff (including saving files) when they | are interacted with, so a backup copied with File Explorer has | a very high probability of working, even when the application | is running. | | Programs can deal with the issue by performing atomic full | writes via renames, but most programs don't bother | to.
SQLite cannot do that, because it cannot afford to rewrite the | whole file, as databases tend to store a lot of data. | Someone wrote: | The better way of 'overwriting' is to write the new file | under a new name, and then atomically move it to where the | old file was. If you have to overwrite (e.g. because of lack | of disk space), the program that rewrites that file should | open it in exclusive mode. | | A back-up tool should open the files it backs up in a mode that | disallows others from writing to them, and retry that a few | times, each time after waiting a few seconds. | | If that keeps failing, there's a problem. Do you skip backing | up the file, or retry with a "it's fine if others are reading" | mode, with the risk of getting a hybrid of multiple versions? | There is no fully correct answer there. | | And it gets worse. Not all programs write files in exclusive | mode. Some might even open a file multiple times while | rewriting it. Also, consistency may span multiple files, for | example with Mac OS packages. | | And then, of course, there are anti-virus tools. They often | work below the file system, reading and writing files without | the OS being aware of it. | | The reason that backup tools work relatively reliably and | speedily at all is mostly that most programs that write to | files will either regularly close them, or just append to | them. | | Databases are the exception. They tend to keep files | permanently open and regularly write to them. If you use | SQLite in a 'normal' application for user data, you should | flush data after (about) every write (if you use it for | caching, you can be a bit more loose, as it's relatively fine | to discard the database when it gets corrupted).
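The write-new-file-then-rename pattern described above can be sketched in Python. This is a minimal illustration, not a hardened implementation; the function name and paths are made up:

```python
import os
import tempfile

def atomic_write(path, data):
    """Replace the file at `path` so readers ever see either the old
    or the new contents, never a partially written file."""
    # Write the new contents to a temporary file in the same directory:
    # rename is only atomic within a single filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to stable storage first
        os.replace(tmp_path, path)  # atomic rename on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Note the fsync before the rename: without it, a crash could leave the rename durable while the new file's contents are not, which defeats the purpose.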
| | Generally the preferred approach for backup programs is to | trigger a file system snapshot - after quiescing the disks | - and back that up. Sometimes (eg for databases), the backup | program is able to run application-specific logic that | ties in with this (eg flush all database buffers before | the snapshot). | | For the filesystems that don't offer snapshots, some still | have the ability to freeze the file system to allow a | consistent backup. | GlitchMr wrote: | As I said in a comment you responded to: | | > Programs can deal with the issue by performing atomic | full writes by performing renames, but most programs don't | bother to. SQLite cannot do that, because it cannot afford | doing full writes as databases tend to store a lot of data. | | I'm aware you can do that, but most programs don't. If you | pick any random program, chances are it won't do that. | adrianmonk wrote: | Writing a copy and renaming works, but is it the only way, or | is it just an approach that a lot of software uses because | it's simple and it works? In other words, are there ways that | might be more work to implement but also more efficient that | would be suitable for a database? | | I agree the risk is pretty small in practice. Though copying | with File Explorer isn't the use case I had in mind. It was | automatic scheduled backups. Both Windows and macOS make it | pretty easy to set these up with an external hard drive. And | those could start running while you're using the computer | and/or you could start using the computer while backups are | running. | GlitchMr wrote: | Back-up software can use NTFS/ReFS Shadow Copy in Windows. | This shouldn't cause issues with SQLite. | | Copying files is only an issue for manual back-ups done | without special software. | lolc wrote: | An append-only format should work fine with most backup | software that copies files. A vacuum phase would still have | to do an atomic rename.
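For SQLite specifically, there is a supported alternative to copying the database file at all: the online backup API, which Python's sqlite3 module exposes as Connection.backup(). A minimal sketch, with made-up file and table names:

```python
import sqlite3

# Create a source database with some data (illustrative schema).
src = sqlite3.connect("app.db")
src.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
src.execute("INSERT OR REPLACE INTO kv VALUES ('a', '1')")
src.commit()

# The backup API copies pages under SQLite's own locking, so the
# destination is a consistent snapshot even if `src` is in active use
# by this or another process.
dst = sqlite3.connect("app-backup.db")
with dst:
    src.backup(dst)

print(dst.execute("SELECT v FROM kv WHERE k='a'").fetchone()[0])  # -> 1
```

This sidesteps the torn-copy problem the thread describes, because SQLite itself arbitrates against concurrent writers while the pages are copied.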
| gwbas1c wrote: | >> Systems that run automatic backups in the background might | try to make a backup copy of an SQLite database file while it | is in the middle of a transaction. The backup copy then might | contain some old and some new content, and thus be corrupt. | | > I am surprised by this! Is it really not entirely safe to | take a backup while some software is running, has a SQLite | database open, and might be writing to it? | | I've written backup software: You only know if another process | has an open file handle if you write your own driver. (Or | integrate with 3rd party drivers, like ELDOS.) | | In most cases, writing a driver is overkill, so you have to | accept that you might read a file mid-write. In general, the | other process should lock the file, but there's no way to force | every programmer in the world to write their software to be | convenient for you. | lolc wrote: | Yes, it is unsafe, yet everybody does it. As long as you accept | that some backups may be corrupted, it's fine. You'll still | have older backups to go to. Of course, you have to notice the | corruption in the first place! | | On a snapshotting system such as ZFS it should be fine because | there won't be a race between updates and copying the WAL file | and the DB file. (Haven't investigated it though.) | wongarsu wrote: | That depends on your backup solution. Naively copying or | rsyncing files can lead to corruption for a large number of | applications, including anything using SQLite. With OS or file | system support you can make a snapshot of a point in time and | copy that. Windows backup for example uses VSS (Volume Shadow | Copies [1]) which should not suffer from this problem. On | Linux you can make ZFS clones or BTRFS snapshots, back those up | and then delete them locally. | | 1: https://en.wikipedia.org/wiki/Shadow_Copy | jhatemyjob wrote: | > an SQLite | | Does this settle it, then? This implies it's pronounced ess- | queue-lite.
I personally prefer the sequel-lite pronunciation, | but if the developers use the former, then I guess I'll do that. | threatripper wrote: | Are you from the 70s and still use SEQUEL? Then you pronounce | it sequel. | | If instead you are a modern hu-man who uses SQL, you pronounce | it ess-qu-ell. | silvestrov wrote: | > Continuing to use a file descriptor after it has been closed | | It would be nice if file descriptors on 64-bit platforms used | tagging so the full value could never be reused. | | E.g. the lower 32 bits are the normal fd and the upper 32 bits | are a serial number (or random value). When using the fd the | kernel can use the lower 32 bits as an index into the table and | check that the upper 32 bits are correct. | fanf2 wrote: | Sadly that isn't allowed by POSIX | | https://pubs.opengroup.org/onlinepubs/9699919799.2018edition... | | > _2.14. File Descriptor Allocation_ | | > _All functions that open one or more file descriptors shall, | unless specified otherwise, atomically allocate the lowest | numbered available file descriptor_ | codetrotter wrote: | Would badness ensue if we allowed ourselves to define all | previously used fds as "unavailable", thus conforming to | the rules as stated, if we increment our serial number | before handing out the next fd? If yes, what sorts of badness, | and why? | samatman wrote: | It strikes me that you'd use up STDIN, STDOUT, and STDERR | pretty fast that way. | | One could make an unprincipled exception for them, of | course. And SQLITE would still need the special-casing to | prevent using low-numbered fds. But this would prevent | other problems and is worth considering. | mras0 wrote: | Didn't see your reply before commenting, but it gave me a | thought: We really only need a "local" solution. If there | were a way of requesting fd's that weren't required to be | the lowest available ones (perhaps even 64-bit ones with | a process-unique upper part), that'd probably be good | enough here.
Of course we're still sharing address space | with buggy code, so maybe it's not worth it. | codetrotter wrote: | I'll reply here so that both you and the parent can see it | and comment further. | | In the case that you linked to in the sibling comment | that you wrote in reply to me, the dup2 syscall is used. | | So the question I then have is, in the case of | foo 2>&1 | | Does bash then: | | a) First close fd 2, and then call dup(1), or does it | | b) Call dup2(1, 2), or | | c) Do something else entirely, or something more complex | or convoluted? | | In case the answer is b), then it seems to me, though I | may be overlooking something, that we could allow calls | like b) and inside of the dup2 syscall keep the id of the | fd that we are using as newfd, since it is unambiguously | requested that this is intended, while at the same time | never reusing fd ids in the dup syscall. | | In this comment I am referring to the dup and dup2 | syscalls as defined in the Linux man pages: | | dup() uses the lowest-numbered unused descriptor for the | new descriptor. | | dup2() makes newfd be the copy of oldfd, closing newfd | first if necessary, but note the following: [...] | | https://linux.die.net/man/2/dup2 | | With the modification that "lowest-numbered unused | descriptor" would be unique, as previously stated in the | comment that the two of you are replying to. | | More generally, the question is: does any widespread | piece of code manually close an fd and then call the dup | syscall, and expect the fd id to be exactly the same as | the one it itself closed, instead of using dup2? | | And likewise, does any widespread piece of | code use the dup2 syscall to specify the newfd where | actually it should not have cared about the id of the | newfd and should have used dup instead and let the system | decide? | mras0 wrote: | I don't think it matters what bash actually does, the | issue (as I see it) is that when a program calls write(N, | ...)
it sometimes _wants_ to refer to the "stale" file | descriptor N. Consider this contrived example: | write(STDERR_FILENO, "test", 4); /* OK, write to normal stderr */ | close(STDERR_FILENO); open(...); /* Open some log file */ | /* later */ | write(STDERR_FILENO, "x", 1); /* OK, write to log file */ | | Even though fd 2 refers to different files I think the | above is required to work for POSIX compliance. | mras0 wrote: | I think quite a bit of software actually relies on closing | and re-opening file descriptors, especially | stdin/stdout/stderr just before/after forking. This was the | first example I found via a quick web search: | https://unix.stackexchange.com/a/148410 but it suggests that | e.g. bash would break :) | | It might be possible with some hacks/heuristics to catch | many errors (perhaps one could create a valgrind | tool/sanitizer), but I suspect it's not possible in general | since there's no way of knowing if the call to | write(2,....) meant the old stderr or the new one. | drderidder wrote: | It doesn't get more than a passing mention, but section 2.1 is | worth noting. Don't try to use SQLite files on an NFS-mounted | drive. | endorphone wrote: | Related question - does SQLite checksum blocks (pages) of data? | Of course the filesystem usually does, and the physical device | itself does, but in the scenario where the OS somehow delivered | corrupted data would SQLite at a minimum detect it? | Liquid_Fire wrote: | > Of course the filesystem usually does | | Does it really? Certainly a few filesystems like ZFS have | features like that, but AFAIK none of the "mainstream" ones do | (the things actually running on billions of devices, like ext4, | NTFS, etc)
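The POSIX lowest-available-descriptor rule quoted earlier in the thread, which is what makes close-then-reuse bugs possible, is easy to observe on a POSIX system, e.g. from Python:

```python
import os

# Open two descriptors; POSIX mandates that each open() returns the
# lowest-numbered available file descriptor.
a = os.open("/dev/null", os.O_RDONLY)
b = os.open("/dev/null", os.O_RDONLY)

# Close the first and open again: the freed number is handed right back.
os.close(a)
c = os.open("/dev/null", os.O_RDONLY)
assert c == a  # the old descriptor number was immediately reused

# This is why a stale write() through a closed-then-reused fd silently
# goes to the wrong file: the kernel cannot tell old and new apart.
os.close(b)
os.close(c)
```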
Data checksumming is usually only employed by CoW file | systems, because it requires data journaling on a regular file | system, which is generally either unused or not supported at all. ___________________________________________________________________ (page generated 2020-01-20 23:00 UTC)