[HN Gopher] Bug #915: Please help
___________________________________________________________________
Bug #915: Please help

Author : supakeen
Score  : 276 points
Date   : 2020-01-12 19:30 UTC (3 hours ago)

(HTM) web link (nedbatchelder.com)
(TXT) w3m dump (nedbatchelder.com)

| pdq wrote:
| As mentioned on the Github issue [1], I'd suspect their SQLite
| optimizations:
|
|     self.execute("pragma journal_mode=off").close()
|     # This pragma makes writing faster.
|     self.execute("pragma synchronous=off").close()
|
| I've always used WAL journaling mode, and NORMAL synchronous
| mode.
|
| [1] https://github.com/nedbat/coveragepy/issues/915
| nedbat wrote:
| I have tried removing the two pragmas, and the problem still
| happens.
| jsmith45 wrote:
| Neither of those makes much sense as the cause anyway. Both
| sacrifice database consistency if the hosting process crashes,
| or the OS crashes or reboots due to power failure, but with
| something like a coverage database, neither is particularly
| worrying. Worst case, the user nukes the db and reruns the
| tests.
|
| Even if removing those pragmas fixed things, it feels likely
| that would just be hiding the issue instead of fixing it.
|
| Disk io errors are not supposed to happen in sqlite except in
| cases like: accessing a corrupted database, drives failing or
| running out of disk space, bad permissions on some folder
| sqlite needs to write to.
|
| So unless the test case involves a python process crashing or
| getting killed, I question if this is actually a bug in your
| code.
|
| I feel like this is some bug either in sqlite or in the file
| systems used by docker (specifically by behaving in some
| manner not anticipated by sqlite).
|
| With those optimizations, the only remaining sqlite disk IO
| should be the database itself, and the temp store. Since I
| strongly suspect the error you are getting is coming from
| sqlite, it would be helpful to know which of those two disk
| io cases it is.
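A minimal sketch of the settings discussed above, using Python's standard `sqlite3` module. This is not coverage.py's actual code: the helper names `open_fast_db` and `current_settings`, the temporary file path, and the extra `temp_store=memory` pragma (one way to take temp files out of the picture) are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile


def open_fast_db(path):
    """Open a SQLite database that trades durability for write speed.

    Hypothetical helper for illustration; not part of coverage.py.
    """
    con = sqlite3.connect(path)
    # No rollback journal: a crash in the middle of a write can corrupt the file.
    con.execute("pragma journal_mode=off").close()
    # Do not wait for writes to reach the disk.
    con.execute("pragma synchronous=off").close()
    # Assumption: keep temporary tables and indices in memory, not in temp files.
    con.execute("pragma temp_store=memory").close()
    return con


def current_settings(con):
    """Read the three settings back so they can be inspected."""
    names = ("journal_mode", "synchronous", "temp_store")
    return {name: con.execute(f"pragma {name}").fetchone()[0] for name in names}


with tempfile.TemporaryDirectory() as tmp:
    con = open_fast_db(os.path.join(tmp, "demo.db"))
    settings = current_settings(con)
    con.close()  # close before the temporary directory is removed

print(settings)
```

With the temp store in memory, any remaining disk I/O error would have to come from reading or writing the database file itself, which is the distinction the comment asks about.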
|
| Perhaps you can try the pragma to force the temp store to be
| in memory? The results of that might help us know if the disk
| io error is in reading/writing the database file itself, or
| if it is in reading/writing temp files.
|
| Knowing that could narrow things down a bit.
| slovenlyrobot wrote:
| For some reason Docker is struggling to download the image
| right now (lots of interested parties? :), I just wanted to
| trap writes to the database file and look at what's actually
| getting written there.
|
| Is there any possibility e.g. of an FD reuse issue in the
| code somewhere? It might even be possible to spot it with
| strace (although you need to start the container with the
| right caps to try that). An example where this can happen is
| programs closing 'stderr' fd 2, only for e.g. SQLite to open
| its database reusing fd 2, then some library code calls
| fprintf(stderr, ..) etc.
|
| Would love a copy of the DB 'sometime before' and after to
| look at. There are tools around for inspecting SQLite
| databases on a page-by-page basis
| mmastrac wrote:
| There was a similar bug in unraid that they ended up fixing in
| 6.8.0 where it turned out that SQLite wasn't handling some sort
| of error condition in read-ahead I/O? I wonder if this is
| related.
|
| https://forums.unraid.net/bug-reports/prereleases/sqlite-dat...
|
| From that report:
|
| ====== 8< ======
|
| > In the Linux block layer each READ or WRITE can have various
| modifier bits set. In the case of a read-ahead you get
| READ|REQ_RAHEAD which tells I/O driver this is a read-ahead. In
| this case, if there are insufficient resources at the time this
| request is received, the driver is permitted to terminate the
| operation with BLK_STS_IOERR status. Here is an example in Linux
| md/raid5 driver.
|
| > In case of Unraid it can definitely happen under heavy load
| that a read-ahead comes along and there are no 'stripe buffers'
| immediately available.
| In this case, instead of making calling process wait, it
| terminated the I/O. This has worked this way for years.
|
| [...]
|
| > What I suspect is that this is a bug in SQLite - I think SQLite
| is using direct-I/O (bypassing page cache) and issuing it's own
| read-aheads and their logic to handle failing read-ahead is
| broken. But I did not follow that rabbit hole - too many other
| problems to work on :/
|
| ====== 8< ======
|
| Some related bugs they brought up:
|
| https://bugzilla.kernel.org/show_bug.cgi?id=201685
|
| https://patchwork.kernel.org/patch/10712695/
|
| Edit: it appears that there's _something_ causing corruption when
| drivers fail read-ahead I/O. Whether it's SQLite or something
| else in Linux is another question.
| blattimwind wrote:
| SQLite doesn't use direct I/O.
| geofft wrote:
| > _I think SQLite is using direct-I/O (bypassing page cache)_
|
| Should be pretty easy to confirm/deny that suspicion and see if
| it's worth going down the rabbit hole, no?
|
|     strace -e open -o >(grep O_DIRECT) ./myprogram
| lelf wrote:
|     -o >(grep O_DIRECT)
|
| This is cool. Never seen it (but it makes sense.)
| mmastrac wrote:
| Its cousin
|
|     diff <(cmd a) <(cmd b)
|
| is also very handy.
| lostmsu wrote:
| Not a command line guru, but this looks interesting. Does
| it compare outputs of two commands? How does it do it?
| arendtio wrote:
| <(command) is being replaced by a path to a file which
| contains the output of the command:
|
|     $ ls <(echo A)
|     /dev/fd/63
|     $ cat <(echo A)
|     A
|
| So
|
|     diff <(command1) <(command2)
|
| becomes
|
|     diff /dev/fd/62 /dev/fd/63
| smartmic wrote:
| It is called process substitution and specific to Bash
| (not POSIX shell compliant). The output of the commands
| in parenthesis will become runtime (file) inputs for the
| diff command.
|
| More:
| https://www.gnu.org/software/bash/manual/html_node/Process-S...
| hans1729 wrote:
| diff compares files line by line, here we just redirect
| the output of cmd1 and cmd2 via <, so they serve as
| 'proxies' for the file-args diff expects
|
| you might want to read into redirection on the
| commandline, it's amazingly powerful
| mmastrac wrote:
| It might be easiest to view it like so:
|
|     $ echo <(echo 1) <(echo 2)
|     /dev/fd/63 /dev/fd/62
|
| The shell spawns two commands connected to pipes, then
| replaces those with a file that represents the other
| (read) side of that pipe.
|
| The command above with >(grep something) does the same
| thing, just with the other end of a pipe.
| deaddodo wrote:
| It's not actually a pipe. >, < and | are IO redirection
| commands; but of those only | is a pipe. Pipes
| specifically "pipe" the STDOUT of one command as the
| STDIN of another. The other two instead point a command
| to the file descriptor of the output of another.
|
| This is explained in more detail here:
|
| https://askubuntu.com/questions/172982/what-is-the-differenc...
| mmastrac wrote:
| On linux at least, these two constructions are
| effectively identical from the view of the "piped"
| process.
|
|     # cat <(ls -l /proc/self/fd/)
|     total 0
|     lrwx------ 1 root root 64 Jan 12 15:02 0 -> /dev/pts/0
|     l-wx------ 1 root root 64 Jan 12 15:02 1 -> pipe:[20519634]
|     lrwx------ 1 root root 64 Jan 12 15:02 2 -> /dev/pts/0
|     lr-x------ 1 root root 64 Jan 12 15:02 3 -> /proc/23611/fd/
|
|     # ls -l /proc/self/fd/ | cat -
|     total 0
|     lrwx------ 1 root root 64 Jan 12 15:02 0 -> /dev/pts/0
|     l-wx------ 1 root root 64 Jan 12 15:02 1 -> pipe:[20518265]
|     lrwx------ 1 root root 64 Jan 12 15:02 2 -> /dev/pts/0
|     lr-x------ 1 root root 64 Jan 12 15:02 3 -> /proc/23621/fd/
| deaddodo wrote:
| Yes, because "cat" can use either STDIN or a file as
| input:
|
| https://github.com/coreutils/coreutils/blob/master/src/cat.c...
|
| It's an intentional design decision.
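The mechanism described in this subthread (a kernel pipe whose read end is handed to a program as a `/dev/fd/N` path) can be sketched in Python. This is only an approximation of what Bash does: it assumes a Linux-like system that exposes `/dev/fd` and has `cat` on the PATH, the function name `read_through_fd_path` is hypothetical, and the data is written into the pipe up front rather than by a separate child process as the shell would do.

```python
import os
import subprocess


def read_through_fd_path(data: bytes) -> bytes:
    """Give `cat` a /dev/fd/N path that names the read end of a pipe.

    Hypothetical helper: roughly what `cat <(some-command)` arranges.
    """
    read_fd, write_fd = os.pipe()
    # A small payload fits in the kernel pipe buffer, so writing first
    # does not block. Bash would fork a child to write instead.
    os.write(write_fd, data)
    os.close(write_fd)  # the reader sees EOF once the buffer is drained
    try:
        result = subprocess.run(
            ["cat", f"/dev/fd/{read_fd}"],
            pass_fds=[read_fd],  # child inherits the fd under the same number
            stdout=subprocess.PIPE,
            check=True,
        )
    finally:
        os.close(read_fd)
    return result.stdout


output = read_through_fd_path(b"A\n")
print(output)
```

From the point of view of `cat`, the argument is just a file name; that it resolves to a pipe is invisible to it, which is why `diff` can compare two command outputs this way.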
| mmastrac wrote:
| I think we're probably in agreement, but we're getting
| caught up on the definition of "piping".
|
| In my case I'm using it to describe the use of a kernel
| pipe object. In your case, you are using it to describe
| the higher-level concept of connection from one process's
| stdout to another process's stdin (or something along
| those lines).
| TheDong wrote:
| It is a pipe, just for a different definition of pipe.
|
| | is the pipe operator in sh for doing a shell
| "pipeline". However, the parent comment is referring to a
| linux pipe as in pipe(7) [0].
|
| One easy way to see this is with the following:
|
|     $ ls -l <(echo foo)
|     lr-x------ 1 user group Jan 12 01:23 /proc/self/fd/16 -> 'pipe:[2937585]'
|
| As you can see, that command created a fd (16) which
| referred to a pipe (pipe:[2937585]).
|
| Those file descriptors were created using the pipe(2)[1]
| call by the shell, so it seems fine to refer to them as
| pipes.
|
| I'll also note that <() / >() are _not_ using the
| "redirection operator". They're actually distinct
| operators for "process substitution"[2] in bash
| terminology. They look similar to redirects, but they're
| not the same operator, so that stack overflow answer
| isn't really relevant.
|
| [0]: http://man7.org/linux/man-pages/man7/pipe.7.html
|
| [1]: http://man7.org/linux/man-pages/man2/pipe.2.html
|
| [2]: https://tldp.org/LDP/abs/html/process-sub.html
| sillysaurusx wrote:
| The <(foo | bar) syntax evaluates to a file path, e.g.
|
|     $ echo <(echo hi)
|     /dev/fd/63
|
| If a program reads from that file path, it gets the
| output of the command chain.
|
|     $ cat <( (echo foo; echo bar) | grep bar )
|     bar
|
| Of course, that's a silly example. But that explains how
| the diff example works, since it's the same idea: diff
| just reads from the two "files".
|
| EDIT: Heh, 5 different replies within the same 60 second
| window.
| deaddodo wrote:
| Yes. You're directing the STDOUT of each command as a
| FILE into diff.
| Diff takes two files, thus the duplication.
| pdw wrote:
| I don't think sqlite uses direct IO -- at least there's no
| mention of O_DIRECT in the source code
| TAForObvReasons wrote:
| OT: After the lamentation of the "Github Monoculture", why was
| development of coveragepy moved from bitbucket to github?
|
| https://bitbucket.org/ned/coveragepy/src/default/ original project
| https://nedbatchelder.com/blog/201405/github_monoculture.htm... original post
| https://news.ycombinator.com/item?id=7690897 HN comments
| nedbat wrote:
| If the monoculture were easy to resist, it wouldn't need
| lamentation. :)
| jakeogh wrote:
| My blocker with bitbucket is it requires JS to use the web
| interface. Otherwise, blank page.
| contingencies wrote:
| Had an I/O issue in SQLite yesterday: the cause was a full disk,
| due to years of SQLite backups!
| DominoTree wrote:
| The solution is clearly to restore a full system backup from a
| time that it had fewer SQLite backups :P
| x1798DE wrote:
| This is an editorialized title and as far as I know it's not even
| accurate. The author maintains, but did not create coverage.py.
|
| (Edit: Original title has been changed, so this comment is no
| longer necessary.)
| supakeen wrote:
| I was unaware of that and it seems people have fixed the title
| for me. Thanks!
| jp_sc wrote:
| Looking at the commit history, he is the author of almost all
| the code written in the last ten years. Even if there was
| someone before him (who was it? I couldn't find any reference
| to them), how could we not consider him the author of the
| project by now?
| mjw1007 wrote:
| It was Gareth Rees.
|
| https://garethrees.org/2001/12/04/python-coverage/
| x1798DE wrote:
| The original title made reference to Ned as the "creator of
| coveragepy", which specifically means the person who created
| it. The original author was Gareth Rees, according to
| Contributors.txt [1], and Ned Batchelder took over in 2004.
|
| I don't mean it to suggest that Ned has a _lesser_ role than
| creator, just that the post's title was changed to add
| _inaccurate_ information. I figured that if the moderators
| saw this and decided that the context of the author's role
| in Coverage was important, they could update the title to
| reflect his actual role.
|
| [1] https://github.com/nedbat/coveragepy/blob/master/CONTRIBUTOR...
| jp_sc wrote:
| Sorry, I missed the obvious place to look into it.
| dmead wrote:
| hey, are you using docker on mac?
| nedbat wrote:
| In my reproduction, yes. But the original failure was on
| Travis, which I doubt is using docker on mac.
___________________________________________________________________
(page generated 2020-01-12 23:00 UTC)