[HN Gopher] Bug #915: Please help
       ___________________________________________________________________
        
       Bug #915: Please help
        
       Author : supakeen
       Score  : 276 points
       Date   : 2020-01-12 19:30 UTC (3 hours ago)
        
 (HTM) web link (nedbatchelder.com)
 (TXT) w3m dump (nedbatchelder.com)
        
       | pdq wrote:
       | As mentioned on the Github issue [1], I'd suspect their SQLite
       | optimizations:
       | 
       |     self.execute("pragma journal_mode=off").close()
       |     # This pragma makes writing faster.
       |     self.execute("pragma synchronous=off").close()
       | 
       | I've always used WAL journaling mode, and NORMAL synchronous
       | mode.
       | 
       | [1] https://github.com/nedbat/coveragepy/issues/915
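
The WAL/NORMAL combination mentioned above could look roughly like this with Python's standard sqlite3 module. This is only a sketch: the helper name open_with_wal and the file name example.db are made up for illustration, and it is not how coverage.py opens its database.

```python
import os
import sqlite3
import tempfile


def open_with_wal(path: str) -> sqlite3.Connection:
    """Open a database using WAL journaling and NORMAL synchronous mode."""
    conn = sqlite3.connect(path)
    # "pragma journal_mode=..." returns the mode actually in effect as one row.
    mode = conn.execute("pragma journal_mode=wal").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL, got {mode!r}")
    conn.execute("pragma synchronous=normal")
    return conn


# WAL needs a real file; an in-memory database would report "memory" instead.
with tempfile.TemporaryDirectory() as tmp:
    conn = open_with_wal(os.path.join(tmp, "example.db"))  # hypothetical file name
    # The synchronous pragma reads back as an integer; 1 corresponds to NORMAL.
    sync_level = conn.execute("pragma synchronous").fetchone()[0]
    conn.close()
```

Checking the value returned by the journal_mode pragma matters because SQLite reports the mode it actually ended up with rather than raising an error when the request cannot be honoured.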
        
         | nedbat wrote:
         | I have tried removing the two pragmas, and the problem still
         | happens.
        
           | jsmith45 wrote:
           | Neither of those makes much sense as the cause anyway. Both
           | sacrifice database consistency if the hosting process crashes,
           | or the OS crashes or reboots due to power failure, but with
           | something like a coverage database, neither is particularly
           | worrying. Worst case, the user nukes the db and reruns the
           | tests.
           | 
           | Even if removing those pragmas fixed things, it feels likely
           | that would just be hiding the issue instead of fixing it.
           | 
           | Disk io errors are not supposed to happen in sqlite except in
           | cases like: accessing a corrupted database, drives failing or
           | running out of disk space, bad permissions on some folder
           | sqlite needs to write to.
           | 
           | So unless the test case involves a python process crashing or
           | getting killed, I question if this is actually a bug in your
           | code.
           | 
           | I feel like this is some bug in either sqlite or the file
           | systems used by docker (specifically, one behaving in some
           | manner not anticipated by sqlite).
           | 
           | With those optimizations, the only remaining sqlite disk IO
           | should be the database itself, and the temp store. Since I
           | strongly suspect the error you are getting is coming from
           | sqlite, it would be helpful to know which of those two disk
           | io cases it is.
           | 
           | Perhaps you can try the pragma to force the temp store to be
           | in memory? The results of that might help us know if the disk
           | io error is in reading/writing the database file itself, or
           | if it is in reading/writing temp files.
           | 
           | Knowing that could narrow things down a bit.
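
The temp-store experiment suggested above might be tried along these lines. It is a sketch using Python's standard sqlite3 module; the helper name connect_with_memory_temp_store is hypothetical, and a SQLite build compiled to always use temporary files may still ignore the setting.

```python
import sqlite3


def connect_with_memory_temp_store(path: str) -> sqlite3.Connection:
    """Open a connection whose temporary tables and indices stay in memory."""
    conn = sqlite3.connect(path)
    # temp_store=memory asks SQLite to keep temp data out of the filesystem.
    conn.execute("pragma temp_store=memory")
    return conn


conn = connect_with_memory_temp_store(":memory:")
# The pragma reads back as an integer: 0 = default, 1 = file, 2 = memory.
temp_store = conn.execute("pragma temp_store").fetchone()[0]
conn.close()
```

If the disk I/O error disappears with this setting, temp files become the prime suspect; if it persists, the database file itself is the more likely culprit.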
        
           | slovenlyrobot wrote:
           | For some reason Docker is struggling to download the image
           | right now (lots of interested parties? :), I just wanted to
           | trap writes to the database file and look at what's actually
           | getting written there.
           | 
           | Is there any possibility e.g. of an FD reuse issue in the
           | code somewhere? It might even be possible to spot it with
           | strace (although you need to start the container with the
           | right caps to try that). An example where this can happen is
           | programs closing 'stderr' fd 2, only for e.g. SQLite to open
           | its database reusing fd 2, then some library code calls
           | fprintf(stderr, ..) etc.
           | 
           | Would love a copy of the DB 'sometime before' and after to
           | look at. There are tools around for inspecting SQLite
           | databases on a page-by-page basis
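
The FD reuse scenario described above rests on open() handing out the lowest free descriptor number. A small single-threaded sketch (the function name show_fd_reuse is made up for illustration) shows a closed number coming straight back:

```python
import os
from typing import Tuple


def show_fd_reuse() -> Tuple[int, int]:
    """Return (closed_fd, reopened_fd) to show a descriptor number being reused."""
    # Open two descriptors so that closing the first leaves a "hole".
    first = os.open(os.devnull, os.O_RDONLY)
    second = os.open(os.devnull, os.O_RDONLY)
    os.close(first)
    # open() returns the lowest-numbered free descriptor, which is now `first`.
    reopened = os.open(os.devnull, os.O_WRONLY)
    os.close(second)
    os.close(reopened)
    return first, reopened


closed_fd, reopened_fd = show_fd_reuse()
```

Any code still holding the stale number after the close would now be writing to whatever file was opened next, which is how stray output could end up inside a database file.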
        
       | mmastrac wrote:
       | There was a similar bug in unraid that they ended up fixing in
       | 6.8.0 where it turned out that SQLite wasn't handling some sort
       | of error condition in read-ahead I/O? I wonder if this is
       | related.
       | 
       | https://forums.unraid.net/bug-reports/prereleases/sqlite-dat...
       | 
       | From that report:
       | 
       | ====== 8< ======
       | 
       | > In the Linux block layer each READ or WRITE can have various
       | modifier bits set. In the case of a read-ahead you get
       | READ|REQ_RAHEAD which tells I/O driver this is a read-ahead. In
       | this case, if there are insufficient resources at the time this
       | request is received, the driver is permitted to terminate the
       | operation with BLK_STS_IOERR status. Here is an example in Linux
       | md/raid5 driver.
       | 
       | > In case of Unraid it can definitely happen under heavy load
       | that a read-ahead comes along and there are no 'stripe buffers'
       | immediately available. In this case, instead of making calling
       | process wait, it terminated the I/O. This has worked this way for
       | years.
       | 
       | [...]
       | 
       | > What I suspect is that this is a bug in SQLite - I think SQLite
       | is using direct-I/O (bypassing page cache) and issuing its own
       | read-aheads and their logic to handle failing read-ahead is
       | broken. But I did not follow that rabbit hole - too many other
       | problems to work on :/
       | 
       | ====== 8< ======
       | 
       | Some related bugs they brought up:
       | 
       | https://bugzilla.kernel.org/show_bug.cgi?id=201685
       | 
       | https://patchwork.kernel.org/patch/10712695/
       | 
       | Edit: it appears that there's _something_ causing corruption when
       | drivers fail read-ahead I/O. Whether it's SQLite or something
       | else in Linux is another question.
        
         | blattimwind wrote:
         | SQLite doesn't use direct I/O.
        
         | geofft wrote:
         | > _I think SQLite is using direct-I/O (bypassing page cache)_
         | 
         | Should be pretty easy to confirm/deny that suspicion and see if
         | it's worth going down the rabbit hole, no?
         | 
         |     strace -e open -o >(grep O_DIRECT) ./myprogram
        
           | lelf wrote:
           | -o >(grep O_DIRECT)
           | 
           | This is cool. Never seen it (but it makes sense.)
        
             | mmastrac wrote:
             | Its cousin
             | 
             |     diff <(cmd a) <(cmd b)
             | 
             | is also very handy.
        
               | lostmsu wrote:
               | Not a command line guru, but this looks interesting. Does
               | it compare outputs of two commands? How does it do it?
        
               | arendtio wrote:
               | <(command) is being replaced by a path to a file which
               | contains the output of the command:
               | 
               |     $ ls <(echo A)
               |     /dev/fd/63
               |     $ cat <(echo A)
               |     A
               | 
               | So
               | 
               |     diff <(command1) <(command2)
               | 
               | becomes
               | 
               |     diff /dev/fd/62  /dev/fd/63
        
               | smartmic wrote:
               | It is called process substitution and specific to Bash
               | (not POSIX shell compliant). The output of the commands
               | in parentheses will become runtime (file) inputs for the
               | diff command.
               | 
               | More: https://www.gnu.org/software/bash/manual/html_node/Process-S...
        
               | hans1729 wrote:
               | diff compares files line by line, here we just redirect
               | the output of cmd1 and cmd2 via <, so they serve as
               | 'proxies' for the file-args diff expects
               | 
               | you might want to read into redirection on the
               | commandline, it's amazingly powerful
        
               | mmastrac wrote:
               | It might be easiest to view it like so:
               | 
               |     $ echo <(echo 1) <(echo 2)
               |     /dev/fd/63 /dev/fd/62
               | 
               | The shell spawns two commands connected to pipes, then
               | replaces those with a file that represents the other
               | (read) side of that pipe.
               | 
               | The command above with >(grep something) does the same
               | thing, just with the other end of a pipe.
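
What the shell does for <(...) can be approximated by hand. The sketch below creates a pipe, fills it, and gives a child process the /dev/fd path of the read end. It assumes a Unix-like system with /dev/fd and a cat binary on PATH; read_via_dev_fd is a hypothetical name, and real shells run the inner command concurrently instead of pre-filling the pipe.

```python
import os
import subprocess


def read_via_dev_fd(payload: bytes) -> bytes:
    """Mimic `cat <(printf ...)`: pass a child a /dev/fd path backed by a pipe."""
    read_fd, write_fd = os.pipe()
    try:
        # Keep the payload small: it must fit in the pipe buffer, because
        # nothing reads from the pipe until the child starts.
        os.write(write_fd, payload)
    finally:
        # Closing the write end lets the reader see end-of-file.
        os.close(write_fd)
    try:
        # pass_fds keeps read_fd open, under the same number, in the child.
        result = subprocess.run(
            ["cat", f"/dev/fd/{read_fd}"],
            pass_fds=(read_fd,),
            stdout=subprocess.PIPE,
            check=True,
        )
    finally:
        os.close(read_fd)
    return result.stdout


output = read_via_dev_fd(b"A\n")
```

The child only ever sees an ordinary-looking path argument, which is why programs such as diff need no special support for this.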
        
               | deaddodo wrote:
               | It's not actually a pipe. >, < and | are IO redirection
               | commands; but of those only | is a pipe. Pipes
               | specifically "pipe" the STDOUT of one command as the
               | STDIN of another. The other two instead point a command
               | to the file descriptor of the output of another.
               | 
               | This is explained in more detail here:
               | 
               | https://askubuntu.com/questions/172982/what-is-the-differenc...
        
               | mmastrac wrote:
               | On linux at least, these two constructions are
               | effectively identical from the view of the "piped"
               | process.
               | 
               |     # cat <(ls -l /proc/self/fd/)
               |     total 0
               |     lrwx------ 1 root root 64 Jan 12 15:02 0 -> /dev/pts/0
               |     l-wx------ 1 root root 64 Jan 12 15:02 1 -> pipe:[20519634]
               |     lrwx------ 1 root root 64 Jan 12 15:02 2 -> /dev/pts/0
               |     lr-x------ 1 root root 64 Jan 12 15:02 3 -> /proc/23611/fd/
               | 
               |     # ls -l /proc/self/fd/ | cat -
               |     total 0
               |     lrwx------ 1 root root 64 Jan 12 15:02 0 -> /dev/pts/0
               |     l-wx------ 1 root root 64 Jan 12 15:02 1 -> pipe:[20518265]
               |     lrwx------ 1 root root 64 Jan 12 15:02 2 -> /dev/pts/0
               |     lr-x------ 1 root root 64 Jan 12 15:02 3 -> /proc/23621/fd/
        
               | deaddodo wrote:
               | Yes, because "cat" can use either STDIN or a file as
               | input:
               | 
               | https://github.com/coreutils/coreutils/blob/master/src/cat.c...
               | 
               | It's an intentional design decision.
        
               | mmastrac wrote:
               | I think we're probably in agreement, but we're getting
               | caught up on the definition of "piping".
               | 
               | In my case I'm using it to describe the use of a kernel
               | pipe object. In your case, you are using it to describe
               | the higher-level concept of connection from one process's
               | stdout to another process's stdin (or something along
               | those lines).
        
               | TheDong wrote:
               | It is a pipe, just for a different definition of pipe.
               | 
               | | is the pipe operator in sh for doing a shell
               | "pipeline". However, the parent comment is referring to a
               | linux pipe as in pipe(7) [0].
               | 
               | One easy way to see this is with the following:
               | 
               |     $ ls -l <(echo foo)
               |     lr-x------ 1 user group Jan 12 01:23 /proc/self/fd/16 -> 'pipe:[2937585]'
               | 
               | As you can see, that command created a fd (16) which
               | referred to a pipe (pipe:[2937585]).
               | 
               | Those file descriptors were created using the pipe(2)[1]
               | call by the shell, so it seems fine to refer to them as
               | pipes.
               | 
               | I'll also note that <() / >() are _not_ using the
               | "redirection operator". They're actually distinct
               | operators for "process substitution"[2] in bash
               | terminology. They look similar to redirects, but they're
               | not the same operator, so that stack overflow answer
               | isn't really relevant.
               | 
               | [0]: http://man7.org/linux/man-pages/man7/pipe.7.html
               | 
               | [1]: http://man7.org/linux/man-pages/man2/pipe.2.html
               | 
               | [2]: https://tldp.org/LDP/abs/html/process-sub.html
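
The "kernel pipe object" sense of the word can be checked without a shell. The sketch below asks fstat() whether a descriptor is a FIFO; is_kernel_pipe is a hypothetical helper name, and a descriptor on the null device is used as a non-pipe control.

```python
import os
import stat


def is_kernel_pipe(fd: int) -> bool:
    """True if fd refers to a FIFO/pipe object, as reported by fstat()."""
    return stat.S_ISFIFO(os.fstat(fd).st_mode)


# Both ends of a pipe created with pipe(2) report themselves as FIFOs.
read_fd, write_fd = os.pipe()
pipe_ends_are_fifos = is_kernel_pipe(read_fd) and is_kernel_pipe(write_fd)
os.close(read_fd)
os.close(write_fd)

# A descriptor on the null device is a character device, not a pipe.
devnull_fd = os.open(os.devnull, os.O_RDONLY)
devnull_is_fifo = is_kernel_pipe(devnull_fd)
os.close(devnull_fd)
```

A program handed a /dev/fd path by process substitution could run the same check on the descriptor it opens to tell a pipe from a regular file.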
        
               | sillysaurusx wrote:
               | The <(foo | bar) syntax evaluates to a file path, e.g.
               | 
               |     $ echo <(echo hi)
               |     /dev/fd/63
               | 
               | If a program reads from that file path, it gets the
               | output of the command chain.
               | 
               |     $ cat <( (echo foo; echo bar) | grep bar )
               |     bar
               | 
               | Of course, that's a silly example. But that explains how
               | the diff example works, since it's the same idea: diff
               | just reads from the two "files".
               | 
               | EDIT: Heh, 5 different replies within the same 60 second
               | window.
        
               | deaddodo wrote:
               | Yes. You're directing the STDOUT of each command as a
               | FILE into diff. Diff takes two files, thus the
               | duplication.
        
         | pdw wrote:
         | I don't think sqlite uses direct IO -- at least there's no
         | mention of O_DIRECT in the source code
        
       | TAForObvReasons wrote:
       | OT: After the lamentation of the "Github Monoculture", why was
       | development of coveragepy moved from bitbucket to github?
       | 
       | Original project: https://bitbucket.org/ned/coveragepy/src/default/
       | Original post: https://nedbatchelder.com/blog/201405/github_monoculture.htm...
       | HN comments: https://news.ycombinator.com/item?id=7690897
        
         | nedbat wrote:
         | If the monoculture were easy to resist, it wouldn't need
         | lamentation. :)
        
         | jakeogh wrote:
         | My blocker with bitbucket is it requires JS to use the web
         | interface. Otherwise, blank page.
        
       | contingencies wrote:
       | Had an I/O issue in SQLite yesterday: the cause was a full disk,
       | due to years of SQLite backups!
        
         | DominoTree wrote:
         | The solution is clearly to restore a full system backup from a
         | time that it had fewer SQLite backups :P
        
       | x1798DE wrote:
       | This is an editorialized title and as far as I know it's not even
       | accurate. The author maintains, but did not create, coverage.py.
       | 
       | (Edit: Original title has been changed, so this comment is no
       | longer necessary.)
        
         | supakeen wrote:
         | I was unaware of that and it seems people have fixed the title
         | for me. Thanks!
        
         | jp_sc wrote:
         | Looking at the commit history, he is the author of almost all
         | the code written in the last ten years. Even if there was
         | someone before him (who was it? I couldn't find any reference
         | to them), how could we not consider him the author of the
         | project by now?
        
           | mjw1007 wrote:
           | It was Gareth Rees.
           | 
           | https://garethrees.org/2001/12/04/python-coverage/
        
           | x1798DE wrote:
           | The original title made reference to Ned as the "creator of
           | coveragepy", which specifically means the person who created
           | it. The original author was Gareth Rees, according to
           | Contributors.txt [1], and Ned Batchelder took over in 2004.
           | 
           | I don't mean it to suggest that Ned has a _lesser_ role than
           | creator, just that the post's title was changed to add
           | _inaccurate_ information. I figured that if the moderators
           | saw this and decided that the context of the author's role
           | in Coverage was important, they could update the title to
           | reflect his actual role.
           | 
           | [1] https://github.com/nedbat/coveragepy/blob/master/CONTRIBUTOR...
        
             | jp_sc wrote:
             | Sorry, I missed the obvious place to look into it.
        
       | dmead wrote:
       | hey, are you using docker on mac?
        
         | nedbat wrote:
         | In my reproduction, yes. But the original failure was on
         | Travis, which I doubt is using docker on mac.
        
       ___________________________________________________________________
       (page generated 2020-01-12 23:00 UTC)