[HN Gopher] Fast Commits for Ext4
       ___________________________________________________________________
        
       Fast Commits for Ext4
        
       Author : lukastyrychtr
       Score  : 77 points
       Date   : 2021-01-15 18:55 UTC (4 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | The_rationalist wrote:
       | I wonder if this interacts with https://github.com/clearlinux-
       | pkgs/linux/blob/master/0102-in...
        
       | ape4 wrote:
       | ext4 is getting better just as Fedora has moved to Btrfs
        
       | sroussey wrote:
       | Curious how this change set would affect MySQL or PostgreSQL?
        
         | jldugger wrote:
         | Probably not much -- the main win is not having to fsync files
         | unrelated to your work, which is great for desktops that run
         | multiple unrelated tasks (browser, terminals, RSS readers,
         | email clients, Steam/game updates, package downloads). But I
         | have to imagine SQL databases are typically run on systems
         | dedicated to that single task.
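
For readers unfamiliar with the call being discussed, here is a minimal sketch of an application asking for durability of one file only. The helper name `durable_write` is made up for illustration; `os.fsync` is the standard Python wrapper around fsync(2). Whether the kernel also flushes other files' pending changes as a side effect is a filesystem implementation detail, which is the point of the thread.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path`, then ask the kernel to flush this one file.

    Hypothetical helper for illustration, not taken from the article.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Request that this file's data and metadata reach stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
```

The caller names only its own file descriptor; it has no way to say anything about other processes' files.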
        
         | cbhl wrote:
         | Do folks typically turn on journaling at the filesystem layer
         | when running a database?
         | 
         | The database itself contains journaling, so one might choose to
         | run with data=writeback or even directly against the block
         | device if they were concerned about performance.
        
           | comboy wrote:
           | I don't think that those who read the manual do:
           | https://www.postgresql.org/docs/13/wal-intro.html (unless
           | they care about quick crash recovery)
        
           | jabberwcky wrote:
           | You definitely need both; these are two completely different
           | kinds of journalling:
           | 
           | - Filesystem journalling is making robust changes to the data
           | structures describing directories, files, and where files
           | live, in units of atomic filesystem operations. For example,
           | the filesystem journal may record "CREATE FILE", which
           | translates to "update directory entry 1234 in directory block
           | 5678, then allocate and initialize extent descriptor 9999,
           | then write an inode at array entry 74234"
           | 
           | - Database journalling is making robust changes to the data
           | structures describing the actual file contents, in units of
           | atomic logical application operations. For example, a DB
           | journal may record "INSERT ROW", which translates to "update
           | block 123 of this index file, and 234 of this data file".
           | Application-specific relationships like that cannot be
           | captured by the filesystem on UNIX.
           | 
           | (Note: NTFS is transactional on Windows. It's entirely
           | possible to correlate independent writes and make them
           | atomic, so on Windows at least, in theory a DB could exist
           | without a separate journal. I don't know if this is used in
           | practice.) Even if it were in use, it would place severe
           | limits on the kinds of concurrency optimizations a database
           | system could otherwise perform, because all of that stuff
           | moves behind the curtain of the OS interfaces.
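
The database-side journalling described above can be sketched as a toy write-ahead log. This is an illustration under loud assumptions: the class `TinyWal`, its JSON-lines record format, and its method names are invented here and do not reflect how PostgreSQL or MySQL lay out their logs. The only point is the ordering: the logical record is made durable before the in-memory state changes, and a crash is recovered by replaying the log.

```python
import json
import os


class TinyWal:
    """Toy key/value store with a write-ahead log (hypothetical format)."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.table = {}
        self._replay()
        self._log = open(log_path, "ab")

    def _replay(self):
        """Rebuild `table` from the log, dropping a torn final record."""
        if not os.path.exists(self.log_path):
            return
        good = 0  # byte offset just past the last complete record
        with open(self.log_path, "rb") as f:
            for line in f:
                if not line.endswith(b"\n"):
                    break  # record was cut short by a crash
                try:
                    rec = json.loads(line)
                    key, value = rec["key"], rec["value"]
                except (ValueError, KeyError, TypeError):
                    break  # unreadable record; stop replaying here
                self.table[key] = value
                good += len(line)
        with open(self.log_path, "r+b") as f:
            f.truncate(good)  # new appends start on a clean boundary

    def insert(self, key, value):
        rec = json.dumps({"key": key, "value": value}).encode() + b"\n"
        self._log.write(rec)
        self._log.flush()
        os.fsync(self._log.fileno())  # log record is durable first...
        self.table[key] = value       # ...then the state is updated

    def close(self):
        self._log.close()
```

The filesystem journal knows nothing about "key" or "value" here; it only keeps the log file's own metadata consistent, which is why the two layers are complementary.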
        
       | quotemstr wrote:
       | > One of the things that I did discuss with Harshad was using
       | some heuristics, where if there are two "unrelated" applications
       | (e.g., different session id, or process group leader, or
       | different uid, etc. --- details to be determined later), we would
       | not entangle writes to unrelated files via fsync(2), while
       | forcing files written by the same application to share fate with
       | one another even if only one file is fsync'ed.
       | 
       | Ugh. This is why we can't have nice things. I really don't want
       | the kernel's filesystem performance to depend on the number of
       | different UIDs writing to the filesystem. That is insanity!
       | 
       | Ted Ts'o is just wrong here: performance should take priority
       | over preserving the behavior of applications that rely on non-
       | contractual implementation details of the Linux kernel. fsync
       | should sync _only_ the indicated file, and that's that. We can
       | add a mount option to let users opt into the older, safer
       | behavior, but we shouldn't suffer for essentially an eternity
       | because somewhere, someone might have written an application that
       | depends on an ext4 implementation detail.
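
To make the quoted heuristic concrete, here is a user-space sketch of one possible reading of it. Everything in it is an assumption: the `Writer` record, the rule that all three identifiers must match, and the function names are invented for illustration. The quote itself says the details are still to be determined, and this is not kernel code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Writer:
    """Hypothetical identity of the process that dirtied a file."""
    uid: int
    session_id: int
    process_group: int


def related(a: Writer, b: Writer) -> bool:
    # Assumption: two writers count as the same "application" only when
    # all three identifiers match; the quote leaves the real rule open.
    return (a.uid, a.session_id, a.process_group) == (
        b.uid, b.session_id, b.process_group)


def shares_fate(fsynced: str, dirty: dict) -> list:
    """Return the dirty paths that would be flushed along with `fsynced`.

    `dirty` maps each dirty path to the Writer that last modified it.
    """
    owner = dirty[fsynced]
    return sorted(p for p, w in dirty.items() if related(owner, w))
```

Under this reading, an fsync on one of a browser's files would drag along the browser's other dirty files but not a package manager's, which is exactly the dependence on UIDs and sessions that the comment above objects to.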
        
       ___________________________________________________________________
       (page generated 2021-01-15 23:00 UTC)