[HN Gopher] Fast Commits for Ext4
___________________________________________________________________

Author : lukastyrychtr
Score  : 77 points
Date   : 2021-01-15 18:55 UTC (4 hours ago)

web link (lwn.net)

The_rationalist wrote:
| I wonder if this interacts with
| https://github.com/clearlinux-pkgs/linux/blob/master/0102-in...

ape4 wrote:
| ext4 is getting better just as Fedora has moved to Btrfs.

sroussey wrote:
| I'm curious how this change set would affect MySQL or PostgreSQL.

jldugger wrote:
| Probably not much -- the main win is not having to fsync files
| unrelated to your work, which is great for desktops that run
| multiple unrelated tasks (browser, terminals, RSS readers, email
| clients, Steam/game updates, package downloads). But I have to
| imagine SQL databases are typically run on systems dedicated to
| that singular task.

cbhl wrote:
| Do folks typically turn on journaling at the filesystem layer
| when running a database?
|
| The database does its own journaling, so one might choose to run
| with data=writeback, or even directly against the block device,
| if they were concerned about performance.

comboy wrote:
| I don't think those who read the manual do:
| https://www.postgresql.org/docs/13/wal-intro.html (unless they
| care about quick crash recovery).

jabberwcky wrote:
| You definitely need both; these are two completely different
| kinds of journalling:
|
| - Filesystem journalling is about making robust changes to the
|   data structures describing directories, files, and where files
|   live, in units of atomic filesystem operations.
|   For example, the filesystem journal may record "CREATE FILE",
|   which translates to "update directory entry 1234 in directory
|   block 5678, then allocate and initialize extent descriptor
|   9999, then write an inode at array entry 74234".
|
| - Database journalling is about making robust changes to the
|   data structures describing the actual file contents, in units
|   of atomic logical application operations. For example, a DB
|   journal may record "INSERT ROW", which translates to "update
|   block 123 of this index file, and block 234 of this data
|   file". Application-specific relationships like that cannot be
|   captured by the filesystem on UNIX.
|
| (Note: NTFS is transactional on Windows. It's entirely possible
| to correlate independent writes and make them atomic, so on
| Windows at least, in theory a DB could exist without a separate
| journal. I don't know if this is used in practice.) Even if it
| were in use, it would place severe limits on the kinds of
| concurrency optimizations a database system could otherwise
| perform, because all of that stuff moves behind the curtain of
| the OS interfaces.

quotemstr wrote:
| > One of the things that I did discuss with Harshad was using
| > some heuristics, where if there are two "unrelated"
| > applications (e.g., different session id, or process group
| > leader, or different uid, etc. --- details to be determined
| > later), we would not entangle writes to unrelated files via
| > fsync(2), while forcing files written by the same application
| > to share fate with one another even if only one file is
| > fsync'ed.
|
| Ugh. This is why we can't have nice things. I really don't want
| the kernel's filesystem performance to depend on the number of
| different UIDs writing to the filesystem. That is insanity!
|
| Ted Ts'o is just wrong here: performance should take priority
| over preserving the behavior of applications that rely on
| non-contractual implementation details of the Linux kernel.
fsync | should sync _only_ the indicated file, and that 's that. We can | add a mount option to let users opt into the older, safer | behavior, but we shouldn't suffer for essentially an eternity | because somewhere, someone might have written an application that | depends on an ext4 implementation detail. ___________________________________________________________________ (page generated 2021-01-15 23:00 UTC)