[HN Gopher] Linux NILFS file system: automatic continuous snapshots
___________________________________________________________________

  Linux NILFS file system: automatic continuous snapshots

  Author : solene
  Score  : 191 points
  Date   : 2022-10-11 11:58 UTC (11 hours ago)

  (HTM) web link (dataswamp.org)
  (TXT) w3m dump (dataswamp.org)

| wazoox wrote:
| I've been running NILFS2 on my main work NAS for 8 years. It
| never failed us :)
|
| mdaniel wrote:
| I mean this honestly: how did you evaluate such a new
| filesystem in order to bet a work NAS upon it?
|
| wazoox wrote:
| I did some testing, and installed it on a secondary system that
| in the beginning mostly hosted unimportant files. Then we added
| more things, and since after a few years it had posed
| absolutely no problem, we went further (and added a backup
| procedure). Then we migrated to new hardware, and it's still
| going strong (it's quite small, about a 15 TB volume).
|
| yonrg wrote:
| I would do it by using it! ... and probably some backup
|
| remram wrote:
| How is this pronounced? Nil-F-S? Nilfuss? Nai-L-F-S? N-I-L-F-S?
|
| heavyset_go wrote:
| The first one.
|
| Rygian wrote:
| How close is this to a large continuous tape loop for video
| surveillance?
|
| I would very much welcome a filesystem that breaks away from the
| directories/files paradigm. Any time-based data store would
| greatly benefit from that.
|
| rcthompson wrote:
| I think all you would need to add is a daemon that
| automatically deletes the oldest file(s) whenever free space
| drops below a certain threshold, so that the filesystem GC can
| reclaim that space for new files.
|
| tommiegannert wrote:
| If NILFS is continuously checkpointing, couldn't you even
| remove the file right after you add it, for simplicity?
|
| Rygian wrote:
| I know and use 'logrotate'.
|
| My point was more along the lines of a filesystem where a
| single file can be overwritten over and over again, and it's
| up to the filesystem to transparently ensure the full
| capacity of the disk is put towards retaining old versions of
| the file.
|
| nix23 wrote:
| Hmm, maybe something like Bluestore?
|
| https://docs.ceph.com/en/latest/rados/configuration/storage-...
|
| Rygian wrote:
| I definitely need to dive into Ceph, thanks for the
| pointer :-)
|
| darau1 wrote:
| What's the difference between a snapshot and a checkpoint?
|
| okasaki wrote:
| From the article:
|
| > A checkpoint is a snapshot of your system at a given point in
| time, but it can be deleted automatically if some disk space
| must be reclaimed. A checkpoint can be transformed into a
| snapshot that will never be removed.
|
| sargun wrote:
| I've always wondered why NILFS (or similar) isn't used for cases
| where ransomware is a risk. I'm honestly surprised that it's not
| mandated to use an append-only / log-structured filesystem for
| some critical systems (think patient records), where the cost of
| losing data is so high, the data is rarely mutated, and trading
| that off for wasted storage isn't that bad (after all, HDD
| storage is incredibly cheap, and nobody said you had to keep the
| working set and the log on the same device).
|
| compsciphd wrote:
| you don't need a log-structured fs to do this, you could just
| have regular zfs/btrfs snapshots too.
|
| BUT
|
| if an attacker has the ability to delete an entire file system /
| encrypt it, they really have the ability to delete the
| snapshots as well; the only reason they might not is due to
| "security through obscurity".
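|
| (To make that concrete: a minimal sketch, assuming a pool named
| "tank" -- ZFS "holds" can pin a snapshot against destruction,
| but anyone with root can release them just as easily:)
|
|     zfs snapshot tank/data@2022-10-11     # periodic snapshot
|     zfs hold keep tank/data@2022-10-11    # pin it; destroy now fails
|     # an attacker with root simply runs:
|     zfs release keep tank/data@2022-10-11
|     zfs destroy tank/data@2022-10-11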
|
| now, what I have argued is that an append-only file system
| which works in a SAN-like environment (i.e. you have random
| reads, but append-only writes, with those properties enforced
| remotely) could give you that, but to an extent you'd still get
| similar behavior by just exporting ZFS shares (or even block
| devices) and snapshotting them regularly on the remote end.
|
| ephbit wrote:
| > if an attacker has the ability to delete an entire file
| system / encrypt it, they really have the ability to delete
| the snapshots as well, ..
|
| How so?
|
| Let's say you have one machine holding the actual data for
| working on it. And some backup server. You could use btrfs
| send over ssh and regularly btrfs receive the data on the
| backup machine. Even if they got encrypted by ransomware, they
| wouldn't be lost in the backups. As long as they're not
| deleted there, how could a compromised work machine compromise
| the data on the backup machine?
|
| ggm wrote:
| Didn't VMS have this baked in? My memory is that all 8.3 file
| names had 8.3[;nnn] version tagging under the hood.
|
| usr1106 wrote:
| That's what it looked like, but I doubt it was deep in the
| filesystem. It was basically just a naming convention. The user
| had to purge old versions manually. This gets tedious if you
| have many files that change often. Snapshots are a safety net,
| not something you want to have in your way all day long.
|
| ggm wrote:
| Er.. my memory is that it did COW inside VMS fs semantics and
| was not manually achieved. You did have to delete manually. So
| I don't think it was just a hack.
|
| It didn't do directories, so it was certainly not as good as a
| snapshot, but we're talking 40 years ago!
|
| jerf wrote:
| What happens if you run "dd if=/dev/zero of=/any/file/here", thus
| simply loading the disk with all the zeros it can handle? Do you
| lose all your snapshots as they are deleted to make room, or does
| it keep some space aside for this situation?
|
| (Not a "gotcha" question, a legitimate question.)
|
| regularfry wrote:
| I know this isn't what you're getting at, but is it smart
| enough to create a sparse file when you specifically pick zero
| as your filler byte?
|
| solene wrote:
| The garbage collector daemon will delete older checkpoints
| beyond the preservation time to make some room.
|
| Volundr wrote:
| It's configurable:
| https://nilfs.sourceforge.io/en/man5/nilfs_cleanerd.conf.5.h....
| Cleanerd is responsible for maintaining a certain amount of free
| space on the system, and you can control the rules for doing so
| (e.g. a checkpoint won't be eligible for cleaning until it is 1
| week old).
|
| It's also worth knowing NILFS2 has checkpoints and snapshots.
| What you actually get are continuous "checkpoints". These can
| be upgraded to snapshots at any time with a simple command.
| Checkpoints are garbage collected, snapshots are not (until
| they are downgraded back into checkpoints).
|
| throwaway787544 wrote:
|
| didgetmaster wrote:
| Does NILFS do checksums and snapshotting for every single file in
| the system? One of my biggest complaints about file systems in
| general is that they are all designed to treat every file the
| exact same way.
|
| We now have storage systems (even SSDs) that are big enough to
| hold hundreds of millions of files. Those files can be a mix of
| small files, big files, temp files, personal files, and public
| files. Yet every file system must treat your precious thesis
| paper the same way it treats a huge cat video you downloaded off
| the Internet.
|
| We need some kind of 'object store' where each object can be
| given a set of attributes that govern how the file system treats
| it. Backup, encryption, COW, checksums, and other operations
| should not be wasted on a bunch of data that no one really cares
| about.
|
| I have been working on a kind of object file system that
| addresses this problem.
|
| nix23 wrote:
| Well, you can kind of do that with zfs filesystems, and the
| "object" is the recordsize.
|
| mustache_kimono wrote:
| I was going to ask: "Is there any limit on the number of ZFS
| filesystems in a pool?" Google says 2^64 is the limit.
|
| Couldn't one just generate a filesystem per object, if
| snapshots, etc., on a per-object level is what one cared
| about? I wonder how quickly this would fall over?
|
| > Backup, encryption, COW, checksums, and other operations
| should not be wasted on a bunch of data that no one really
| cares about.
|
| This GP comment is a little goofy though. There was a user I
| once encountered who wanted ZFS, but a la carte. "I want the
| snapshots but I don't need COW." You have to explain, "You
| don't get the snapshots unless you have the COW", etc.
|
| Conan_Kudo wrote:
| On Btrfs, you can mark a folder/file/subvolume to have
| nocow, which has the effect of only doing a COW operation
| when you are creating snapshots.
|
| mustache_kimono wrote:
| And that may work for btrfs, but again at some cost:
|
| "When you enable nocow on your files, Btrfs cannot
| compute checksums, meaning the integrity against bitrot
| and other corruptions cannot be guaranteed (i.e. in nocow
| mode, Btrfs drops to similar data consistency guarantees
| as other popular filesystems, like ext4, XFS, ...). In
| RAID modes, Btrfs cannot determine which mirror has the
| good copy if there is corruption on one of them."[0]
|
| [0]: https://wiki.tnonline.net/w/Blog/SQLite_Performance_on_Btrfs...
|
| lazide wrote:
| Yup. It's a pretty fundamental thing. COW and data
| checksums (and usually automatic/inline compression)
| co-exist that way because it's otherwise too expensive
| performance-wise, and potentially dangerous corruption-wise.
|
| For instance, if you modify a single byte in a large
| file, you need to update the data on disk as well as the
| checksum in the block header, and other related data.
| Chances are, these are in different sectors, and it also
| requires re-reading all the other data in the block to
| compute the checksum. Anywhere in that process there is a
| chance of corrupting both the original data and the update.
|
| If the byte changes the final compressed size, it may not
| fit in the current block at all, causing an expensive (or
| impossible) re-allocation.
|
| You could end up with the original data and the update both
| invalid.
|
| Writing out a new COW block is done all at once, and if
| it fails, the write failed atomically, with the original
| data still intact.
|
| tjoff wrote:
| > _Chances are, these are in different sectors, and it also
| requires re-reading all the other data in the block to
| compute the checksum. Anywhere in that process there is a
| chance of corrupting both the original data and the update._
|
| Not much different from any interrupted write, though. And
| a COW filesystem needs to re-read just as much.
|
| > _If the byte changes the final compressed size, it may
| not fit in the current block at all, causing an expensive
| (or impossible) re-allocation._
|
| Something that you must always pay in a COW filesystem
| anyway? It is handled by other non-COW filesystems anyway.
|
| Just because a filesystem isn't COW doesn't mean every
| change needs to be in place, either. Of course, a
| filesystem that is primarily COW might not want to
| maintain compression for non-COW edge cases, and that is
| quite reasonable.
|
| Arnavion wrote:
| While filesystem-integrated RAID makes sense since the
| filesystem can do filesystem-specific RAID placements (eg
| zfs), for now the safest RAID experience seems to be
| filesystem on mdadm on dm-integrity on disk partition, so
| that the RAID and RAID errors are invisible to the
| filesystem.
|
| mustache_kimono wrote:
| > the safest RAID experience seems to be filesystem on
| mdadm on dm-integrity on disk partition, so that the RAID
| and RAID errors are invisible to the filesystem.
|
| I suppose I don't understand this. Why would this be the
| case?
|
| Arnavion wrote:
| dm-integrity solves the problem of identifying which
| replica is good and which is bad. mdadm solves the
| problem of reading from the replica identified as good
| and fixing / reporting the replica identified as bad. The
| filesystem doesn't notice or care.
|
| mustache_kimono wrote:
| Ahh, so you mean, "If you can't use ZFS/btrfs, use
| dm-integrity"?
|
| Arnavion wrote:
| No. I don't use ZFS since it's not licensed correctly, so
| I have no opinion on it. And BTRFS raid is not safe
| enough for use. So I'm saying "Use filesystem on mdadm on
| dm-integrity".
|
| llanowarelves wrote:
| I have been spinning my wheels on personal backups and file
| organization the last few months. It is tough to structure it
| perfectly.
|
| I think the way to go may be directories or volumes having
| different properties, with things split up as /consumer-media,
| /work-media, /work, /docs, /credentials, etc.
|
| Then you can set integrity, encryption, etc. separately, either
| at the filesystem level or as part of the software-level backup
| strategy.
|
| lazide wrote:
| Why is it "wasted"? Those things are mostly free on modern
| hardware.
|
| The challenge with your thesis here is that the only one who
| can know what is "that important" is _YOU_, and your
| decision-making and communication bandwidth is already the
| limiting factor.
|
| For many users, that cat video would be heartbreaking to lose,
| and they don't have term papers to worry about.
|
| So having to decide or think about what is or is not "important
| enough" to you, and communicate that to the system, just makes
| everything slower than putting everything on a system good
| enough to protect the most sensitive and high-value data you
| have.
|
| didgetmaster wrote:
| Nothing is free or even "mostly free" when managing data.
| Data security (encryption), redundancy (backups), and
| integrity (checksums, etc.) all impose a cost on the system.
|
| Getting each piece of data properly classified will always be
| a challenge (AI or other tools may help with that), but it
| would still be nice to be able to do it. If I have a 50GB
| video file that I could easily re-download off the Internet,
| it would be nice to be able to turn off any security,
| redundancy, or integrity features for it.
|
| I wonder how many petabytes of storage space are being wasted
| by having multiple backups of all the operating system files
| that could be easily downloaded from multiple websites. Do I
| really need to encrypt that GB file that 10 million people
| also have a copy of? Am I worried if a single pixel in that
| high-resolution photo has changed due to bit rot?
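|
| (For what it's worth, a minimal sketch of the per-file opt-out
| that exists today on Btrfs -- note that chattr +C only takes
| effect while the file is still empty, and that disabling COW
| also disables data checksums for that file:)
|
|     touch big-video.mkv          # must set +C while the file is empty
|     chattr +C big-video.mkv      # no COW, and hence no data checksums
|     lsattr big-video.mkv         # shows the 'C' flag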
|
| Arnavion wrote:
| > Do I really need to encrypt that GB file that 10 million
| people also have a copy of?
|
| Indeed you don't. Poettering has a similar idea in [1]
| (scroll down to "Summary of Resources and their
| Protections" for the tl;dr table), where he imagines OS
| files are only protected by dm-verity (for Silverblue-style
| immutable distros) / dm-integrity (for regular mutable
| distros).
|
| [1]: https://0pointer.net/blog/authenticated-boot-and-disk-encryp...
|
| derefr wrote:
| > For many users, that cat video would be heartbreaking to lose,
| and they don't have term papers to worry about.
|
| Depends on where that cat video is / how it ended up on the
| disk.
|
| The user explicitly saved it to their user-profile Downloads
| directory? Yeah, sure, the user might care a lot about
| preserving that data. There's intent there.
|
| The user's web browser _implicitly_ saved it into the browser's
| cache directory? No, the user absolutely doesn't care. That
| directory is a pure transparent optimization over just loading
| the resource from the URL again; and the browser makes no
| guarantees of anything in it surviving for even a few minutes.
| The user doesn't even _know_ they have the data; only the
| browser does. As such, the browser should be able to tell the
| filesystem that this data is discardable cache data, and the
| filesystem should be able to apply different storage policies
| based on that.
|
| This is already true of managed cache/spool/tmp directories
| vis-a-vis higher-level components of the OS. macOS, for
| example, knows that stuff that's under ~/Library/Caches can be
| purged when disk space is tight, so it counts it as
| "reclaimable space"; and in some cases (caches that use
| CoreData) the OS can even garbage-collect them itself.
|
| So, why not also avoid making these files a part of backups?
| Why not avoid checksumming them? Etc.
|
| lazide wrote:
| Backups - possibly, but no one I know counts COW/snapshots,
| etc. as backups. Backup software generally already avoids
| copying those.
|
| They can be ways to restore to a point in time
| deterministically - but then they are absolutely needed to
| do so! Otherwise, the software is going to be acting
| differently with a bunch of data gone from underneath it,
| no?
|
| Checksumming is more about being able to detect errors
| (and deterministically know if data corruption is
| occurring). So yes, absolutely, temporary and cache files
| should be checksummed. If that data is corrupted, it will
| cause crashes of the software using them, and downstream
| corruption after all.
|
| Why would I _not_ want that to get caught before my
| software crashes, or before my output document (for
| instance) is silently corrupted because one of the
| temporary files used when editing it got corrupted to/from
| disk?
|
| derefr wrote:
| > So yes, absolutely, temporary and cache files should be
| checksummed. If that data is corrupted, it will cause
| crashes of the software using them, and downstream
| corruption after all.
|
| ...no? I don't care if a video in my browser's cache ends
| up with a few corrupt blocks when I play it again a year
| later. Video codecs are designed to be tolerant of that.
| You'll get a glitchy section in a few frames, and then
| hit the next keyframe and everything will clean up.
|
| In fact, _most_ encodings -- of images, audio, even text
| -- are designed to be self-synchronizing in the face of
| corruption.
|
| I think you're thinking specifically of _working-state_
| files, which usually _need_ to be perfect and
| guaranteed-trusted, because they're in normalized
| low-redundancy forms and are also used to derive other data
| from.
|
| But when I say "caching", I'm talking about cached
| _final-form assets_ intended for direct human consumption.
| These get corrupted all the time, from network errors
| during download, disk storage errors on NASes, etc; and
| people mostly just don't care. For video, they just watch
| past it. For a web page, they hard-refresh it and
| everything's fine the second time around.
|
| If you think it's impossible to differentiate these two
| cases: well, that's because we don't explicitly ask
| developers to differentiate them. There could be separate
| ~/Library/ViewCache and ~/Library/StateCache directories.
|
| And before you ask, a good example of a large "ViewCache"
| asset that's _not_ browser-related: a video-editor
| render-preview video file (the low-quality /
| thumbnail-sized kind, used for scrubbing.)
|
| lazide wrote:
| If they are corrupted _on disk_ the behavior is not so
| deterministic as a 'broken image' and a reload. Corrupted
| _on disk_ content causes software crashes, hangs, and
| other broken behavior users definitely don't like.
| Especially when it's the filesystem metadata which gets
| corrupted.
|
| Because _merely trying to read it_ can cause severe
| issues at the filesystem level.
|
| I take it you haven't dealt with failing storage much
| before?
|
| derefr wrote:
| I maintain database and object-storage clusters for a
| living. Dealing with failing storage is half my job.
|
| > Especially when it's the filesystem metadata which gets
| corrupted.
|
| We're not talking about filesystem metadata, though.
| Filesystem metadata is all "of a piece" -- if you have a
| checksumming filesystem, then you can't _not_ checksum
| some of the filesystem metadata, because all the metadata
| lives in (the moral equivalent of) a single database file
| the filesystem maintains, and _that database_ gets
| checksummed. It's all one data structure, where the
| checksumming is a thing you do _to_ that data structure,
| not to individual nodes within it. (For a tree filesystem
| like btrfs, this would be the non-cryptographic
| equivalent of a merkle-tree hash.) The only way you could
| even potentially turn off filesystem features for some
| metadata (dirent, freelist, etc) nodes but not others,
| would be to split your filesystem into multiple
| filesystems.
|
| No, to be clear, we're specifically talking about what
| happens inside the filesystem's _extents_. _Those_ can
| experience corruption without that causing any undue
| issues, besides "the data you get from fread(3) is
| wrong." Unlike filesystem metadata, which is _all_
| required for the filesystem's _integrity_, a
| checksumming filesystem can _choose_ whether to "look"
| inside file extents, or to treat them as opaque. And it
| can (in theory) make that choice per file, if it likes.
| From the FS's perspective, an extent is just a range of
| reserved disk blocks.
|
| Now, an assumption: only storage _arrays_ use spinning
| rust for anything any more. The only disk problems
| _consumer devices_ face any more are SSD degradation
| problems, not HDD degradation problems.
|
| (Even if you don't agree with this assumption by itself,
| it's much more clear-cut if you consider only devices
| operated by people willing to choose to use a filesystem
| that's not the default one for their OS.)
|
| This assumption neatly cleaves the problem-space in two:
|
| - How should a filesystem _on a RAID array, set up for a
| business or prosumer use-case,_ deal with HDD faults?
|
| - How should a _single-device_ filesystem _used in a
| consumer use-case_ deal with SSD faults?
|
| The HDD-faults case comes down to: filesystem-level
| storage pool management with filesystem-driven redundant
| reads, with kernel blocking-read timeouts to avoid hangs,
| with async bad-sector remapping for timed-out reads.
| Y'know: ZFS.
|
| While the SSD-faults case comes down to: read the bad
| data. Deal with the bad data. You won't get any hangs,
| until the day the whole thing just stops working. The
| worst you'll get is bit-rot. And even then, it's rare,
| because NAND controllers use internal space for
| error-correction, entirely invisibly to the kernel. (See
| also: http://dtrace.org/blogs/ahl/2016/06/19/apfs-part5/)
|
| In fact, in my own personal experience, the most likely
| cause of incorrect or corrupt data ending up on an
| SSD/NVMe disk is that the _CPU or memory_ of the system
| is bad, and so one or the other is corrupting the memory
| that will be written to disk _before_ or _during_ the
| write. (I've personally had this happen at least twice.
| What to look for to diagnose this: PCIe "link training"
| errors.)
|
| rodgerd wrote:
| > Does NILFS do checksums and snapshotting for every single
| file in the system?
|
| NILFS is, by default, a filesystem that only ever appends until
| you garbage collect the tail. It doesn't really "snapshot" in
| the way that ZFS or btrfs do, because you can just walk the
| entire history of the filesystem until you run out of history.
| The snapshots are just bookmarks of a consistent state.
|
| heavyset_go wrote:
| You can turn off CoW, checksumming, compression, etc. at the
| file and directory levels using btrfs.
|
| Arnavion wrote:
| Indeed. You can also make a directory into a subvolume so
| that that directory is not included in snapshots of the
| parent volume.
|
| spookthesunset wrote:
| It might sound weird, but the hard part of what you describe is
| not the technology but how to design the UX in a way that you
| aren't babysitting everything.
|
| And doing that is not at all easy. For all anybody knows, your
| cat video is "worth more" to you than your thesis paper. How
| can you get the system to determine the worth of each file
| without manually setting an attribute each time you create a
| file? And if you let the system guess, the cost of failure
| could be very high! What if it decided your thesis paper was
| worthless and stored it with a lower "integrity" (or whatever
| you call the metric)?
|
| I dunno. Storage is getting cheaper all the time and it might
| just be easier to fuck it and treat all files with the same
| high level of integrity. Maybe it would be so much work for a
| user to manually manage they'd just mark everything the same?
|
| didgetmaster wrote:
| You could always set the default behavior to be uniform for
| all files (e.g. protect everything or protect nothing) and
| just forget about it. But it would be nice to be able to
| manually set the protection level for specific files that are
| the exception.
|
| If I was copying an important file into an unprotected
| environment, I could change how it was handled (likewise if I
| was downloading some huge video I didn't care about into a
| system where the default protection was set to high).
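|
| (ZFS gets part of the way there with per-dataset properties; a
| minimal sketch, assuming a pool named "tank" -- note that
| turning checksums off is exactly the kind of knob the docs
| generally discourage flipping:)
|
|     # scratch area: no checksums, single copy
|     zfs create -o checksum=off -o copies=1 tank/scratch
|     # important documents: extra copies, checksummed, compressed
|     zfs create -o copies=2 -o compression=lz4 tank/docs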
|
| I agree that if you have 100 million files, then it could be
| nearly impossible to classify every single one of them
| correctly.
|
| spookthesunset wrote:
| I'd think a per-directory basis would be ideal.
|
| nintendo1889 wrote:
| A directory basis, or even better, a numerical priority
| that could be set manually in the application that
| generated them, or automatically, based on the user or
| application, or in a hypervisor, based on the VM. Then it
| could be an opportunistic setting.
|
| I thought ZFS had some sort of unique settings like this.
|
| koolba wrote:
| How does this compare to ZFS + cron to create snapshots every X
| minutes?
|
| harvie wrote:
| A week ago my client lost data on ZFS by accidentally deleting
| a folder. Unfortunately the data was created and deleted in the
| interval between two snapshots. One would expect that it still
| might be possible to recover, because ZFS is CoW.
|
| There are some solutions like photorec (which now has ZFS
| support), but it expects you can identify the file by a
| footprint of its contents, which was not the case. Also, many
| of these solutions would require ZFS to go offline for forensic
| analysis, and that was not possible either, because lots of
| other clients were using the same pool at the time.
|
| So this failed me, and I really wished at the time that ZFS had
| continuous snapshots.
|
| BTW, on ZFS I use ZnapZend. It's the second best thing after
| continuous snapshots:
|
| https://www.znapzend.org/ https://github.com/oetiker/znapzend/
|
| There are also some ZFS snapshotting daemons in Debian, but
| this is much more elegant and flexible.
|
| But since znapzend is a userspace daemon (as are all ZFS
| snapshotters), you need some kind of monitoring and warning
| mechanism for when something goes wrong and it can no longer
| create snapshots (it crashes, gets killed by the OOM killer, or
| something...). In NILFS2 every write/delete is a snapshot, so
| you are basically guaranteed by the kernel to have everything
| snapshotted without having to watch it.
|
| yonrg wrote:
| I run this setup: zfs + zfsnap (not cron anymore, now a
| systemd.timer).
|
| I cannot tell if NILFS does this too; with zfsnap I maintain
| different retention times: 5-minutely for 1 hour, hourly for 1
| day, daily for a week. That's fewer than 60 snapshots. The
| older ones are cleaned up.
|
| In addition, zfs brings compression and encryption. That's why
| I have it on the laptops, too.
|
| goodpoint wrote:
| There is no comparison. NILFS provides *continuous* snapshots,
| so you can inspect and roll back changes as needed.
|
| It does so without a performance penalty compared to other
| logging filesystems.
|
| And without using additional space forever. The backlog rotates
| forward continuously.
|
| It's a really unique feature that makes a lot of sense for
| desktop use, where you might want to recover files that were
| created and deleted after a short time.
|
| harvie wrote:
| Perhaps we can leverage the "inotify" API to make ZFS snapshot
| every time some file is changed... But I think ZFS is not
| really good at handling huge amounts of snapshots. The NILFS2
| snapshots are probably more lightweight compared to the ZFS
| ones.
|
| goodpoint wrote:
| The NILFS snapshots are practically free (for a logging
| filesystem, obviously).
|
| mustache_kimono wrote:
| > Perhaps we can leverage the "inotify" API to make ZFS
| snapshot every time some file is changed...
|
| ZFS and btrfs users are already living in the future:
|
|     inotifywait -r -m --format %w%f -e close_write "/srv/downloads/" |
|     while read -r line; do
|         # command below will snapshot the dataset
|         # upon which the closed file is located
|         sudo httm --snap "$line"
|     done
|
| See: https://kimono-koans.github.io/inotifywait/
|
| [deleted]
|
| fuckstick wrote:
| > It does so without a performance penalty.
|
| What is the basis for comparison? Sounds like a pretty
| meaningless statement on its face.
|
| goodpoint wrote:
| Compared to other logging filesystems, obviously.
|
| fuckstick wrote:
| The NILFS baseline (write throughput especially) is slow as
| shit compared to other filesystems, including f2fs. So just
| because you have this feature that doesn't make it even
| slower isn't that interesting - you pay for it one way or
| the other.
|
| usr1106 wrote:
| For many users the filesystem speed of your home directory
| is completely irrelevant unless you run on a Raspberry Pi
| using SD cards. You just don't notice it.
|
| Of course, if you have a server handling, let's say, video
| files, things will be very different. And there are some
| users who process huge amounts of data.
|
| I have run 2 LVM snapshots (daily and weekly) on my home
| partition for years. Write performance is abysmal if you
| measure it, but you don't notice it in daily development
| work.
|
| [deleted]
|
| [deleted]
|
| 1MachineElf wrote:
| > It's a really unique feature that makes a lot of sense for
| desktop use
|
| Sounds like it could serve as a basis for a Linux
| implementation of something like Apple Time Machine.
|
| [deleted]
|
| mustache_kimono wrote:
| With 'httm', a few of us are already living in that bright
| future: https://github.com/kimono-koans/httm
|
| masklinn wrote:
| Afaik Time Machine does not do continuous snapshots, just
| periodic (and triggered).
|
| So you can already do that with zfs: take a snapshot and
| send it to the backup drive.
|
| harvie wrote:
| "It does so without a performance penalty"
|
| Yeah, it's already so terribly slow that it's unlikely that
| taking snapshots can make it any slower :-D
|
| Volundr wrote:
| That was not my experience with NILFS. It outperformed ext4
| on my laptop NVMe.
|
| akvadrako wrote:
| The benchmarks here look pretty bad:
|
| https://www.phoronix.com/review/linux-58-filesystems/4
|
| Volundr wrote:
| The last page looks pretty bad. If you look at the others
| it's more of a mixed bag, but yeah.
|
| I don't remember what benchmark I ran before deciding to
| run it on my laptop. Given my work at the time, probably
| pgbench, but I couldn't say for sure. It was long enough
| ago that I also might've been benchmarking against ext3,
| not 4.
|
| harvie wrote:
| I think I was running it on a 6TB conventional HDD RAID1.
| Also note that the read and write speeds might be quite
| asymmetrical... In general it also depends on the workload
| type.
|
| pkulak wrote:
| > There is no comparison.
|
| What if I compare it to BTRFS + Snapper? No performance
| penalty there, plus checksumming.
|
| AshamedCaptain wrote:
| btrfs and snapperd do have a performance penalty as the
| number of snapshots increases. Having 100+ usually means
| snapper list will take north of an hour. You can easily
| reach these numbers if you are taking a snapshot every
| handful of minutes.
|
| Even background snapper cleanups will start to take a toll,
| since even if they are done with ionice they tend to block
| simultaneous accesses to the filesystem while they are in
| progress.
| If you have your root on the same filesystem, it's not
| pretty -- lots of periodic system-wide freezes with the HDD
| LEDs blinking non-stop. I tend to always limit snapshots to
| < 20 for that reason (and so does the default snapperd
| config).
|
| mike256 wrote:
| About 2 years ago I believed the same. Then I used BTRFS as
| a store for VM images (with periodic snapshots) and
| performance became really, really bad. After I deleted all
| snapshots, performance was good again. There is a big
| performance penalty in btrfs with more than about 100
| snapshots.
|
| Volundr wrote:
| NILFS is really, really cool. In concept. Unfortunately the
| tooling and support just aren't there. I ran it for quite some
| time on my laptop, and the continuous snapshotting is everything
| I hoped it'd be. At one point, however, there was a change to the
| kernel that rendered it unbootable. Despite being a known and
| recorded bug, it took forever to get fixed (about a year if I
| recall correctly), leaving me stuck on an old kernel the whole
| time.
|
| This was made more frustrating by the lack of any tooling such as
| fsck to help me diagnose the issue. The only reason I figured out
| it was a bug was that I booted a live CD to try to rescue the
| system and it booted fine.
|
| When I finally replaced that laptop I went back to ZFS and
| scripted snapshots. As much as I want to, I just can't recommend
| NILFS for daily use.
|
| yonrg wrote:
| Do you happen to remember which kernel change was the cause?
|
| I had trouble with unpopular file systems as the root file
| system when the initrd was not built properly. So sysresccd is
| always good to have within reach. That said, I think I won't
| have any file system on root other than the distro's default.
| Data which require special care are on other partitions.
|
| CGamesPlay wrote:
| How did Linus not go on a rampage after breaking userspace for
| an entire year? Is NILFS not part of the kernel mainline, I
| guess?
|
| jraph wrote:
| If I understand correctly, I don't think this is a
| userspace-breaking bug, as in: a kernel API changed and made a
| userspace program not work anymore.
|
| It is a bug that prevents the kernel from booting. That's
| bad, but that's not the same thing. That's not a userspace
| compatibility issue such as the ones Linus chases. Userspace
| isn't even involved if the kernel cannot boot. Or, if it is
| actually a userspace program that causes a kernel crash, it
| is a crash, which is not really the same thing as an API
| change (one could argue, but that's a bit far-fetched, the
| intents are not the same, etc. - I don't see Linus explode at
| somebody who introduced a crash the way he would explode at
| someone changing a userspace API).
|
| yjftsjthsd-h wrote:
| > Is NILFS not part of the kernel mainline, I guess?
|
| Good guess, but no:
|
| https://github.com/torvalds/linux/tree/master/fs/nilfs2
|
| > How did Linus not go on a rampage after breaking userspace
| for an entire year?
|
| I would very much like to know that as well. Any chance it
| didn't get reported (at least, not as "this broke booting")?
|
| Volundr wrote:
| I reported it along with a few other users in
| https://marc.info/?l=linux-nilfs&m=157540765215806&w=2. I
| think it just isn't widely enough used that Linus noticed
| we were broken. If I recall correctly, it also wasn't
| directly fixed so much as incidentally. I just kept
| checking new kernel versions as they were released until
| one worked.
| There was never anything in the changelog (that I recall)
| about fixing the bug, just another change that happened to
| fix the issue.
|
| Edit: Looking through the archives, it looks like my memory
| was somewhat uncharitable. It was reported in November and
| directly patched in June
| (https://marc.info/?l=linux-nilfs&m=159154670627428&w=2), so
| about 7 months after reporting. Not sure what kernel release
| that would've landed in, so it could've been closer to 8.
|
| bityard wrote:
| > How did Linus not go on a rampage after breaking userspace
| for an entire year?
|
| Linus' commandment about not breaking userspace is frequently
| misunderstood. He wants to ensure that user-space /programs/
| do not break (even if they rely on buggy behavior that made
| it into a release), not that the /user/ will never see any
| breakage of the system whatsoever, which is of course an
| impossible goal. Device drivers and filesystems are firmly
| system-level stuff; bugs and backwards-incompatible changes
| in those areas are regrettable but happen all the same.
|
| cmurf wrote:
| Very nice introduction to NILFS, which has been in the Linux
| kernel since 2009.
|
| newcup wrote:
| I think NILFS is a hidden gem. I've been using it exclusively on
| my Linux laptops, desktops, etc. since ca. 2014. Apart from one
| kernel regression bug related to NILFS2, it's worked flawlessly
| (no data corruption even with the bug, just no access to the
| file system; effectively it forced running an older kernel while
| the bug was being fixed).
|
| The continuous snapshotting has saved me a couple of times; I've
| just mounted a version of the file system from a few hours or
| weeks ago to access overwritten or deleted data. I use NILFS
| also on backup disks to provide combined deduplication and
| snapshots easily (just rsync & NILFS' mkss, the latter to make
| sure the "checkpoints" aren't garbage collected unnoticed in
| case the backup disk gets full).
|
| nix23 wrote:
| > I think NILFS is a hidden gem. I've been using it exclusively
| on my Linux laptops, desktops, etc. since ca. 2014
|
| Yes, it's really sad: there we have a native and stable
| checksumming fs, and nearly no one knows about it.
|
| yjftsjthsd-h wrote:
| > checksumming fs
|
| Is it? Last I'd heard was
|
| > nilfs2 store checksums for all data. However, at least the
| current implementation does not verify it when reading.
|
| https://www.spinics.net/lists/linux-nilfs/msg01063.html
|
| nix23 wrote:
| Hmm, you could be right; I found nothing about it being
| verified at read time, just with fsck.
|
| conradev wrote:
| BTRFS is also a native copy-on-write filesystem that verifies
| a configurable checksum and supports snapshots.
|
| The snapshots are not automatic, but short of that it is
| pretty feature-complete.
|
| nix23 wrote:
| That's why I specifically wrote -> stable...
|
| 77pt77 wrote:
| BTRFS is not stable?
|
| guipsp wrote:
| BTRFS is pretty stable nowadays.
|
| nerpderp82 wrote:
| What does that mean quantifiably?
|
| guipsp wrote:
| Synology deploys it in their products.
|
| ComputerGuru wrote:
| > Apart from one kernel regression bug related to NILFS2, it's
| worked flawlessly
|
| Maybe on x86? I've tried repeatedly to use it on ARM for the
| Raspberry Pi, where it would have been perfect, but I always
| ran into various kernel panics as soon as the file system was
| mounted or accessed.
|
| heavyset_go wrote:
| I've used NILFS2 on flash storage on some old non-RPi ARMv7
| hardware for a while without a problem. Switched to F2FS for
| performance reasons, though.
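|
| (For anyone who wants to try the time-travel workflow described
| upthread, a minimal sketch using the nilfs-utils tools -- the
| device name and checkpoint number here are made-up examples:)
|
|     lscp /dev/sdb1               # list checkpoints and their numbers
|     chcp ss /dev/sdb1 1234       # promote checkpoint 1234 to a snapshot
|     mkdir -p /mnt/old
|     mount -t nilfs2 -r -o cp=1234 /dev/sdb1 /mnt/old
|     # ...copy out whatever was overwritten or deleted...
|     umount /mnt/old
|     chcp cp /dev/sdb1 1234       # demote it back to a checkpoint
|
| (Only snapshots can be mounted, hence the chcp ss step first.)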
|
| newcup wrote:
| True, I have only used it on x86 devices. Thanks for the
| heads up!
|
| I've heard so many stories of SD card failures (against which
| snapshotting might be of no help) with the Raspberry Pi that
| I've decided to send any valuable data promptly to safety
| over a network. (Though I personally haven't had any problems
| with failing SDs.)
|
| rodgerd wrote:
| NILFS is absolutely wonderful; it was very unfortunate that
| Linus chose to dub btrfs the ext4 successor all those years
| ago, because it cut off a lot of interest in the plethora of
| interesting work that was going on at the time.
|
| A decade later, btrfs is still riddled with problems and
| incomplete, people are still using xfs and ext4 for lack of
| trust, one kernel dev has a side hobby trying to block openzfs,
| and excellent little projects like nilfs are largely unknown.
|
| perrygeo wrote:
| > one kernel dev has a side hobby trying to block openzfs
|
| Can you elaborate?
|
| nintendo1889 wrote:
| I remember DEC/HP releasing the source to the Digital UNIX AdvFS
| filesystem on SourceForge with the intent of porting it over to
| Linux, but it never materialized. AdvFS had many advanced
| features. The source is still available, and within it are some
| PDF slides that explain a lot of its features.
|
| Nifty3929 wrote:
| Do any file systems have good, native support for tagging and
| complex searches based on those tags?
|
| DannyBee wrote:
| BeFS was the last real one I'm aware of at the complexity you
| are talking about (plenty of FSen have some very basic indexed
| support for, say, file sizes, but not the kind of generic
| tagging you are talking about).
|
| At this point, the view seems to be "attributes happen in the
| file system, indexing happens in user space".
|
| Especially on Linux.
|
| Part of the reason is, as I understand it, the
| surface/complexity of including query languages in the kernel,
| which is not horribly unreasonable.
|
| So all the common FSen have reasonable xattr support, and
| inotify etc. that support notification of attribute changes.
|
| The expectation seems to be that the fact that inotify might
| drop events now and then is not a dealbreaker. The modern queue
| length is usually 16384 anyway.
|
| I'm not saying there aren't tradeoffs here, but this seems to be
| the direction taken overall.
|
| I actually would love to have an FS with native indexed xattrs
| and a way to get at them.
|
| I just don't think we'll get back there again anytime soon.
|
| Nifty3929 wrote:
| Okay - how about tagging and non-complex searches then?
| Beggars can't be choosers :-)
|
| Really what I'd like is just to search for some specific
| tags, or maybe list a directory excluding some tag, or
| similar. For bonus points, maybe a virtual directory that
| represents a search like this, and which "contains" the
| results of that search. (A "Search Folder")
|
| I'll check out BeFS. Thanks!
|
| harvie wrote:
| I had issues with file locking when running some legacy database
| software on NILFS2. It probably caused data corruption in that
| database (not in the FS itself).
|
| The SF website of NILFS2 suggests that there are some
| unimplemented features, one of them being synchronous IO, which
| might have caused that issue?
|
| https://nilfs.sourceforge.io/en/current_status.html
|
| In some cases, NILFS2 is safer storage for your data than ZFS.
| So NILFS might work for some simple use cases (e.g. locally
| storing documents that you modify often), but it's certainly not
| ready to be deployed as a generic filesystem.
| It's relatively slow and sometimes behaves a bit weirdly. If
| something goes really bad, the recovery might be a bit painful.
| There is no fsck yet, nor community support. NILFS2 can
| self-heal to some extent.
|
| I really like the idea of NILFS2, but at this point I would
| prefer a patch adding continuous snapshotting to ZFS. Unlike
| NILFS2, ZFS has lots of active developers and a big community,
| while NILFS2 is almost dead. The fact that it's been in the
| kernel for quite some time and most people haven't even noticed
| it (despite its very interesting features) speaks for itself.
|
| Don't get me wrong. I wish more developers would get interested
| in NILFS2, fix these issues, and make it on par with EXT4, XFS
| and ZFS... But still, ZFS has more features overall, so we might
| just add continuous snapshots in memoriam of NILFS2.
|
| yjftsjthsd-h wrote:
| > In some cases, NILFS2 is safer storage for your data than
| ZFS.
|
| What cases? Do you just mean due to continuous snapshots
| protecting against accidental deletes or such, or are there
| more "under the covers" things it fixes?
|
| ComputerGuru wrote:
| It's basically append-only for recent things, so theoretically
| you can't lose anything (within a reasonable timeframe). I
| don't know if the porcelain exposes everything you need to
| avail yourself of that design functionality, though.
|
| compsciphd wrote:
| we used NILFS 15 years ago in dejaview -
| https://www.cs.columbia.edu/~nieh/pubs/sosp2007_dejaview.pdf
|
| We combined nilfs + our process snapshotting tech (we tried to
| mainline it and it didn't go in, but many of the concepts ended
| up in CRIU) + our remote display + screen reading tech (i.e.
| normal APIs) to create an environment that could record
| everything you ever saw visually and textually, enable you to
| search it, and enable you to recreate the state as it was at
| that time with no noticeable interruption to the user (process
| downtime was like 0.02s).
|
| heavyset_go wrote:
| This is cool, thanks for sharing it.
___________________________________________________________________
(page generated 2022-10-11 23:00 UTC)