[HN Gopher] Linux NILFS file system: automatic continuous snapshots
       ___________________________________________________________________
        
       Linux NILFS file system: automatic continuous snapshots
        
       Author : solene
       Score  : 191 points
       Date   : 2022-10-11 11:58 UTC (11 hours ago)
        
 (HTM) web link (dataswamp.org)
 (TXT) w3m dump (dataswamp.org)
        
       | wazoox wrote:
       | I've been running NILFS2 on my main work NAS for 8 years. It
       | never failed us :)
        
         | mdaniel wrote:
         | I mean this honestly: how did you evaluate such a new
         | filesystem in order to bet a work NAS upon it?
        
           | wazoox wrote:
            | I did some testing and installed it on a secondary system
            | that in the beginning mostly hosted unimportant files. Then
            | we added more things, and as it posed absolutely no problems
            | after a few years, we went further (and added a backup
            | procedure). Then we migrated to new hardware, and it's still
            | going strong (it's quite small, about a 15 TB volume).
        
           | yonrg wrote:
           | I would do it by using it! ... and probably some backup
        
       | remram wrote:
       | How is this pronounced? Nil-F-S? Nilfuss? Nai-L-F-S? N-I-L-F-S?
        
         | heavyset_go wrote:
         | The first one.
        
       | Rygian wrote:
       | How close is this to a large continuous tape loop for video
       | surveillance?
       | 
       | I would very much welcome a filesystem that breaks away from the
       | directories/files paradigm. Any time-based data store would
       | greatly benefit from that.
        
         | rcthompson wrote:
         | I think all you would need to add is a daemon that
         | automatically deletes the oldest file(s) whenever free space
         | drops below a certain threshold, so that the filesystem GC can
         | reclaim that space for new files.
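          | 
          | A minimal sketch of such a daemon (paths and thresholds are
          | made up; on NILFS2 the freed segments are actually reclaimed
          | by nilfs_cleanerd once the old checkpoints age out):
          | 
          |     #!/bin/sh
          |     # delete the oldest file whenever free space gets too low
          |     MOUNT=/srv/recordings             # hypothetical data dir
          |     MIN_FREE_KB=$((50 * 1024 * 1024)) # keep ~50 GB free
          |     while true; do
          |         free_kb=$(df --output=avail -k "$MOUNT" | tail -n 1)
          |         if [ "$free_kb" -lt "$MIN_FREE_KB" ]; then
          |             oldest=$(find "$MOUNT" -type f -printf '%T@ %p\n' \
          |                 | sort -n | head -n 1 | cut -d' ' -f2-)
          |             [ -n "$oldest" ] && rm -f -- "$oldest"
          |         else
          |             sleep 60
          |         fi
          |     done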
        
           | tommiegannert wrote:
           | If NILFS is continuously checkpointing, couldn't you even
           | remove the file right after you add it, for simplicity?
        
           | Rygian wrote:
           | I know and use 'logrotate'.
           | 
            | My point was more along the lines of a filesystem where a
           | single file can be overwritten over and over again, and it's
           | up to the filesystem to transparently ensure the full
           | capacity of the disk is put towards retaining old versions of
           | the file.
        
             | nix23 wrote:
             | Hmm maybe something like Bluestore?
             | 
             | https://docs.ceph.com/en/latest/rados/configuration/storage
             | -...
        
               | Rygian wrote:
               | I definitely need to dive into Ceph, thanks for the
               | pointer :-)
        
       | darau1 wrote:
       | What's the difference between a snapshot, and a checkpoint?
        
         | okasaki wrote:
         | from TA:
         | 
         | > A checkpoint is a snapshot of your system at a given point in
         | time, but it can be deleted automatically if some disk space
         | must be reclaimed. A checkpoint can be transformed into a
         | snapshot that will never be removed.
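          | 
          | With the nilfs-utils tools that looks roughly like this (the
          | device and checkpoint number are just examples):
          | 
          |     lscp /dev/sdb1             # list checkpoints/snapshots
          |     chcp ss /dev/sdb1 123      # pin checkpoint 123 as a snapshot
          |     chcp cp /dev/sdb1 123      # demote it back to a checkpoint
          |     mount -t nilfs2 -r -o cp=123 /dev/sdb1 /mnt/snap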
        
       | sargun wrote:
       | I've always wondered why NILFS (or similar) isn't used for cases
       | where ransomware is a risk. I'm honestly surprised that it's not
        | mandated to use an append-only / log-structured filesystem for
        | some critical systems (think patient records), where the cost
        | of losing data is so high, the data is rarely mutated, and
        | trading that off against wasted storage isn't that bad (after
        | all, HDD storage is
       | incredibly cheap, and nobody said you had to keep the working set
       | and the log on the same device).
        
         | compsciphd wrote:
         | you don't need a log structured fs to do this, you could just
         | have regular zfs/btrfs snapshots too.
         | 
         | BUT
         | 
          | if an attacker has the ability to delete or encrypt an entire
          | file system, they really have the ability to delete the
          | snapshots as well; the only reason they might not is due to
          | "security through obscurity".
         | 
          | now, what I have argued is that an append-only file system
          | which works in a SAN-like environment (i.e. you have random
          | reads, but only append writes, with those properties enforced
          | remotely) could give you that, but to an extent you'd still
          | get similar behavior by just exporting ZFS shares (or even as
          | block devices) and snapshotting them regularly on the remote
          | end.
        
           | ephbit wrote:
           | > if an attack has the ability to delete an entire file
           | system / encrypt it, they really have the ability to delete
           | the snapshots as well, ..
           | 
           | How so?
           | 
           | Let's say you have one machine holding the actual data for
           | working on it. And some backup server. You could use btrfs
           | send over ssh and regularly btrfs receive the data on the
            | backup machine. Even if they got encrypted by ransomware they
           | wouldn't be lost in the backups. As long as they're not
           | deleted there how could a compromised work machine compromise
           | the data on the backup machine?
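            | 
            | A rough sketch of that setup (subvolume and host names are
            | made up; the key point is that the backup side keeps its own
            | read-only copies):
            | 
            |     # on the work machine, /home is a btrfs subvolume
            |     btrfs subvolume snapshot -r /home /home/.snaps/home-new
            |     btrfs send -p /home/.snaps/home-prev /home/.snaps/home-new \
            |         | ssh backup 'btrfs receive /backup/home'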
        
       | ggm wrote:
       | Didn't VMS have this baked in? My memory is that all 8.3 file
       | names had 8.3[;nnn] version tagging under the hood
        
         | usr1106 wrote:
         | That's what it looked like, but I doubt it was deep in the
         | filesystem. It was basically just a naming convention. User had
         | to purge old versions manually. This gets tedious if you have
         | many files that change often. Snapshots are a safety net, not
         | something you want to have in your way all day long.
        
           | ggm wrote:
           | Er.. my memory is that it did COW inside VMS fs semantics and
            | was not manually achieved. You did have to manually delete. So
           | I don't think it was just a hack.
           | 
           | It didn't do directories so was certainly not as good as
           | snapshot but we're talking 40 years ago!
        
       | jerf wrote:
       | What happens if you run "dd if=/dev/zero of=/any/file/here", thus
       | simply loading the disk with all the zeros it can handle? Do you
       | lose all your snapshots as they are deleted to make room, or does
       | it keep some space aside for this situation?
       | 
       | (Not a "gotcha" question, a legitimate question.)
        
         | regularfry wrote:
         | I know this isn't what you're getting at, but is it smart
         | enough to create a sparse file when you specifically pick zero
         | as your filler byte?
        
         | solene wrote:
         | the garbage collector daemon will delete older checkpoints
         | beyond the preserve time to make some room.
        
         | Volundr wrote:
          | It's configurable:
          | https://nilfs.sourceforge.io/en/man5/nilfs_cleanerd.conf.5.h...
          | Cleanerd is responsible for maintaining a certain amount of
          | free space on the system, and you can control the rules for
          | doing so (e.g. a checkpoint won't be eligible for being
          | cleaned until it is 1 week old).
         | 
         | It's also worth knowing NILFS2 has checkpoints and snapshots.
         | What you actually get are continuous "checkpoints". These can
         | be upgraded to snapshots at any time with a simple command.
         | Checkpoints are garbage collected, snapshots are not (until
         | they are downgraded back into checkpoints).
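          | 
          | For example, the "1 week" rule above would look something like
          | this in /etc/nilfs_cleanerd.conf (parameter names per the man
          | page linked above, values purely illustrative):
          | 
          |     protection_period   604800  # don't clean checkpoints < 7 days old
          |     min_clean_segments  10%     # start GC below this much free space
          |     max_clean_segments  20%     # stop GC once this much is free
          |     cleaning_interval   5       # seconds between GC passes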
        
       | throwaway787544 wrote:
        
       | didgetmaster wrote:
       | Does NILFS do checksums and snapshotting for every single file in
       | the system? One of my biggest complaints about file systems in
       | general is that they are all designed to treat every file the
       | exact same way.
       | 
       | We now have storage systems (even SSDs) that are big enough to
       | hold hundreds of millions of files. Those files can be a mix of
       | small files, big files, temp files, personal files, and public
       | files. Yet every file system must treat your precious thesis
       | paper the same way it treats a huge cat video you downloaded off
       | the Internet.
       | 
       | We need some kind of 'object store' where each object can be
       | given a set of attributes that govern how the file system treats
       | it. Backup, encryption, COW, checksums, and other operations
       | should not be wasted on a bunch of data that no one really cares
       | about.
       | 
       | I have been working on a kind of object file system that
       | addresses this problem.
        
         | nix23 wrote:
          | Well, you can kind of do that with zfs filesystems, and the
          | "object" is the recordsize.
        
           | mustache_kimono wrote:
           | I was going to ask: "Is there any limit on the number of ZFS
           | filesystems in a pool?" Google says 2^64 is the limit.
           | 
            | Couldn't one just generate a filesystem per object if
           | snapshots, etc., on a per object level is what one cared
           | about? Wonder how quickly this would fall over?
           | 
           | > Backup, encryption, COW, checksums, and other operations
           | should not be wasted on a bunch of data that no one really
           | cares about.
           | 
           | This GP comment is a little goofy though. There was a user I
           | once encountered who wanted ZFS, but a la carte. "I want the
           | snapshots but I don't need COW." You have to explain, "You
           | don't get the snapshots unless you have the COW", etc.
        
             | Conan_Kudo wrote:
             | On Btrfs, you can mark a folder/file/subvolume to have
             | nocow, which has the effect of only doing a COW operation
             | when you are creating snapshots.
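              | 
              | Concretely that's the C attribute via chattr; it only
              | takes effect for newly created (or empty) files, so you
              | normally set it on the directory up front (path is just
              | an example):
              | 
              |     mkdir -p /var/lib/vm-images
              |     chattr +C /var/lib/vm-images   # new files here are nocow
              |     lsattr -d /var/lib/vm-images   # shows the 'C' flag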
        
               | mustache_kimono wrote:
               | And that may work for btrfs, but again at some cost:
               | 
               | "When you enable nocow on your files, Btrfs cannot
               | compute checksums, meaning the integrity against bitrot
               | and other corruptions cannot be guaranteed (i.e. in nocow
               | mode, Btrfs drops to similar data consistency guarantees
               | as other popular filesystems, like ext4, XFS, ...). In
               | RAID modes, Btrfs cannot determine which mirror has the
               | good copy if there is corruption on one of them."[0]
               | 
                | [0]:
                | https://wiki.tnonline.net/w/Blog/SQLite_Performance_on_Btrfs...
        
               | lazide wrote:
               | Yup. It's a pretty fundamental thing. COW and data
               | checksums (and usually automatic/inline compression) co-
               | exist that way because it's otherwise too expensive
               | performance wise, and potentially dangerous corruption
               | wise.
               | 
               | For instance, if you modify a single byte in a large
               | file, you need to update the data on disk as well as the
               | checksum in the block header, and other related data.
               | Chances are, these are in different sectors, and also
               | require re-reading in all the other data in the block to
               | compute the checksum. Anywhere in that process is a
               | chance for corruption of the original data and the
               | update.
               | 
               | If the byte changes the final compressed size, it may not
               | fit in the current block at all, causing an expensive (or
               | impossible) re-allocation.
               | 
               | You could end up with the original data and update both
               | invalid.
               | 
               | Writing out a new COW block is done all at once, and if
               | it fails, the write failed atomically, with the original
               | data still intact.
        
               | tjoff wrote:
               | > _Chances are, these are in different sectors, and also
               | require re-reading in all the other data in the block to
               | compute the checksum. Anywhere in that process is a
               | chance for corruption of the original data and the
               | update._
               | 
               | Not much different than any interrupted write though. And
               | a COW needs to reread just as much.
               | 
               | > _If the byte changes the final compressed size, it may
               | not fit in the current block at all, causing an expensive
               | (or impossible) re-allocation._
               | 
                | That's a cost you always pay in a COW filesystem anyway,
                | and it is handled by other non-COW filesystems anyway.
               | 
               | Just because a filesystem isn't COW doesn't mean every
               | change needs to be in place either. Of course, a
               | filesystem that is primarily COW might not want to
               | maintain compression for non-COW edge-cases and that is
               | quite reasonable.
        
               | Arnavion wrote:
               | While filesystem-integrated RAID makes sense since the
               | filesystem can do filesystem-specific RAID placements (eg
               | zfs), for now the safest RAID experience seems to be
               | filesystem on mdadm on dm-integrity on disk partition, so
               | that the RAID and RAID errors are invisible to the
               | filesystem.
        
               | mustache_kimono wrote:
               | > the safest RAID experience seems to be filesystem on
               | mdadm on dm-integrity on disk partition, so that the RAID
               | and RAID errors are invisible to the filesystem.
               | 
               | I suppose I don't understand this. Why would this be the
               | case?
        
               | Arnavion wrote:
               | dm-integrity solves the problem of identifying which
               | replica is good and which is bad. mdadm solves the
               | problem of reading from the replica identified as good
               | and fixing / reporting the replica identified as bad. The
               | filesystem doesn't notice or care.
        
               | mustache_kimono wrote:
               | Ahh, so you intend, "If you can't use ZFS/btrfs, use dm-
               | integrity"?
        
               | Arnavion wrote:
               | No. I don't use ZFS since it's not licensed correctly, so
               | I have no opinion on it. And BTRFS raid is not safe
               | enough for use. So I'm saying "Use filesystem on mdadm on
               | dm-integrity".
        
         | llanowarelves wrote:
         | I have been spinning my wheels on personal backups and file
         | organization the last few months. It is tough to perfectly
         | structure it.
         | 
         | I think directories or volumes having different properties and
         | you having it split up as /consumer-media /work-media /work
         | /docs /credentials etc may be the way to go.
         | 
         | Then you can set integrity, encryption etc separately, either
         | at filesystem level or as part of the software-level backup
         | strategy.
        
         | lazide wrote:
         | Why is it 'wasted'? Those things are mostly free on modern
         | hardware.
         | 
         | The challenge with your thesis here is that the only one who
          | can know what is 'that important' is _YOU_, and your decision
         | making and communication bandwidth is already the limiting
         | factor.
         | 
         | For many users, that cat video would be heartbreaking to lose,
         | and they don't have term papers to worry about.
         | 
         | So having to decide or think what is or is not 'important
         | enough' to you, and communicate that to the system, just makes
         | everything slower than putting everything on a system good
         | enough to protect the most sensitive and high value data you
         | have.
        
           | didgetmaster wrote:
           | Nothing is free or even 'mostly free' when managing data.
           | Data security (encryption), redundancy (backups), and
           | integrity (checksums, etc.) all impose a cost on the system.
           | 
           | Getting each piece of data properly classified will always be
           | a challenge (AI or other tools may help with that), but it
           | would still be nice to be able to do it. If I have a 50GB
           | video file that I could easily re-download off the Internet,
           | it would be nice to be able to turn off any security,
           | redundancy, or integrity features for it.
           | 
            | I wonder how many petabytes of storage space are being wasted
           | by having multiple backups of all the operating system files
           | that could be easily downloaded from multiple websites. Do I
           | really need to encrypt that GB file that 10 million people
           | also have a copy of? Am I worried if a single pixel in that
           | high resolution photo has changed due to bit rot?
        
             | Arnavion wrote:
             | >Do I really need to encrypt that GB file that 10 million
             | people also have a copy of?
             | 
             | Indeed you don't. Poettering has a similar idea in [1]
             | (scroll down to "Summary of Resources and their
             | Protections" for the tl;dr table), where he imagines OS
             | files are only protected by dm-verity (for Silverblue-style
             | immutable distros) / dm-integrity (for regular mutable
             | distros).
             | 
              | [1]:
              | https://0pointer.net/blog/authenticated-boot-and-disk-encryp...
        
           | derefr wrote:
           | > For many users, that cat video would be heartbreaking to
           | lose, and they don't have term papers to worry about.
           | 
           | Depends on where that cat video is / how it ended up on the
           | disk.
           | 
           | The user explicitly saved it to their user-profile Downloads
           | directory? Yeah, sure, the user might care a lot about
           | preserving that data. There's intent there.
           | 
            | The user's web browser _implicitly_ saved it into the
            | browser's cache directory? No, the user absolutely doesn't care.
           | That directory is a pure transparent optimization over just
           | loading the resource from the URL again; and the browser
           | makes no guarantees of anything in it surviving for even a
           | few minutes. The user doesn't even _know_ they have the data;
           | only the browser does. As such, the browser should be able to
           | tell the filesystem that this data is discardable cache data,
           | and the filesystem should be able to apply different storage
           | policies based on that.
           | 
           | This is already true of managed cache/spool/tmp directories
           | vis-a-vis higher-level components of the OS. macOS, for
           | example, knows that stuff that's under ~/Library/Caches can
           | be purged when disk space is tight, so it counts it as
           | "reclaimable space"; and in some cases (caches that use
           | CoreData) the OS can even garbage-collect them itself.
           | 
           | So, why not also avoid making these files a part of backups?
           | Why not avoid checksumming them? Etc.
        
             | lazide wrote:
             | Backups - possibly, but no one I know counts COW/Snapshots,
             | etc. as backups. Backup software generally already avoids
             | copying those.
             | 
             | They can be ways to restore to a point in time
             | deterministically - but then they are absolutely needed to
             | do so! Otherwise, the software is going to be acting
             | differently with a bunch of data gone from underneath it,
             | no?
             | 
              | Checksumming is more about being able to detect errors
             | (and deterministically know if data corruption is
             | occurring). So yes, absolutely temporary and cache files
             | should be checksummed. If that data is corrupted, it will
             | cause crashes of the software using them and downstream
             | corruption after all.
             | 
             | Why would I _not_ want that to get caught before my
             | software crashes or my output document (for instance) is
             | being silently corrupted because one of the temporary files
              | used when editing it got corrupted to/from disk?
        
               | derefr wrote:
               | > So yes, absolutely temporary and cache files should be
               | checksummed. If that data is corrupted, it will cause
               | crashes of the software using them and downstream
               | corruption after all.
               | 
               | ...no? I don't care if a video in my browser's cache ends
               | up with a few corrupt blocks when I play it again a year
               | later. Video codecs are designed to be tolerant of that.
               | You'll get a glitchy section in a few frames, and then
               | hit the next keyframe and everything will clean up.
               | 
               | In fact, _most_ encodings -- of images, audio, even text
               | -- are designed to be self-synchronizing in the face of
               | corruption.
               | 
               | I think you're thinking specifically of _working-state_
               | files, which usually _need_ to be perfect and guaranteed-
                | trusted, because they're in normalized low-redundancy
               | forms and are also used to derive other data from.
               | 
               | But when I say "caching", I'm talking about cached
               | _final-form assets_ intended for direct human
               | consumption. These get corrupted all the time, from
               | network errors during download, disk storage errors on
                | NASes, etc; and people mostly just don't care. For
               | video, they just watch past it. For a web page, they
               | hard-refresh it and everything's fine the second time
               | around.
               | 
               | If you think it's impossible to differentiate these two
               | cases: well, that's because we don't explicitly ask
               | developers to differentiate them. There could be separate
               | ~/Library/ViewCache and ~/Library/StateCache directories.
               | 
               | And before you ask, a good example of a large "ViewCache"
               | asset that's _not_ browser-related: a video-editor
               | render-preview video file (the low-quality  / thumbnail-
               | sized kind, used for scrubbing.)
        
               | lazide wrote:
               | If they are corrupted _on disk_ the behavior is not so
               | deterministic as a 'broken image' and a reload. Corrupted
               | _on disk_ content causes software crashes, hangs, and
               | other broken behavior users definitely don't like.
               | Especially when it's the filesystem metadata which gets
               | corrupted.
               | 
               | Because _merely trying to read it_ can cause severe
               | issues at the filesystem level.
               | 
               | I take it you haven't dealt with failing storage much
               | before?
        
               | derefr wrote:
               | I maintain database and object-storage clusters for a
               | living. Dealing with failing storage is half my job.
               | 
               | > Especially when it's the filesystem metadata which gets
               | corrupted.
               | 
               | We're not talking about filesystem metadata, though.
               | Filesystem metadata is all "of a piece" -- if you have a
               | checksumming filesystem, then you can't _not_ checksum
               | some of the filesystem metadata, because all the metadata
               | lives in (the moral equivalent of) a single database file
               | the filesystem maintains, and _that database_ gets
                | checksummed. It's all one data structure, where the
               | checksumming is a thing you do _to_ that data structure,
               | not to individual nodes within it. (For a tree filesystem
               | like btrfs, this would be the non-cryptographic
               | equivalent of a merkle-tree hash.) The only way you could
               | even potentially turn off filesystem features for some
               | metadata (dirent, freelist, etc) nodes but not others,
               | would be to split your filesystem into multiple
               | filesystems.
               | 
               | No, to be clear, we're specifically talking about what
               | happens inside the filesystem's _extents_. _Those_ can
               | experience corruption without that causing any undue
                | issues, besides "the data you get from fread(3) is
                | wrong." Unlike filesystem metadata, which is _all_
                | required for the filesystem's _integrity_, a
                | checksumming filesystem can _choose_ whether to "look"
               | inside file extents, or to treat them as opaque. And it
               | can (in theory) make that choice per file, if it likes.
               | From the FS's perspective, an extent is just a range of
               | reserved disk blocks.
               | 
               | Now, an assumption: only storage _arrays_ use spinning
               | rust for anything any more. The only disk problems
               | _consumer devices_ face any more are SSD degradation
               | problems, not HDD degradation problems.
               | 
               | (Even if you don't agree with this assumption by itself,
               | it's much more clear-cut if you consider only devices
               | operated by people willing to choose to use a filesystem
               | that's not the default one for their OS.)
               | 
               | This assumption neatly cleaves the problem-space in two:
               | 
               | - How should a filesystem _on a RAID array, set up for a
               | business or prosumer use-case,_ deal with HDD faults?
               | 
               | - How should a _single-device_ filesystem _used in a
                | consumer use-case_ deal with SSD faults?
               | 
               | The HDD-faults case comes down to: filesystem-level
               | storage pool management with filesystem-driven redundant
               | reads, with kernel blocking-read timeouts to avoid hangs,
               | with async bad-sector remapping for timed out reads.
               | Y'know: ZFS.
               | 
                | While the SSD-faults case comes down to: read the bad
               | data. Deal with the bad data. You won't get any hangs,
               | until the day the whole thing just stops working. The
               | worst you'll get is bit-rot. And even then, it's rare,
               | because NAND controllers use internal space for error-
               | correction, entirely invisibly to the kernel. (See also:
               | http://dtrace.org/blogs/ahl/2016/06/19/apfs-part5/)
               | 
               | In fact, in my own personal experience, the most likely
               | cause of incorrect or corrupt data ending up on an
               | SSD/NVMe disk, is that the _CPU or memory_ of the system
               | is bad, and so one or the other is corrupting the memory
               | that will be written to disk _before_ or _during_ the
                | write. (I've personally had this happen at least twice.
               | What to look for to diagnose this: PCIe "link training"
               | errors.)
        
         | rodgerd wrote:
         | > Does NILFS do checksums and snapshotting for every single
         | file in the system?
         | 
         | NILFS is, by default, a filesystem that only ever appends until
         | you garbage collect the tail. It doesn't really "snapshot" in
         | the way that ZFS or btrfs do, because you can just walk the
         | entire history of the filesystem until you run out of history.
         | The snapshots are just bookmarks of a consistent state.
        
         | heavyset_go wrote:
         | You can turn off CoW, checksumming, compression, etc at the
         | file and directory levels using btrfs.
        
           | Arnavion wrote:
           | Indeed. You can also make a directory into a subvolume so
           | that that directory is not included in snapshots of the
           | parent volume.
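            | 
            | For example (paths illustrative, and assuming /home is
            | itself a subvolume), since btrfs snapshots don't recurse
            | into nested subvolumes:
            | 
            |     btrfs subvolume create /home/user/.cache
            |     # a snapshot of /home now leaves .cache out
            |     btrfs subvolume snapshot -r /home /snapshots/home-today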
        
         | spookthesunset wrote:
         | It might sound weird but the hard part of what you describe is
         | not the technology but how to design the UX in a way that you
         | aren't babysitting everything.
         | 
         | And doing that is not at all easy. For all anybody knows your
         | cat video is "worth more" to you than your thesis paper. How
         | can you get the system to determine the worth of each file
         | without manually setting an attribute each time you create a
         | file? And if you let the system guess, the cost of failure
         | could be very high! What if it decided your thesis paper was
          | worthless and stored it with a lower "integrity" (or whatever
         | you call the metric)?
         | 
         | I dunno. Storage is getting cheaper all the time and it might
         | just be easier to fuck it and treat all files with the same
         | high level of integrity. Maybe it would be so much work for a
         | user to manually manage they'd just mark everything the same?
        
           | didgetmaster wrote:
           | You could always set the default behavior to be uniform for
           | all files (e.g. protect everything or protect nothing) and
           | just forget about it. But it would be nice to be able to
           | manually set the protection level for specific files that are
           | the exception.
           | 
           | If I was copying an important file into an unprotected
           | environment, I could change how it was handled (likewise if I
           | was downloading some huge video I didn't care about into a
           | system where the default protection was set to high).
           | 
           | I agree that if you have 100 million files, then it could be
           | nearly impossible to classify every single one of them
           | correctly.
        
             | spookthesunset wrote:
              | I'd think a directory basis would be the ideal
        
             | nintendo1889 wrote:
             | A directory basis, or even better, a numerical priority
             | that could be manually set in the application that
             | generated them, or automatically, based on the user or
             | application or in a hypervisor, based on the VM. Then it
             | could be an opportunistic setting.
             | 
             | I thought ZFS had some sort of unique settings like this.
        
       | koolba wrote:
       | How does this compare to ZFS + cron to create snapshots every X
       | minutes?
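        | 
        | (By "ZFS + cron" I mean roughly the following, plus some pruning
        | of old snapshots; the dataset name is just an example.)
        | 
        |     # crontab entry, every 5 minutes; % must be escaped in cron
        |     */5 * * * * zfs snapshot tank/home@auto-$(date +\%Y\%m\%d-\%H\%M)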
        
         | harvie wrote:
          | A week ago my client lost data on ZFS by accidentally deleting
          | a folder. Unfortunately the data was created and deleted in the
          | interval between two snapshots. One would expect that it still
          | might be possible to recover, because ZFS is CoW.
         | 
         | There are some solutions like photorec (which now has ZFS
          | support), but it expects you to identify the file by the footprint
         | of its contents, which was not the case. Also many of these
         | solutions would require ZFS to go offline for forensic analysis
         | and that was also not possible because lots of other clients
         | were using the same pool at the time.
         | 
          | So this failed me, and I really wished at the time that ZFS
         | had continuous snapshots.
         | 
          | BTW on ZFS I use ZnapZend. It's the second best thing after
         | continuous snapshots:
         | 
         | https://www.znapzend.org/ https://github.com/oetiker/znapzend/
         | 
         | There are also some ZFS snapshotting daemons in Debian, but
         | this is much more elegant and flexible.
         | 
          | But since znapzend is a userspace daemon (as are all ZFS
          | snapshotters) you need some kind of monitoring and warning
          | mechanism for cases where something goes wrong and it can no
          | longer create snapshots (crashes, gets killed by OOM or
          | something...). In NILFS2 every write/delete is snapshotted, so
          | you are basically guaranteed by the kernel to have everything
          | snapshotted without having to watch it.
        
         | yonrg wrote:
         | I run this setup. zfs + zfsnap (not cron anymore, now
         | systemd.timer).
         | 
          | I cannot tell if NILFS does this too; with zfsnap I maintain
          | different retention times: every 5 minutes for 1 hour, hourly
          | for 1 day, daily for a week. That's fewer than 60 snapshots.
          | The older ones are cleaned up.
         | 
         | In addition, zfs brings compression and encryption. That's why
         | I have it on the laptops, too.
        
         | goodpoint wrote:
          | There is no comparison. NILFS provides *continuous* snapshots,
          | so you can inspect and roll back changes as needed.
         | 
         | It does without a performance penalty compared to other logging
         | filesystems.
         | 
         | And without using additional space forever. The backlog rotates
         | forward continuously.
         | 
         | It's a really unique feature that makes a lot of sense for
         | desktop use, where you might want to recover files that were
         | created and deleted after a short time.
        
           | harvie wrote:
            | Perhaps we can leverage the "inotify" API to make a ZFS
            | snapshot every time some file changes... But I think ZFS is
            | not really good at handling huge numbers of snapshots. The
            | NILFS2 snapshots are probably more lightweight when compared
            | to ZFS ones.
        
             | goodpoint wrote:
             | The NILFS snapshots are practically free (for a logging
             | filesystem, obviously).
        
             | mustache_kimono wrote:
              | > Perhaps we can leverage the "inotify" API to make a ZFS
              | snapshot every time some file changes...
             | 
             | ZFS and btrfs users are already living in the future:
              | 
              |     inotifywait -r -m --format %w%f -e close_write \
              |         "/srv/downloads/" | while read -r line; do
              |         # command below will snapshot the dataset
              |         # upon which the closed file is located
              |         sudo httm --snap "$line"
              |     done
             | 
             | See: https://kimono-koans.github.io/inotifywait/
        
               | [deleted]
        
           | fuckstick wrote:
           | > It does without a performance penalty.
           | 
           | What is the basis for comparison? Sounds like a pretty
              | meaningless statement on its face.
        
             | goodpoint wrote:
             | Compared to other logging filesystems obviously.
        
               | fuckstick wrote:
               | Nilfs baseline (write throughput especially) is slow as
               | shit compared to other filesystems including f2fs. So
               | just because you have this feature that doesn't make it
               | even slower isn't that interesting - you pay for it one
               | way or the other.
        
               | usr1106 wrote:
               | For many users filesystem speed of your home directory is
               | completely irrelevant unless you run on a Raspberry Pi
               | using SD cards. You just don't notice it.
               | 
                | Of course if you have a server handling, let's say, video
                | files, things will be very different. And there are some
               | users who process huge amounts of data.
               | 
                | I've run 2 LVM snapshots (daily and weekly) on my home
                | partition for years. Write performance is abysmal if you
                | measure it, but you don't notice it in daily development
                | work.
        
               | [deleted]
        
           | [deleted]
        
           | 1MachineElf wrote:
           | >It's a really unique feature that makes a lot of sense for
            | desktop use
           | 
           | Sounds like it could serve as a basis for a Linux
           | implementation of something like Apple Time Machine.
        
             | [deleted]
        
             | mustache_kimono wrote:
             | With 'httm', a few of us are already living in that bright
             | future: https://github.com/kimono-koans/httm
        
             | masklinn wrote:
             | Afaik Time Machine does not do continuous snapshots, just
             | periodic (and triggered).
             | 
             | So you can already do that with zfs: take a snapshot and
             | send it to the backup drive.
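              | 
              | Something along these lines (pool and dataset names made
              | up), where the second send is incremental:
              | 
              |     zfs snapshot tank/home@tm-2022-10-11
              |     zfs send -i tank/home@tm-2022-10-10 tank/home@tm-2022-10-11 \
              |         | zfs receive backupdrive/home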
        
           | harvie wrote:
           | "It does without a performance penalty"
           | 
           | yeah. it's already so terribly slow that it's unlikely that
           | taking snapshots can make it any slower :-D
        
             | Volundr wrote:
             | That was not my experience with NILFS. It outperformed ext4
                | on my laptop NVMe.
        
               | akvadrako wrote:
               | The benchmarks here look pretty bad:
               | 
               | https://www.phoronix.com/review/linux-58-filesystems/4
        
               | Volundr wrote:
               | The last page looks pretty bad. If you look at the others
               | it's more of a mixed bag, but yeah.
               | 
               | I don't remember what benchmark I ran before deciding to
               | run it on my laptop. Given my work at the time probably
               | pgbench, but I couldn't say for sure. It was long enough
               | ago I also might've been benchmarking against ext3, not
               | 4.
        
               | harvie wrote:
                | I think I was running it on a 6TB conventional HDD RAID1.
                | Also note that the read and write speeds might be quite
                | asymmetrical... in general it also depends on workload type.
        
           | pkulak wrote:
           | > There is no comparison.
           | 
           | What if I compare it to BTRFS + Snapper? No performance
           | penalty there, plus checksumming.
        
             | AshamedCaptain wrote:
             | btrfs and snapperd do have a performance penalty as the
             | number of snapshots increases. Having 100+ usually means
             | snapper list will take north of an hour. You can easily
             | reach these numbers if you are taking a snapshot every
             | handful of minutes.
             | 
             | Even background snapper cleanups will start to take a toll,
             | since even if they are done with ionice they tend to block
             | simultaneous accesses to the filesystem while they are in
             | progress. If you have your root on the same filesystem,
             | it's not pretty -- lots of periodic system-wide freezes
             | with the HDD LEDs non-stop blinking. I tend to limit
             | snapshots always to < 20 for that reason (and so does the
             | default snapperd config).
        
             | mike256 wrote:
             | About 2 years ago I believed the same. Then I used BTRFS as
              | a store for VM images (with periodic snapshots) and
              | performance got really, really bad. After I deleted
             | all snapshots performance was good again. There is a big
             | performance penalty in btrfs with more than about 100
             | snapshots.
        
       | Volundr wrote:
       | NILFS is really, really cool. In concept. Unfortunately the
       | tooling and support just isn't there. I ran it for quite some
        | time on my laptop and the continuous snapshotting is everything I
       | hoped it'd be. At one point however there was a change to the
       | kernel that rendered it unbootable. Despite being a known and
       | recorded bug it took forever to get fixed (about a year if I
       | recall correctly) leaving me stuck on an old kernel the whole
       | time.
       | 
       | This was made more frustrating by the lack of any tooling such as
       | fsck to help me diagnose the issue. The only reason I figured out
       | it was a bug was that I booted a live CD to try to rescue the
       | system and it booted fine.
       | 
       | When I finally replaced that laptop I went back to ZFS and
       | scripted snapshots. As much as I want to, I just can't recommend
       | NILFS for daily use.
        
         | yonrg wrote:
         | Do you happen to remember which change in kernel was the cause?
         | 
          | I had trouble with unpopular file systems as the root file
          | system when the initrd was not built properly. So sysresccd is
          | always good to have within reach. That said, I think I won't
          | have any other file system on root besides the default of the
          | distro. Data which require special care are on other
          | partitions.
        
         | CGamesPlay wrote:
         | How did Linus not go on a rampage after breaking userspace for
         | an entire year? Is NILFS not part of the kernel mainline, I
         | guess?
        
           | jraph wrote:
           | If I understand correctly, I don't think this is a userspace-
           | breaking bug, as in: a kernel API changed and made a
           | userspace program not work anymore.
           | 
           | It is a bug that prevents the kernel from booting. That's
           | bad, but that's not the same thing. That's not a userspace
           | compatibility issue such as the ones Linus chases. The user
           | space isn't even involved if the kernel cannot boot. Or if it
           | is actually a userspace program that causes a kernel crash,
           | it is a crash, which is not really the same thing as an API
           | change (one could argue, but that's a bit far-fetched, the
           | intents are not the same, etc - I don't see Linus explode on
           | somebody who introduced a crash the way he would explode on
           | someone changing a userspace API).
        
           | yjftsjthsd-h wrote:
           | > Is NILFS not part of the kernel mainline, I guess?
           | 
           | Good guess, but no:
           | 
           | https://github.com/torvalds/linux/tree/master/fs/nilfs2
           | 
           | > How did Linus not go on a rampage after breaking userspace
           | for an entire year?
           | 
           | I would very much like to know that as well. Any chance it
           | didn't get reported (at least, not as "this broke booting")?
        
             | Volundr wrote:
             | I reported it along with a few other users in
             | https://marc.info/?l=linux-nilfs&m=157540765215806&w=2. I
             | think it just isn't widely enough used that Linus noticed
             | we were broken. If I recall correctly it also wasn't
             | directly fixed so much as incidentally. I just kept
             | checking new kernel versions as they were released until
             | one worked. There was never anything in the change-log
             | (that I recall) about fixing the bug, just another change
             | that happened to fix the issue.
             | 
             | Edit: Looking through the archives, it looks like my memory
             | was somewhat uncharitable. It was reported in November and
              | directly patched in June
              | (https://marc.info/?l=linux-nilfs&m=159154670627428&w=2) so
              | about 7 months after reporting. Not sure what kernel
              | release that would've landed in, so could've been closer
              | to 8.
        
           | bityard wrote:
           | > How did Linus not go on a rampage after breaking userspace
           | for an entire year?
           | 
           | Linus' commandment about not breaking userspace is frequently
           | misunderstood. He wants to ensure that user-space /programs/
           | do not break (even if they rely on buggy behavior that made
           | it into a release), not that the /user/ will never see any
           | breakage of the system whatsoever, which is of course an
           | impossible goal. Device drivers and filesystems are firmly
           | system-level stuff, bugs and backwards-incompatible changes
           | in those areas are regrettable but happen all the same.
        
       | cmurf wrote:
        | Very nice introduction to NILFS, which has been in the Linux
       | kernel since 2009.
        
       | newcup wrote:
       | I think NILFS is a hidden gem. I've been using it exclusively in
       | my Linux laptops, desktops etc. since ca. 2014. Apart from one
       | kernel regression bug related to NILFS2 it's worked flawlessly
        | (no data corruption even with the bug, just no access to the file
        | system; effectively it forced running an older kernel until the
        | bug was fixed).
       | 
       | The continuous snapshotting has saved me a couple of times; I've
        | just mounted a version of the file system from a few hours or
        | weeks ago to access overwritten or deleted data. I also use NILFS
        | on backup disks to provide combined deduplication and snapshots
        | easily (just rsync & NILFS' mkss, the latter to make sure the
        | "checkpoints" aren't silently garbage collected in case the
        | backup disk gets full).
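        | 
        | Roughly like this per backup run (paths made up; mkcp -s should
        | also work, creating the checkpoint directly as a snapshot):
        | 
        |     rsync -a --delete /home/ /mnt/backup/home/
        |     mkcp -s /dev/sdc1   # pin this state so cleanerd keeps it
        |     lscp -s /dev/sdc1   # list the snapshots on the backup disk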
        
         | nix23 wrote:
         | >I think NILFS is a hidden gem. I've been using it exclusively
         | in my Linux laptops, desktops etc. since ca. 2014
         | 
            | Yes, it's really sad: there we have a native and stable
            | check-summing fs, and nearly no one knows about it.
        
           | yjftsjthsd-h wrote:
           | > check-summing fs
           | 
           | Is it? Last I'd heard was
           | 
           | > nilfs2 store checksums for all data. However, at least the
           | current implementation does not verify it when reading.
           | 
           | https://www.spinics.net/lists/linux-nilfs/msg01063.html
        
             | nix23 wrote:
              | Hmm, you could be right, I found nothing saying that it is
              | verified at read time, just with fsck.
        
           | conradev wrote:
           | BTRFS is also a native copy on write filesystem that verifies
           | a configurable checksum and supports snapshots.
           | 
           | The snapshots are not automatic, but short of that it is
           | pretty feature complete
        
             | nix23 wrote:
              | That's why I specifically wrote -> stable...
        
               | 77pt77 wrote:
               | BTRFS is not stable?
        
             | guipsp wrote:
             | BTRFS is pretty stable nowadays.
        
               | nerpderp82 wrote:
               | What does that mean quantifiably?
        
               | guipsp wrote:
               | Synology deploys it in their products
        
         | ComputerGuru wrote:
         | > Apart from one kernel regression bug related to NILFS2 it's
         | worked flawlessly
         | 
         | Maybe on x86? I've tried repeatedly to use it on ARM for
         | RaspberryPi where it would have been perfect, but always ran
         | into various kernel panics as soon as the file system is
         | mounted or accessed.
        
           | heavyset_go wrote:
           | I've used NILFS2 on flash storage on some old non-RPi ARMv7
           | hardware for a while without a problem. Switched to F2FS for
           | performance reasons, though.
        
           | newcup wrote:
           | True, I only have used it on x86 devices. Thanks for the
           | heads up!
           | 
           | I've heard so many stories of SD card failures (against which
           | snapshotting might be of no help) with RaspberryPi that I've
           | decided to send any valuable data promptly to safety over a
           | network. (Though, I personally haven't had any problems with
           | failing SD's.)
        
         | rodgerd wrote:
         | NILFS is absolutely wonderful; it was very unfortunate that
         | Linus chose to dub btrfs as the ext4 successor all those years
         | ago, because it cut off a lot of interest in the plethora of
         | interesting work that was going on at the time.
         | 
         | A decade later and btrfs is still riddled with problems and
         | incomplete, people are still using xfs and ext4 for lack of
         | trust, one kernel dev has a side hobby trying to block openzfs,
         | and excellent little projects like nilfs are largely unknown.
        
           | perrygeo wrote:
           | > one kernel dev has a side hobby trying to block openzfs
           | 
           | Can you elaborate?
        
       | nintendo1889 wrote:
       | I remember DEC/HP releasing the source to the digital unix AdvFS
       | filesystem on sourceforge with the intent of porting it over to
       | linux, but it never materialized. AdvFS had many advanced
       | features. The source is still available and within it are some
        | PDF slides that explain a lot of its features.
        
       | Nifty3929 wrote:
       | Do any file systems have good, native support for tagging and
        | complex searches based on those tags?
        
         | DannyBee wrote:
          | BeFS was the last real one I'm aware of at the complexity you
          | are talking about (plenty of FSen have some very basic indexed
          | support for, say, file sizes, but not the kind of generic
          | tagging you are talking about)
         | 
         | At this point, the view seems to be "attributes happen in the
         | file system, indexing happens in user space".
         | 
         | Especially on linux.
         | 
          | Part of the reason is, as I understand it, the
          | surface/complexity of including query languages in the kernel,
          | which is not horribly unreasonable.
         | 
         | So all the common FSen have reasonable xattr support, and
         | inotify/etc that support notification of attribute changes.
         | 
         | The expectation seems to be that the fact that inotify might
         | drop events now and then is not a dealbreaker. The modern queue
         | length is usually 16384 anyway.
         | 
         | I'm not saying there aren't tradeoffs here, but this seems to
         | be the direction taken overall.
         | 
         | I actually would love to have an FS with native indexed xattr
         | and a way to get at them.
         | 
         | I just don't think we'll get back there again anytime soon.
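          | 
          | So the closest you get on stock Linux filesystems today is
          | user xattrs plus your own indexer in userspace, e.g.:
          | 
          |     setfattr -n user.tags -v "thesis,important" report.pdf
          |     getfattr -n user.tags report.pdf
          |     # a userspace indexer can watch for attribute changes
          |     inotifywait -m -r -e attrib ~/Documents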
        
           | Nifty3929 wrote:
           | Okay - how about tagging and non-complex searches then.
           | Beggars can't be choosers :-)
           | 
           | Really what I'd like is just to search for some specific
           | tags, or maybe list a directory excluding some tag, or
           | similar. For bonus points, maybe a virtual directory that
           | represents a search like this, and which "contains" the
           | results of that search. (A "Search Folder")
           | 
           | I'll check out BeFS. Thanks!
        
       | harvie wrote:
       | I had issues with file locking when running some legacy database
        | software on NILFS2. It probably caused data corruption in that
        | database (not in the FS itself).
       | 
        | The SF website of NILFS2 suggests that there are some
        | unimplemented features, one of them being synchronous IO, which
        | might have caused that issue?
       | 
       | https://nilfs.sourceforge.io/en/current_status.html
       | 
        | In some cases, NILFS2 is safer storage for your data than
        | ZFS. So NILFS might work for some simple use cases (eg. locally
        | storing documents that you modify often), but it's certainly not
        | ready to be deployed as a generic filesystem. It's relatively
        | slow and sometimes behaves a bit weirdly. If something goes
        | really bad, the recovery might be a bit painful. There is no fsck
        | yet, nor community support. NILFS2 can self-heal to some extent.
       | 
        | I really like the idea of NILFS2, but at this point I would
        | prefer a patch adding continuous snapshotting to ZFS. Unlike
        | NILFS2, ZFS has lots of active developers and a big community,
        | while NILFS2 is almost dead. The fact that it's been in the
        | kernel for quite some time and most people haven't even noticed
        | it (despite its very interesting features) speaks for itself.
       | 
        | Don't get me wrong. I wish more developers would get interested
        | in NILFS2, fix these issues, and make it on par with EXT4, XFS
        | and ZFS... But ZFS still has more features overall, so we might
        | just add continuous snapshots in memoriam of NILFS2.
        
         | yjftsjthsd-h wrote:
          | > In some cases, NILFS2 is safer storage for your data than
          | ZFS.
         | 
         | What cases? Do you just mean due to continuous snapshots
         | protecting against accidental deletes or such, or are there
         | more "under the covers" things it fixes?
        
           | ComputerGuru wrote:
            | It's basically append-only for recent things, so theoretically
            | you can't lose anything (within a reasonable
           | timeframe). I don't know if the porcelain exposes everything
           | you need to avail yourself of that design functionality,
           | though.
        
       | compsciphd wrote:
       | we used NILFS 15 years ago in dejaview -
       | https://www.cs.columbia.edu/~nieh/pubs/sosp2007_dejaview.pdf
       | 
        | We combined nilfs + our process snapshotting tech (we tried to
        | mainline it, but it didn't go in, though many of the concepts
        | ended up in CRIU) + our remote display + screen reading tech
        | (i.e. normal APIs) to create an environment that could record
        | everything you ever saw visually and textually, enable you to
        | search it, and enable you to recreate the state as it was at that
        | time with no noticeable interruption to the user (process
        | downtime was like 0.02s).
        
         | heavyset_go wrote:
         | This is cool, thanks for sharing it.
        
       ___________________________________________________________________
       (page generated 2022-10-11 23:00 UTC)