[HN Gopher] ZFSBootMenu - A boot loader to manage ZFS boot envir...
       ___________________________________________________________________
        
       ZFSBootMenu - A boot loader to manage ZFS boot environments for
       Linux
        
       Author : nixcraft
       Score  : 136 points
       Date   : 2022-11-12 09:50 UTC (13 hours ago)
        
 (HTM) web link (zfsbootmenu.org)
 (TXT) w3m dump (zfsbootmenu.org)
        
       | nailer wrote:
       | I mean you can't maintain ZFS normally, and people have been
       | trying to make zfs happen for what... two decades now?
        
         | magicalhippo wrote:
         | > people have been trying to make zfs happen for what... two
         | decades now?
         | 
          | When players like AWS provide ZFS as one of four alternatives
         | in their "filesystem as a service"[1], I'd say we're beyond
         | "trying to make zfs happen".
         | 
         | Not to mention the PB worth of data others[2] rely on ZFS to
         | keep safe.
         | 
         | [1]: https://aws.amazon.com/fsx/
         | 
         | [2]: https://openzfs.org/wiki/Companies
        
         | phaer wrote:
         | What do you mean "have been trying to make zfs happen"? ZFS is
         | used in production in many places.
        
         | knaekhoved wrote:
         | When you say "normally", do you mean "badly, with fsck"?
         | 
         | ZFS is growing incredibly quickly in popularity, and the only
         | reason it's not the dominant filesystem already is because A)
         | it took linux a long time to add support, and only dedicated
         | appliance vendors had the will and ability to move to freebsd
         | and B) macos was going to switch to zfs in the late '00s, but
         | they got scared off by oracle's legal shenanigans, which seems
         | to no longer be a relevant factor.
        
           | WastingMyTime89 wrote:
           | > A) it took linux a long time to add support
           | 
            | Linux has no support for ZFS. This is an out-of-tree
            | patch set and therefore a no-go for most, including
            | myself.
           | 
           | ZFS intentionally has a terrible license and is owned by
           | Oracle. People are free to do what they want but I wish all
           | the time wasted on it could have been put in something more
           | interesting.
        
             | erk__ wrote:
              | As with most things in the Linux world, it depends on who
             | builds your upstream. ZFS is in the kernel distributed by
             | Ubuntu which is one of the largest distributions.
             | 
              | Linux is, unlike e.g. FreeBSD, not a monolithic operating
             | system, but only a kernel so it is in my opinion not really
             | right to say that it has no support.
        
               | PlutoIsAPlanet wrote:
               | > but only a kernel so it is in my opinion not really
               | right to say that it has no support.
               | 
               | The kernel has no native support for ZFS.
               | 
               | Ubuntu may ship with ZFS, but that's one distro.
               | Meanwhile, RHEL etc won't even touch it.
               | 
               | On Linux, XFS dominates and likely will continue to
               | dominate the server world, meanwhile btrfs will slowly
               | erase ext4 in the desktop side of things.
               | Android/Embedded have always used their own different
               | filesystems so it's irrelevant there.
        
               | throw0101c wrote:
               | > _The kernel has no native support for ZFS._
               | 
               | Neither does it have accelerated Nvidia card support that
               | could be used for things like HPC/AI/ML. Yet I'm
               | administrating an entire cluster of Ubuntu machines with
               | cards just fine.
               | 
               | We generally use the "nvidia-driver-NNN-server" package.
               | 
               | If you want to live ideologically pure no one is going to
                | stop you, but some of us need to get work done.
        
             | ghaff wrote:
             | CDDL isn't an especially terrible license in isolation
             | (it's basically Mozilla) but it is generally considered
             | incompatible with GPL which, depending upon which set of 20
             | year old memories from ex-Sun employees you're inclined to
             | believe, was more or less a deliberately nefarious state of
             | affairs.
             | 
             | Oracle owns most/all of the copyrights and Canonical was
             | willing to take a calculated risk after, presumably, some
             | back-channel discussions. But those companies with
             | something to actually lose from a lawsuit with Oracle or
             | organizations with strong free software principles aren't
             | going anywhere close. Oracle has had a long time to change
             | the license if they actually cared to.
             | 
             | Personally, I find it unfortunate that all the effort that
             | has gone into ZFS as essentially a hobbyist copy-on-write
             | filesystem didn't go into btrfs instead.
        
               | kobalsky wrote:
               | > I find it unfortunate that all the effort that has gone
               | into ZFS as essentially a hobbyist copy-on-write
               | filesystem didn't go into btrfs instead.
               | 
               | don't some BSDs and Linux share the same code base for
               | ZFS?
               | 
                | Last I heard, FreeBSD switched to ZFS on Linux as
                | upstream a few years ago, before it was merged into
                | OpenZFS.
               | 
               | IMO calling it a hobbyist fs is a bit unfair.
        
               | yakak wrote:
               | Being called hobbyist software derisively by the Linux
               | community is like being knighted, I assume.
        
               | cmeacham98 wrote:
               | While the CDDL isn't a terrible license in a vacuum, it
               | is (according to its author) _intentionally_ incompatible
               | with the GPL (https://en.wikipedia.org/wiki/Common_Develo
               | pment_and_Distrib...).
               | 
               | This is, in my opinion, the most important part. It's not
               | some unhappy accident that there are significant legal
               | issues with ZFS and GPL-licenced Linux - that is
               | (allegedly) by design.
        
               | ghaff wrote:
               | As that section says, there is (at least for public
               | consumption) disagreement among then-Sun employees as to
               | what the intent and beliefs were at the time.
               | 
               | I know all those folks to greater or lesser degrees and
               | Sun was a client of mine as an analyst. There were
               | certainly a lot of conflicting motivations and concerns
               | concerning Solaris and Linux.
        
             | throw0101c wrote:
             | > _ZFS intentionally has a terrible license_
             | 
             | The folks on FreeBSD didn't / don't seem to think so.
             | Neither does Apple (who pulled in DTrace, which has the
             | exact same license).
             | 
             | > _and is owned by Oracle._
             | 
              | The OpenZFS folks don't seem to think so.
        
               | boomboomsubban wrote:
                | >The OpenZFS folks don't seem to think so.
               | 
               | The OpenZFS people are fully aware that Oracle owns ZFS,
               | that's why they forked the last free copy and made
               | OpenZFS. A small nitpick.
        
         | jacob019 wrote:
         | I guess the trolls are out today. Thought I was on Reddit for a
         | minute.
        
         | hnlmorg wrote:
         | I don't understand why you're being snarky. ZFS has been hugely
          | successful since its release, and continues to be successful
         | even now. The reason why it's popular is precisely because it's
         | easy to maintain.
        
           | nailer wrote:
            | Not sure why you'd consider criticism of ZFS to be snark.
           | Running any filesystem outside the mainline kernel is a bunch
           | of extra effort and I'm sure as a ZFS user you'd know that.
           | I'm not sure how ZFS could be "hugely successful" after two
           | decades and still not in the Linux kernel.
        
             | kobalsky wrote:
             | > Running any filesystem outside the mainline kernel is a
             | bunch of extra effort and I'm sure as a ZFS user you'd know
             | that
             | 
              | zfs-dkms makes usage simple, since OpenZFS is
              | backwards compatible down to 3.x kernels. No need to
              | mix and match ZFS with kernel versions anymore.
             | 
              | the only drawback is that you may not get to use the
              | latest kernel until the OpenZFS maintainers give it the
              | thumbs up (no 6.x-compatible release yet), but that's
              | not "a bunch of extra effort".
             | 
              | and that's only a problem if you want to be on the bleeding
             | edge. LTS kernel users wouldn't know about it.
             | 
             | > I'm not sure how ZFS could be "hugely successful" after
             | two decades and still not in the Linux kernel.
             | 
             | "There is no way I can merge any of the ZFS efforts until I
             | get an official letter from Oracle that is signed by their
             | main legal counsel or preferably by Larry Ellison himself
             | that says that yes, it's OK to do so and treat the end
             | result as GPL'd" -Linus Torvalds
             | 
              | Sounds like a legal issue more than a technical one.
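              | 
              | For the curious, the DKMS route is roughly this
              | (Debian/Ubuntu package names, shown as an illustrative
              | sketch rather than official instructions):

```shell
# Illustrative sketch: installing OpenZFS via DKMS on a Debian-family
# system. Package names vary by distro; check your own repositories.
sudo apt install zfs-dkms zfsutils-linux

# DKMS rebuilds the zfs module automatically for each installed kernel.
dkms status zfs

# Load the module and confirm userland and kernel versions line up.
sudo modprobe zfs
zfs version
```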
        
             | hnlmorg wrote:
             | Linux users are constantly using drivers outside of the
             | mainline kernel. Whether it's graphics cards, radio drivers
              | (things have gotten better in that regard but Bluetooth
             | support is still terrible) or FUSE file systems.
             | 
             | The difference with ZFS is that the code is kernel-ready
             | but there's just some licensing worries (understandable
             | ones) that stop it from being mainlined.
             | 
             | I've been running ZFS on Ubuntu Server for several years
             | now and frankly ZFS is the only part of that entire system
             | that doesn't suck (in my opinion). I'd switch back to
             | FreeBSD in a heartbeat if I didn't need Docker support but
             | credit where credit is due, Ubuntu's ZFS support has been
             | really good.
             | 
             | Edit: just to add, I've got nothing against anyone who does
             | enjoy Ubuntu Server. It's just not a Linux distro I
             | personally have much fondness for.
        
       | vermaden wrote:
        | I'm still waiting for ANY Linux distro with an installer
        | that lets you install Linux with Root on ZFS and with
        | ZFSBootMenu (or any other ZFS Boot Environments tool) ...
        
         | szanni wrote:
         | I would love to see that too, but believe this is rather
         | unlikely for any major Linux distro. Why? Because there is no
         | guarantee the required kernel symbols will stay available.
         | 
          | This has, for example, happened with the linux-rt branch,
          | which decided to change the license of some of the exported
          | kernel symbols to GPL-only, which prevents the ZFS module
          | from compiling.
         | 
         | As far as I can tell, the kernel developers make sure to not
         | break user space but no such guarantees are given for the
         | kernel modules. Having the driver for your root file system
         | possibly not compile on the next kernel update seems like a
         | nightmare to support for any distribution.
        
           | yjftsjthsd-h wrote:
           | > I would love to see that too, but believe this is rather
           | unlikely for any major Linux distro. Why? Because there is no
           | guarantee the required kernel symbols will stay available.
           | 
           | Er, Ubuntu already supports ZFS root out of the box in the
           | default installer; why would ZBM be any harder to support
           | than that?
        
         | ghoward wrote:
         | I'm in the process of building a NixOS-like distro. I'm 90%
         | certain mine will use these.
        
       | ninefathom wrote:
       | I'm glad to see interest in this functionality taking off in
       | Linux-land. I think there are one or two other projects with
       | similar goals (i.e. implementing BE selection on Linux) and it
       | might be time for me to do a side-by-side.
       | 
        | The lack of this capability on Linux has long puzzled me.
        | Solaris actually implemented a very early incarnation
       | of this ability (called "live upgrades" at the time from its
       | original use case) back in the early '00s- in Solaris 8, and on
       | top of UFS no less, if I recall correctly. It evolved over the
       | next decade first adding ZFS into the mix, then finally morphing
       | from the early "live upgrade" stuff into the full "boot
       | environment" concept around 2010 with Solaris 11. FreeBSD
       | implemented it around 2012, in the early days of their ZFS work.
       | More than a decade ago. That puts Linux at least ten years behind
       | the curve here, and arguably closer to twenty.
       | 
       | I'm a fan of using the right tool for the right job, and jumping
       | freely between Solaris (or OpenIndiana nowadays), Linux, and
       | FreeBSD for any given deployment is par for the course. Until
       | now, all other things being equal, FreeBSD or Solaris would often
       | win out if minimizing downtime* was a much higher priority than
       | ease of replacing admins. Assuming that BE support in Linux
       | matures quickly, that calculus has now swung strongly in Linux's
       | favor.
       | 
       | *Re: minimizing downtime, if somebody is puzzled as to what I
       | mean, think of the last time that you had a Linux installation
       | fail to come back up to full operation after a borked round of
       | package upgrades. It's not often, but it does happen
       | occasionally. Now imagine that the time you spent getting back up
       | and working, whatever it might have been, was reliably less than
       | sixty seconds. Now imagine it's 2am, you're not even fully awake
       | following a panicked phone call from the operations night shift,
       | and your job hangs in the balance. Makes quite a difference.
        
         | nortonham wrote:
         | what do you use OpenIndiana for?
         | 
          | >The lack of this capability on Linux has long puzzled me.
         | 
         | Agreed. I recently tried out OI "hipster" and the way boot
         | environments are integrated into caja (the file manager) with
         | Time Slider was so smooth it got me thinking why something like
         | it wasn't more popular in linux.
        
           | ninefathom wrote:
           | > what do you use OpenIndiana for?
           | 
           | I find that it's a good fit for quite a few things, but if
           | you're looking for a specific example: clustered Java
           | application stacks, like ELK or Hadoop.
           | 
           | Zones, crossbow networking, SMF, and ZFS w/ BEs all working
           | seamlessly together is a fantastic combination for easy-
           | button admin of low- or zero-downtime clustered applications.
        
         | ploxiln wrote:
         | (without having read the article) after-update filesystem
         | rollback sounds like what Suse has offered for about 10 years:
         | https://www.suse.com/c/introduction-system-rollbacks-btrfs-a...
         | 
         | There just hasn't been much demand for it. There are a bunch of
         | other mechanisms used instead, like redundant systems and
         | gradual rollouts, working with full system images (or container
         | images) instead, etc.
         | 
         | For personal-ish systems, things are reliable enough, and if
         | there is a problem you can't just stop updating, you'll need to
         | fix it soon anyway. I've been updating a debian install on my
         | home fileserver for 3 major debian releases, 5+ years ...
        
       | esjeon wrote:
       | I tried this alongside Void Linux, but I found I don't really
       | need it.
       | 
       | TBH, this is really cool. I liked that I'm able to choose
       | snapshots for booting - a very good recovery option. The
       | interface is well polished, and comes with fzf for quick
       | searching. It's a true dream for distro hoppers, since ZFS works
       | like thin-provisioned partitions (though distro options are
       | limited due to ZFS). Pretty cool in and out.
       | 
        | But it turned out to be complete overkill for me. Firstly, I
        | stopped dual-booting like a decade ago; I run everything else
        | in VMs. Secondly, the host system these days hardly breaks.
        | Lots of things work out of the box, unlike in the old days,
        | and service settings can be isolated in containers. Thirdly,
        | my host environment can be recreated within an hour,
        | including download time and a few trims, as long as `/home`
        | is backed up. So I don't worry much about the root partition.
       | 
       | I wonder how this is working for others.
        
         | E39M5S62 wrote:
         | I use ZFSBootMenu to boot a single distribution on each of my
         | systems. While it certainly can help booting multiple different
         | environments, the real value-add to me is that my entire OS is
         | contained on a single filesystem. There's no longer a need to
          | make an entirely separate boot pool to work around GRUB's
          | extremely limited ZFS support.
         | 
         | Because ZBM (can) use the kernel and ZFS userland+modules on
         | your own system, it's never really behind what your OS is
         | running. Additionally, since we import the pool read-only by
         | default, new breaking features/pool flags in ZFS typically
         | aren't a problem. It's only when you try to import a pool read-
         | write that ZFS will have issues, so we detect that, warn you
         | and then prevent it from happening.
         | 
         | Since it also ships as an EFI executable that can import/boot
         | any pool, it's really easy to make recovery media. Just throw
         | the EFI on a USB drive with an ESP and name it BOOTX64.EFI and
         | most modern firmware will use it in the absence of any other
         | working boot entries.
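          | 
          | That recovery-media recipe can be sketched like this (a
          | temp directory stands in for the USB stick's mounted ESP,
          | and the .EFI file is a placeholder for the real
          | ZFSBootMenu release artifact):

```shell
#!/bin/sh
# Sketch of the fallback-boot layout most UEFI firmware looks for.
# ESP_MNT would be the USB stick's mounted EFI system partition; a
# temp dir stands in here so the layout can be shown safely.
set -eu
ESP_MNT="$(mktemp -d)"

mkdir -p "${ESP_MNT}/EFI/BOOT"
# In reality, copy the zfsbootmenu release .EFI here:
touch "${ESP_MNT}/EFI/BOOT/BOOTX64.EFI"

# Show the resulting layout.
find "${ESP_MNT}" -type f
```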
        
       | seized wrote:
       | Boot environments are one of those magic features that when
       | you've used it, it's hard to give up.
       | 
       | My NAS has long been on OpenIndiana. Boot environments mean zero
       | risk OS upgrades. At one point I could have gone back to a 4 year
       | old OS version and booted it with no data loss.
       | 
       | You can create one at any time, so it brings an even better take
       | on VM snapshots to the physical machine world. Hacking on
       | something and want a fallback? "Beadm create beforehacking" and
       | you're safe.
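        | 
        | For anyone who hasn't seen boot environments, the workflow
        | is roughly this (OpenIndiana/FreeBSD beadm; commands shown
        | for illustration only):

```shell
# Checkpoint the running OS before risky changes (snapshot + clone).
beadm create beforehacking

# List environments; flags mark the active-now and active-on-reboot BEs.
beadm list

# If the hacking goes wrong, boot back into the checkpoint:
beadm activate beforehacking && reboot
```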
        
         | nortonham wrote:
         | how do you like using OI day to day on your NAS? Any pitfalls
         | or things to be aware of?
        
         | willis936 wrote:
         | I've never used a boot environment. Is there a way to use a
         | boot environment to have a ZFS-backed Windows install?
        
           | infogulch wrote:
           | You may be interested in recent (~the last month)
           | developments in adding Windows support to OpenZFS:
           | https://github.com/openzfs/zfs/pull/14034
        
       | [deleted]
        
       | Teknoman117 wrote:
       | I've been kinda doing a similar thing with my Gentoo installation
       | on btrfs.
       | 
       | The btrfs subvolumes are structured like this:
       | 
       | <root subvolume>/$(hostname)/${environment}/@volume (e.g. @root,
       | @home)
       | 
       | snapshots look like this:
       | 
       | <root subvolume>/$(hostname)/${environment}/volume_$(date -u
       | +%Y-%m-%d_%H-%M-00)
       | 
       | I have a few scripts "make-snapshots", "backup-snapshots",
       | "update-shell", and "update-commit". make-snapshots creates
       | readonly snapshots of my system, backup-snapshots does
       | incremental backups of those to my NAS, update-shell creates a
       | writable snapshot of @root as @root-update and drops you into a
       | chroot environment. You can then run all the portage commands you
       | want without fear of borking your current environment. Upon exit
       | it checks whatever the /usr/src/linux symlink points to, copies
       | the associated vmlinuz and initramfs images to the EFI partition,
       | and creates/updates a boot entry in rEFInd. You can then boot
       | either into your previous version or the update version. Once
       | you're satisfied that your new environment works, you run
       | "update-commit" which deletes the @root subvolume and replaces it
       | with your current @root-update subvolume.
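        | 
        | A minimal sketch of the snapshot-naming scheme described
        | above (the pool mount point and "default" environment name
        | are illustrative, not the actual scripts):

```shell
#!/bin/sh
# Sketch: build the read-only snapshot path a "make-snapshots" style
# script would use. Pool mount point and environment name are examples.
set -eu

host="$(uname -n)"
env="default"
stamp="$(date -u +%Y-%m-%d_%H-%M-00)"

src="/mnt/pool/${host}/${env}/@root"
dst="/mnt/pool/${host}/${env}/root_${stamp}"

# Shown rather than executed, since it needs a real btrfs pool:
echo "btrfs subvolume snapshot -r ${src} ${dst}"
```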
       | 
       | A change I've been considering is to drop the concept of having a
       | @root subvolume at all. Current implementation requires two
       | reboots: one to get from @root to @root-update, where (if it's
       | good) you delete @root and make a writable snapshot of @root-
       | update as @root. The second reboot is to get onto (the new)
       | @root. An alternative might be to include the date/version in the
       | name of the writable snapshots as well. "committing" the update
       | would just mean setting the current booted subvolume as the
       | "head". Future snapshots/updates will be made from that
       | subvolume. No need for a reboot because you're currently on it
       | with everything mounted correctly. Any writable subvolumes older
       | than "head" would be cleaned up upon booting "head".
       | 
       | Could even go a step further and add something to my initramfs
       | where if you try to boot a version where the writable subvolume
       | has been deleted, it would make a temporary writable subvolume
       | for it from the snapshot.
        
       | kkfx wrote:
        | So... After a decade, GNU/Linux has something similar to
        | beadm integrated with the boot process...
       | 
        | When we talk about the sorry state of REAL tech evolution,
        | this and many other features should be counted...
        
       | dazzawazza wrote:
       | Always good to see Linux being inspired by FreeBSD.
        
       | 1letterunixname wrote:
        
         | hnlmorg wrote:
         | Choice is good if it offers something different, which this
         | does because it is more than just a boot menu with ZFS support.
         | 
          | Anyway, since when has Linux been averse to choice? Multiple
         | different init daemons, window managers, desktop environments,
         | cron daemons, MTAs, scripting languages, shells, etc. even the
         | way you set up a networking interface can differ wildly.
        
         | yyyk wrote:
         | GRUB(any number) is horrible and should be entirely replaced.
        
         | fjdiccf wrote:
         | Pretty much every single person using Linux or writing open
         | source is doing it specifically because the choices they had
         | were not adequate.
         | 
         | Don't like choice? Go back to Mac, spare us your hot takes.
        
         | sirn wrote:
          | The problem is, ZFS support in GRUB2 hasn't been great,
          | partly due to the CDDL/GPL licensing incompatibility
          | requiring lots of ZFS internals to be re-implemented in
          | GRUB. This resulted in issues such as grub-probe being
          | unable to detect ZFS pools due to unsupported ZFS
          | features[1] (including native ZFS encryption, which is a
          | deal-breaker for many).
         | 
          | ZBM took another approach. It provides a small initramfs
          | image that is built on the host machine via a standard
          | method such as dracut or mkinitcpio. This image provides an
          | interface for decrypting/mounting ZFS filesystems using the
          | very same ZFS kernel module and tools installed on the
          | host. After the filesystem is mounted, it then kexecs into
          | the host kernel.
         | 
          | This also means ZBM doesn't completely replace GRUB2 or
          | syslinux. Instead, it relies on those intermediate
          | bootloaders (including EFI bootloaders such as
          | rEFInd/gummiboot) to load ZBM itself. (Though ZBM itself
          | only has built-in hooks for syslinux and gummiboot.)
         | 
          | Being an initramfs gives the extra benefit of enabling
          | interesting mechanisms during boot, e.g. providing an SSH
          | server for entering an encryption key on a headless
          | server[2], the ability to discover and manage/boot from ZFS
          | snapshots, etc.
         | 
          | (No affiliation; just a very happy user.)
         | 
         | [1]: https://savannah.gnu.org/bugs/?58555
         | 
         | [2]: https://github.com/zbm-dev/zfsbootmenu/wiki/Remote-Access-
         | to...
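          | 
          | The kexec handoff looks roughly like this (paths and the
          | dataset name are illustrative examples; ZBM automates all
          | of it):

```shell
# Sketch of a ZBM-style kexec handoff. The kernel/initramfs live on
# the (now mounted) boot environment; names here are examples only.
kexec -l /zbm/mnt/zroot/ROOT/default/boot/vmlinuz \
      --initrd=/zbm/mnt/zroot/ROOT/default/boot/initramfs.img \
      --command-line="root=zfs:zroot/ROOT/default ro"

# Jump straight into the loaded kernel, skipping firmware/bootloader.
kexec -e
```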
        
           | matja wrote:
           | Wouldn't kexec with a mounted filesystem lose the filesystem
           | state because the kernel heap+stack is overwritten? I think
           | ZBM copies the kernel/initramfs from the ZFS dataset
           | (presumably to tmpfs), unmounts/exports, then kexec's, and
           | the new initramfs imports/mounts the pool/dataset as usual?
        
             | sirn wrote:
             | My understanding is that during the boot process using
             | ZBM's initramfs:
             | 
              | 1. ZBM prompts for the encryption passphrase, decrypts
              | the filesystem, locates the kernel/initramfs on the ZFS
              | datasets, then displays the boot menu
              | 
              | 2. ZBM kexecs into the chosen kernel/initramfs on the
              | filesystem, appending root=zfs:... to the kernel
              | parameters
              | 
              | 3. The target kernel decrypts the filesystem[^], mounts
              | the root ZFS again, and boots into the final system
             | 
              | [^]: In this case, ZBM requires the encryption key to
              | be placed in the target initramfs (not ZBM's) for the
              | target kernel to load (the dataset needs to be
              | decrypted again, since kernel state is discarded across
              | kexec). This initramfs is located inside the encrypted
              | filesystem itself, only accessible after the initial
              | decryption/mount by ZBM in step 1, so the only way to
              | obtain this key is to already have access to the
              | encrypted filesystem in the first place.
        
               | E39M5S62 wrote:
               | That's exactly right. We also append spl.spl_hostid to
               | the command line, to work around any possible hostid
               | mismatches inside the boot environment.
        
               | sirn wrote:
                | Thank you for such a great tool. I recently migrated
                | from one server to another on a different continent
                | via `zfs send | zfs recv` (using hrmpf), and
                | `generate-zbm` inside the chroot was all I needed to
                | get it working again.
        
             | E39M5S62 wrote:
             | Since the pool itself is imported read-only by default,
             | there's no state to keep. We don't even need to export the
              | pool; no txgs can be generated and left in a pending
              | state.
             | 
             | If a pool is switched to read-write so that the default
             | kernel can be set, a snapshot cloned to a new BE, etc, we
             | check for that and then export the pool just before kexec.
             | 
              | Once kexec is done, your BE's kernel and initramfs
              | essentially start fresh, as if it were a fresh boot.
        
       ___________________________________________________________________
       (page generated 2022-11-12 23:00 UTC)