[HN Gopher] ZFSBootMenu - A boot loader to manage ZFS boot envir... ___________________________________________________________________ ZFSBootMenu - A boot loader to manage ZFS boot environments for Linux Author : nixcraft Score : 136 points Date : 2022-11-12 09:50 UTC (13 hours ago) (HTM) web link (zfsbootmenu.org) (TXT) w3m dump (zfsbootmenu.org) | nailer wrote: | I mean you can't maintain ZFS normally, and people have been | trying to make ZFS happen for what... two decades now? | magicalhippo wrote: | > people have been trying to make ZFS happen for what... two | decades now? | | When players like AWS provide ZFS as one of four alternatives | in their "filesystem as a service"[1], I'd say we're beyond | "trying to make ZFS happen". | | Not to mention the PBs worth of data others[2] rely on ZFS to | keep safe. | | [1]: https://aws.amazon.com/fsx/ | | [2]: https://openzfs.org/wiki/Companies | phaer wrote: | What do you mean "have been trying to make ZFS happen"? ZFS is | used in production in many places. | knaekhoved wrote: | When you say "normally", do you mean "badly, with fsck"? | | ZFS is growing incredibly quickly in popularity, and the only | reason it's not the dominant filesystem already is because A) | it took Linux a long time to add support, and only dedicated | appliance vendors had the will and ability to move to FreeBSD, | and B) macOS was going to switch to ZFS in the late '00s, but | they got scared off by Oracle's legal shenanigans, which seems | to no longer be a relevant factor. | WastingMyTime89 wrote: | > A) it took Linux a long time to add support | | Linux has no support for ZFS. This is an out-of-tree patch | set and therefore a no-go for most, including myself. | | ZFS intentionally has a terrible license and is owned by | Oracle. People are free to do what they want, but I wish all | the time wasted on it could have been put into something more | interesting.
| erk__ wrote: | As with most things in the Linux world, it depends on who | builds your upstream. ZFS is in the kernel distributed by | Ubuntu, which is one of the largest distributions. | | Linux is, unlike e.g. FreeBSD, not a monolithic operating | system, but only a kernel so it is in my opinion not really | right to say that it has no support. | PlutoIsAPlanet wrote: | > but only a kernel so it is in my opinion not really | right to say that it has no support. | | The kernel has no native support for ZFS. | | Ubuntu may ship with ZFS, but that's one distro. | Meanwhile, RHEL etc. won't even touch it. | | On Linux, XFS dominates and likely will continue to | dominate the server world; meanwhile, btrfs will slowly | replace ext4 on the desktop side of things. | Android/Embedded have always used their own different | filesystems, so it's irrelevant there. | throw0101c wrote: | > _The kernel has no native support for ZFS._ | | Neither does it have accelerated Nvidia card support that | could be used for things like HPC/AI/ML. Yet I'm | administering an entire cluster of Ubuntu machines with | cards just fine. | | We generally use the "nvidia-driver-NNN-server" package. | | If you want to live ideologically pure, no one is going to | stop you, but some of us need to get work done. | ghaff wrote: | CDDL isn't an especially terrible license in isolation | (it's basically Mozilla), but it is generally considered | incompatible with the GPL, which, depending upon which set of | 20-year-old memories from ex-Sun employees you're inclined to | believe, was more or less a deliberately nefarious state of | affairs. | | Oracle owns most/all of the copyrights, and Canonical was | willing to take a calculated risk after, presumably, some | back-channel discussions. But those companies with | something to actually lose from a lawsuit with Oracle, or | organizations with strong free software principles, aren't | going anywhere close.
Oracle has had a long time to change | the license if they actually cared to. | | Personally, I find it unfortunate that all the effort that | has gone into ZFS as essentially a hobbyist copy-on-write | filesystem didn't go into btrfs instead. | kobalsky wrote: | > I find it unfortunate that all the effort that has gone | into ZFS as essentially a hobbyist copy-on-write | filesystem didn't go into btrfs instead. | | don't some BSDs and Linux share the same code base for | ZFS? | | Last I heard, FreeBSD switched to ZFS on Linux as upstream a | few years ago, before it was merged into OpenZFS. | | IMO calling it a hobbyist fs is a bit unfair. | yakak wrote: | Being called hobbyist software derisively by the Linux | community is like being knighted, I assume. | cmeacham98 wrote: | While the CDDL isn't a terrible license in a vacuum, it | is (according to its author) _intentionally_ incompatible | with the GPL (https://en.wikipedia.org/wiki/Common_Development_and_Distrib...). | | This is, in my opinion, the most important part. It's not | some unhappy accident that there are significant legal | issues with ZFS and GPL-licenced Linux - that is | (allegedly) by design. | ghaff wrote: | As that section says, there is (at least for public | consumption) disagreement among then-Sun employees as to | what the intent and beliefs were at the time. | | I know all those folks to greater or lesser degrees, and | Sun was a client of mine as an analyst. There were | certainly a lot of conflicting motivations and concerns | around Solaris and Linux. | throw0101c wrote: | > _ZFS intentionally has a terrible license_ | | The folks on FreeBSD didn't / don't seem to think so. | Neither does Apple (who pulled in DTrace, which has the | exact same license). | | > _and is owned by Oracle._ | | The OpenZFS folks don't seem to think so. | boomboomsubban wrote: | >The OpenZFS folks don't seem to think so.
| | The OpenZFS people are fully aware that Oracle owns ZFS; | that's why they forked the last free copy and made | OpenZFS. A small nitpick. | jacob019 wrote: | I guess the trolls are out today. Thought I was on Reddit for a | minute. | hnlmorg wrote: | I don't understand why you're being snarky. ZFS has been hugely | successful since its release, and continues to be successful | even now. The reason why it's popular is precisely because it's | easy to maintain. | nailer wrote: | Not sure why you'd consider criticism of ZFS to be snark. | Running any filesystem outside the mainline kernel is a bunch | of extra effort and I'm sure as a ZFS user you'd know that. | I'm not sure how ZFS could be "hugely successful" after two | decades and still not in the Linux kernel. | kobalsky wrote: | > Running any filesystem outside the mainline kernel is a | bunch of extra effort and I'm sure as a ZFS user you'd know | that | | zfs-dkms makes usage simple, since OpenZFS is backwards | compatible down to 3.xx kernels. No need to mix and match ZFS | with kernel versions anymore. | | The only drawback is that you may not get to use the | latest kernel until OpenZFS maintainers give it the thumbs | up (no 6.xx-compatible release yet), but that's not "a | bunch of extra effort". | | And that's only a problem if you want to be on the bleeding | edge. LTS kernel users wouldn't know about it. | | > I'm not sure how ZFS could be "hugely successful" after | two decades and still not in the Linux kernel. | | "There is no way I can merge any of the ZFS efforts until I | get an official letter from Oracle that is signed by their | main legal counsel or preferably by Larry Ellison himself | that says that yes, it's OK to do so and treat the end | result as GPL'd" -Linus Torvalds | | Sounds like a legal issue more than a technical one. | hnlmorg wrote: | Linux users are constantly using drivers outside of the | mainline kernel.
Whether it's graphics cards, radio drivers | (things have gotten better in that regard, but Bluetooth | support is still terrible) or FUSE file systems. | | The difference with ZFS is that the code is kernel-ready, | but there are just some licensing worries (understandable | ones) that stop it from being mainlined. | | I've been running ZFS on Ubuntu Server for several years | now, and frankly ZFS is the only part of that entire system | that doesn't suck (in my opinion). I'd switch back to | FreeBSD in a heartbeat if I didn't need Docker support, but | credit where credit is due, Ubuntu's ZFS support has been | really good. | | Edit: just to add, I've got nothing against anyone who does | enjoy Ubuntu Server. It's just not a Linux distro I | personally have much fondness for. | vermaden wrote: | I'm still waiting for ANY Linux distro with an installer that | would allow you to install Linux with Root on ZFS and with | ZFSBootMenu (or any other ZFS Boot Environments tool) ... | szanni wrote: | I would love to see that too, but I believe this is rather | unlikely for any major Linux distro. Why? Because there is no | guarantee the required kernel symbols will stay available. | | This has, for example, happened with the linux-rt branch, which | decided to change the license of some of the exported kernel | symbols to GPL, which prevents the ZFS module from compiling. | | As far as I can tell, the kernel developers make sure not to | break user space, but no such guarantees are given for kernel | modules. Having the driver for your root file system | possibly not compile on the next kernel update seems like a | nightmare to support for any distribution. | yjftsjthsd-h wrote: | > I would love to see that too, but I believe this is rather | unlikely for any major Linux distro. Why? Because there is no | guarantee the required kernel symbols will stay available.
| | Er, Ubuntu already supports ZFS root out of the box in the | default installer; why would ZBM be any harder to support | than that? | ghoward wrote: | I'm in the process of building a NixOS-like distro. I'm 90% | certain mine will use these. | ninefathom wrote: | I'm glad to see interest in this functionality taking off in | Linux-land. I think there are one or two other projects with | similar goals (i.e. implementing BE selection on Linux) and it | might be time for me to do a side-by-side. | | The lack of this capability on Linux has long puzzled me. | Solaris actually implemented a very early incarnation of this | ability (called "live upgrade" at the time, from its original | use case) back in the early '00s, in Solaris 8, and on top of | UFS no less, if I recall correctly. It evolved over the next | decade, first adding ZFS into the mix, then finally morphing | from the early "live upgrade" stuff into the full "boot | environment" concept around 2010 with Solaris 11. FreeBSD | implemented it around 2012, in the early days of their ZFS work. | More than a decade ago. That puts Linux at least ten years behind | the curve here, and arguably closer to twenty. | | I'm a fan of using the right tool for the right job, and jumping | freely between Solaris (or OpenIndiana nowadays), Linux, and | FreeBSD for any given deployment is par for the course. Until | now, all other things being equal, FreeBSD or Solaris would often | win out if minimizing downtime* was a much higher priority than | ease of replacing admins. Assuming that BE support in Linux | matures quickly, that calculus has now swung strongly in Linux's | favor. | | *Re: minimizing downtime, if somebody is puzzled as to what I | mean, think of the last time that you had a Linux installation | fail to come back up to full operation after a borked round of | package upgrades. It's not often, but it does happen | occasionally.
Now imagine that the time you spent getting back up | and working, whatever it might have been, was reliably less than | sixty seconds. Now imagine it's 2am, you're not even fully awake | following a panicked phone call from the operations night shift, | and your job hangs in the balance. Makes quite a difference. | nortonham wrote: | What do you use OpenIndiana for? | | >The lack of this capability on Linux has long puzzled me. | | Agreed. I recently tried out OI "hipster" and the way boot | environments are integrated into caja (the file manager) with | Time Slider was so smooth it got me thinking why something like | it wasn't more popular in Linux. | ninefathom wrote: | > What do you use OpenIndiana for? | | I find that it's a good fit for quite a few things, but if | you're looking for a specific example: clustered Java | application stacks, like ELK or Hadoop. | | Zones, Crossbow networking, SMF, and ZFS w/ BEs all working | seamlessly together is a fantastic combination for easy-button | admin of low- or zero-downtime clustered applications. | ploxiln wrote: | (without having read the article) after-update filesystem | rollback sounds like what SUSE has offered for about 10 years: | https://www.suse.com/c/introduction-system-rollbacks-btrfs-a... | | There just hasn't been much demand for it. There are a bunch of | other mechanisms used instead, like redundant systems and | gradual rollouts, working with full system images (or container | images) instead, etc. | | For personal-ish systems, things are reliable enough, and if | there is a problem you can't just stop updating; you'll need to | fix it soon anyway. I've been updating a Debian install on my | home fileserver for 3 major Debian releases, 5+ years ... | esjeon wrote: | I tried this alongside Void Linux, but I found I don't really | need it. | | TBH, this is really cool. I liked that I'm able to choose | snapshots for booting - a very good recovery option.
The | interface is well polished and comes with fzf for quick | searching. It's a true dream for distro hoppers, since ZFS works | like thin-provisioned partitions (though distro options are | limited due to ZFS). Pretty cool in and out. | | But it turned out to be super-overkill for me. Firstly, I | stopped dual-booting like a decade ago. I run everything else in | VMs. Secondly, the host system these days hardly breaks. Lots of | things work out of the box, unlike the old days, and service | settings can be isolated in containers. Thirdly, my host | environment can be recreated within an hour, including the | download time and a few trims, as long as `/home` is backed up. | So I don't worry much about the root partition. | | I wonder how this is working for others. | E39M5S62 wrote: | I use ZFSBootMenu to boot a single distribution on each of my | systems. While it certainly can help with booting multiple | different environments, the real value-add to me is that my | entire OS is contained on a single filesystem. There's no longer | a need to make an entirely separate boot pool to work around | GRUB's extremely limited ZFS support. | | Because ZBM (can) use the kernel and ZFS userland+modules on | your own system, it's never really behind what your OS is | running. Additionally, since we import the pool read-only by | default, new breaking features/pool flags in ZFS typically | aren't a problem. It's only when you try to import a pool | read-write that ZFS will have issues, so we detect that, warn | you, and then prevent it from happening. | | Since it also ships as an EFI executable that can import/boot | any pool, it's really easy to make recovery media. Just throw | the EFI on a USB drive with an ESP, name it BOOTX64.EFI, and | most modern firmware will use it in the absence of any other | working boot entries. | seized wrote: | Boot environments are one of those magic features that, when | you've used them, are hard to give up. | | My NAS has long been on OpenIndiana.
Boot environments mean zero-risk | OS upgrades. At one point I could have gone back to a 4-year-old | OS version and booted it with no data loss. | | You can create one at any time, so it brings an even better take | on VM snapshots to the physical machine world. Hacking on | something and want a fallback? "beadm create beforehacking" and | you're safe. | nortonham wrote: | How do you like using OI day to day on your NAS? Any pitfalls | or things to be aware of? | willis936 wrote: | I've never used a boot environment. Is there a way to use a | boot environment to have a ZFS-backed Windows install? | infogulch wrote: | You may be interested in recent (~the last month) | developments in adding Windows support to OpenZFS: | https://github.com/openzfs/zfs/pull/14034 | [deleted] | Teknoman117 wrote: | I've been kinda doing a similar thing with my Gentoo installation | on btrfs. | | The btrfs subvolumes are structured like this: | | <root subvolume>/$(hostname)/${environment}/@volume (e.g. @root, | @home) | | Snapshots look like this: | | <root subvolume>/$(hostname)/${environment}/volume_$(date -u | +%Y-%m-%d_%H-%M-00) | | I have a few scripts: "make-snapshots", "backup-snapshots", | "update-shell", and "update-commit". make-snapshots creates | read-only snapshots of my system, backup-snapshots does | incremental backups of those to my NAS, update-shell creates a | writable snapshot of @root as @root-update and drops you into a | chroot environment. You can then run all the portage commands you | want without fear of borking your current environment. Upon exit, | it checks whatever the /usr/src/linux symlink points to, copies | the associated vmlinuz and initramfs images to the EFI partition, | and creates/updates a boot entry in rEFInd. You can then boot | either into your previous version or the update version.
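The "make-snapshots" step of a scheme like the one described above could be sketched roughly as follows. This is an editor's illustration, not the commenter's actual script: the pool mountpoint (/mnt/pool), the environment name, and the volume list are all assumptions, and it prints the btrfs commands instead of executing them:

```shell
#!/bin/sh
# Dry-run sketch of a make-snapshots script following the layout above.
# /mnt/pool is an assumed mountpoint for the top-level btrfs subvolume.
set -eu

HOST=$(hostname)
ENVIRONMENT=default
STAMP=$(date -u +%Y-%m-%d_%H-%M-00)

for vol in root home; do
    src="/mnt/pool/$HOST/$ENVIRONMENT/@$vol"
    snap="/mnt/pool/$HOST/$ENVIRONMENT/${vol}_$STAMP"
    # -r creates a read-only snapshot, which is what btrfs send requires
    echo btrfs subvolume snapshot -r "$src" "$snap"
done
```

Read-only snapshots made this way can then be shipped incrementally, e.g. with "btrfs send -p <previous> <current> | btrfs receive <dest>" on the NAS side.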
Once | you're satisfied that your new environment works, you run | "update-commit", which deletes the @root subvolume and replaces it | with your current @root-update subvolume. | | A change I've been considering is to drop the concept of having a | @root subvolume at all. The current implementation requires two | reboots: one to get from @root to @root-update, where (if it's | good) you delete @root and make a writable snapshot of | @root-update as @root. The second reboot is to get onto (the new) | @root. An alternative might be to include the date/version in the | name of the writable snapshots as well. "Committing" the update | would just mean setting the current booted subvolume as the | "head". Future snapshots/updates will be made from that | subvolume. No need for a reboot, because you're currently on it | with everything mounted correctly. Any writable subvolumes older | than "head" would be cleaned up upon booting "head". | | Could even go a step further and add something to my initramfs | where, if you try to boot a version where the writable subvolume | has been deleted, it would make a temporary writable subvolume | for it from the snapshot. | kkfx wrote: | So... After a decade, GNU/Linux has something similar to beadm | integrated with the boot process... | | When we talk about the sorry state of REAL tech evolution, this | and many other features should be counted... | dazzawazza wrote: | Always good to see Linux being inspired by FreeBSD. | 1letterunixname wrote: | hnlmorg wrote: | Choice is good if it offers something different, which this | does because it is more than just a boot menu with ZFS support. | | Anyway, since when has Linux been averse to choice? Multiple | different init daemons, window managers, desktop environments, | cron daemons, MTAs, scripting languages, shells, etc. Even the | way you set up a networking interface can differ wildly. | yyyk wrote: | GRUB(any number) is horrible and should be entirely replaced.
| fjdiccf wrote: | Pretty much every single person using Linux or writing open | source is doing it specifically because the choices they had | were not adequate. | | Don't like choice? Go back to Mac, spare us your hot takes. | sirn wrote: | The problem is, ZFS support in GRUB2 hasn't been great, partly | due to CDDL/GPL licensing incompatibility requiring lots of ZFS | internals to be re-implemented in GRUB. This resulted in issues | such as grub-probe being unable to detect ZFS pools due to | unsupported ZFS features[1] (including native ZFS encryption, | which is a deal-breaker for many). | | ZBM took another approach. It provides a small initramfs image | that is built on the host machine via standard methods such as | dracut or mkinitcpio. This image provides an interface for | decrypting/mounting ZFS filesystems using the very same ZFS | kernel module and tools installed on the host. After the | filesystem is mounted, it then kexecs into the host kernel. | | This also means ZBM doesn't completely replace GRUB2 or | syslinux. Instead, it relies on an intermediate bootloader | (including EFI bootloaders such as rEFInd/gummiboot) to load ZBM | itself. (Though ZBM itself only has built-in hooks for syslinux | and gummiboot.) | | Being an initramfs gives the extra benefit of enabling | interesting mechanisms during boot, e.g. providing an SSH server | for entering an encryption key on a headless server[2], the | ability to discover, manage, and boot from ZFS snapshots, etc. | | (No affiliation; just a very happy user.) | | [1]: https://savannah.gnu.org/bugs/?58555 | | [2]: https://github.com/zbm-dev/zfsbootmenu/wiki/Remote-Access-to... | matja wrote: | Wouldn't kexec with a mounted filesystem lose the filesystem | state because the kernel heap+stack is overwritten? I think | ZBM copies the kernel/initramfs from the ZFS dataset | (presumably to tmpfs), unmounts/exports, then kexecs, and | the new initramfs imports/mounts the pool/dataset as usual?
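The kexec hand-off sirn describes amounts to roughly the following. The dataset name and kernel paths are invented for the example, and the commands are printed rather than executed (a real kexec -e would replace the running kernel):

```shell
#!/bin/sh
# Hypothetical illustration of ZBM's kexec hand-off; the boot
# environment dataset and file paths are made up for this sketch.
set -eu

BE=zroot/ROOT/default
KERNEL=/boot/vmlinuz-5.15.0
INITRD=/boot/initramfs-5.15.0.img

# Stage the chosen boot environment's kernel/initramfs and pass the
# ZFS root dataset on the kernel command line:
echo kexec -l "$KERNEL" --initrd="$INITRD" --command-line="root=zfs:$BE ro"

# Jump into the staged kernel, replacing the ZBM environment:
echo kexec -e
```

In the real flow, ZBM's initramfs reads these files off the selected (read-only imported) boot environment before issuing the kexec.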
| sirn wrote: | My understanding is that during the boot process using | ZBM's initramfs: | | 1. ZBM prompts for the encryption passphrase, decrypts the | filesystem, locates the kernel/initramfs on ZFS datasets, then | displays the boot menu | | 2. ZBM kexecs into the chosen kernel/initramfs on the | filesystem while appending root=zfs:... to the kernel | parameters | | 3. The target kernel decrypts the filesystem[^], mounts | the root ZFS again, and boots into the final system | | [^]: In this case, ZBM requires the encryption key to be | placed in the target initramfs (not ZBM's) for the target | kernel to load (the dataset needs to be decrypted again since | kernel state is disregarded). This initramfs is located | inside the encrypted filesystem itself, only accessible | after the initial decryption/mount by ZBM in step 1, so the | only way to obtain this key is to already have access to the | encrypted filesystem in the first place. | E39M5S62 wrote: | That's exactly right. We also append spl.spl_hostid to | the command line, to work around any possible hostid | mismatches inside the boot environment. | sirn wrote: | Thank you for such a great tool. I've recently migrated | from one server to another server on a different continent | via `zfs send | zfs recv` (using hrmpf), and `generate-zbm` | inside the chroot was all I needed to get it working | again. | E39M5S62 wrote: | Since the pool itself is imported read-only by default, | there's no state to keep. We don't even need to export the | pool; no txgs can be generated and left in a pending | state. | | If a pool is switched to read-write so that the default | kernel can be set, a snapshot cloned to a new BE, etc., we | check for that and then export the pool just before kexec. | | Once kexec is done, your BE's kernel and initramfs | essentially start fresh and act as if it's a fresh boot. ___________________________________________________________________ (page generated 2022-11-12 23:00 UTC)