[HN Gopher] Linux boot partitions and how to set them up
___________________________________________________________________
Author : throw7
Score  : 124 points
Date   : 2022-11-03 18:13 UTC (4 hours ago)
(HTM) web link (0pointer.net)

| CameronNemo wrote:
| _As growing the size of an existing ESP is problematic (for
| example, because there's no space available immediately after the
| ESP, or because some low-quality firmware reacts badly to the ESP
| changing size)_
|
| _Code quality of the firmware in typical systems is known to not
| always be great. When relying on the file system driver included
| in the firmware it's hence a good idea to limit use to operations
| that have a better chance to be correctly implemented._
|
| Remind me again why I should cater to people who insist on
| writing and running terrible code.
| yjftsjthsd-h wrote:
| Because that's everyone? What computer are you using that has
| really high quality firmware?
| CameronNemo wrote:
| I guess that is fair. Most of my non-Android computers use
| U-Boot. One is some UEFI implementation. I don't know how it
| copes with growing the ESP. I don't see why it would freak,
| though.
| yrro wrote:
| Many of us have been burned by making the assumption that
| UEFI implementations behave sensibly. Most of the pain
| points have been ironed out by now, yes, but the lesson
| I've taken away is: don't assume.
| sagarun wrote:
| Meanwhile the Apple firmware cannot read a vFAT ESP. Apple wants
| the ESP to be HFS.
| freedinosaur wrote:
| > Consider removing any mention of ESP/XBOOTLDR from /etc/fstab,
| and just let systemd-gpt-auto-generator do its thing.
|
| TIL! My NixOS configuration just got a little bit simpler, and
| more uniform between machines.
| cesarb wrote:
| > For example, it's probably worth mentioning that some
| distributions decided to put kernels onto the root file system of
| the OS itself.
| For this setup to work the boot loader itself
| [sic!] must implement a non-trivial part of the storage stack.
|
| IIRC, older bootloaders like LILO used a simpler approach: after
| each kernel update, a userspace program asked the kernel for the
| list of sectors which contained the kernel file, and wrote the
| list to a map file; the bootloader then read that map file (its
| sector hardcoded into the bootloader by the same userspace
| program), and loaded the kernel by reading the sectors directly.
| Neither the bootloader nor its userspace installer needed to know
| anything about filesystems or other parts of the storage stack,
| and it worked perfectly with RAID 1.
| yjftsjthsd-h wrote:
| That only works with simple filesystems, though; it'll fall
| apart if your root filesystem uses, say, compression,
| encryption, or possibly any RAID except mirroring (depending on
| the details and what the bootloader can handle).
| Kwpolska wrote:
| I don't really see the point in using the ESP for anything
| serious. Many of the arguments are also super weak, like the one
| about /boot/efi/ being nested (in how many cases is this actually
| important to anyone and anything?). The ESP size issue
| successfully prevents the real world adoption of this, since
| Linux kernels are 50-100 MB each, which means you could maybe fit
| one on your average ESP, and good luck convincing real users to
| reinstall Windows just to make some Linux guys happy.
|
| Instead, I would prefer a different approach. An approach that
| can be seen in Windows and macOS, that is: no user serviceable
| parts inside. It would work like this:
|
| * Keep two partitions.
|
| * ESP contains a simple program (let's call it stub), whose only
| job is to call the real boot loader on /boot.
|
| * The stub is simple (minimum user interaction) and doesn't need
| updating very often. It has drivers/support software for storage
| media and some reasonable file systems (VFAT, Ext4).
| It may also support some simple form of disk encryption if
| desired.
|
| * The real bootloader (as well as any kernels) lives on the
| separate /boot partition. The bootloader can do all the fancy
| things it wants: it can display a fancy wallpaper, support mouse
| input, and so on.
|
| * The ESP is not auto-mounted, might not even be listed in
| /etc/fstab.
|
| * Whenever the stub is updated (which would happen rarely, since
| it's meant to be simple and minimal), some post-install scripts
| would mount the ESP in any location they please (be it
| /run/{uuid.uuid4()}), copy over the new stub, and immediately
| unmount it.
|
| Simpler, safer, will make it harder for rogue software or
| `rm -rf /*` to mess up the booting of the system, and will not
| require any changes to existing partition tables.
| xearl wrote:
| Your "different approach" sounds almost exactly like the
| approach proposed in the fine article for the "ESP too small"
| case (a case which you assume as a given).
| yjftsjthsd-h wrote:
| I'm not sold on using automount to reduce the time spent with the
| filesystems mounted. Unless I've missed something, having a
| filesystem mounted doesn't make it any more susceptible to
| damage; being "mounted" just means that the kernel populates its
| data structures in memory and adds it to the VFS, it doesn't
| incur any ongoing r/w access. What risks corruption is writing
| data... which this doesn't stop, because the moment anything
| tries to access it the OS will helpfully mount the filesystem
| again.
| jcranmer wrote:
| If things were set up to only mount the filesystems while
| they're being modified to update the kernel, I can see the
| value in that. I'm guessing that's not being proposed here,
| though, because it's too much friction to change the current
| boot system scripts, and automount doesn't incur the same
| friction?
| yjftsjthsd-h wrote:
| Even then, what would it change?
| The only time those
| filesystems are being written to is for a new kernel, a new
| bootloader, or a bootloader config change. In every one of
| those cases, the filesystem still has to be mounted, so I'm
| not seeing what the benefit is of keeping things unmounted
| right up until you're going to write something. (Basically, I
| can't seem to see any version of this that reduces the actual
| total writes to the filesystem.)
| AshamedCaptain wrote:
| In fact, I was bitten once by corruption because the _unmount_
| operation was interrupted mid-write. Not that surprising
| considering it's a much less tested scenario in fs code.
| yjftsjthsd-h wrote:
| Okay, _that_ hadn't occurred to me. I did wonder if
| _mounting_ was a problem, if VFAT has "last mounted time" or
| other metadata that gets written per-mount.
| AshamedCaptain wrote:
| Even Windows has a separate, NTFS boot partition these days. Fail
| to see the point of this, and since the main take basically is
| "put your /boot inside the FAT ESP, or if not possible, make
| /boot a FAT partition", it's also bound to create a lot of
| disagreement.
| ChuckNorris89 wrote:
| Does it? On my Windows 11 install, the EFI partition is still
| FAT32. There are no other partitions than the C and the
| recovery partition.
|
| Am I missing something?
| CameronNemo wrote:
| The commenter is saying that there is an NTFS boot partition
| that is chained after the ESP. So UEFI mounts and execs
| whatever is in the (vFAT) ESP, and then that ESP bootloader
| loads data from the (NTFS) boot partition.
| vetinari wrote:
| MSR (that other partition) has no role in Windows boot.
| Windows will work without it being present at all.
| p_l wrote:
| Windows has been setting up Reserved partitions with boot
| code for some time now - even (or especially) on systems
| without EFI
| deathanatos wrote:
| The OP is about a separate _boot_ partition, which is
| normally where the kernel and associated data live (on Linux,
| an initramfs; obviously Windows would differ a bit).
|
| The "Reserved" partition on Windows machines isn't really a
| boot partition, for any meaningful definition of it. It's
| just ... reserved, and MS being MS. On my machine, it's
| empty (unformatted, all 0s). It is lightly documented here:
| https://learn.microsoft.com/en-us/windows-hardware/manufactu...
|
| (I'd expect your typical GPT Windows install to have about
| four partitions: the ESP, the empty "reserved" partition, a
| recovery partition, and the main NTFS partition.)
| cesarb wrote:
| > The "Reserved" partition on Windows machines isn't
| really a boot partition, for any meaningful definition of
| it. It's just ... reserved, and MS being MS. On my
| machine, it's empty (unformatted, all 0s). It is lightly
| documented here:
| https://learn.microsoft.com/en-us/windows-hardware/manufactu...
|
| IIRC, that "reserved" partition is to allow converting
| the data partition which follows it to a "dynamic disk"
| (which AFAIK, is Microsoft's equivalent to a Linux LVM
| PV). That conversion needs to grow the partition
| backwards to prepend some headers, and that extra space
| comes from shrinking the reserved partition just before
| it.
| p_l wrote:
| On various BIOS-based systems, the reserved partition
| would contain files necessary for booting Windows from
| its system volume, bridging the gap between what could be
| accessed by simplistic MBR boot code, the NTFS boot code
| block, and the NTFS-understanding, ARC-emulating (for
| NT5) or EFI-emulating (NT6) boot system that would load
| the target system.
|
| Details on whether a reserved partition would be created
| and what would be on it depend on the hardware you're
| installing on, and if a separate boot partition is
| necessary, the Windows installer would inform you about the
| need to create an extra partition.
| vetinari wrote:
| Windows does not have a separate NTFS boot partition. MSR is
| not it (check the size and content). Windows Boot Manager and
| BCD are stored on the EFI partition. Windows Boot Manager
| itself does understand NTFS: it loads winload.efi, ntoskrnl.exe
| and core drivers from the system root itself. This way, Windows
| is not going to have the common Linux problem ("update failed,
| /boot too small").
|
| Two partitions are needed only for Bitlocker; one has to be
| unencrypted.
|
| Similarly with Apple: they use an APFS subvolume for boot files.
| They do not bother with multiple partitions and static
| allocations, guessing what size is going to be OK. They can
| use as much or as little space as they need.
|
| --
|
| With Linux, I've been using a btrfs subvolume for /boot. It works
| with "normal" distributions, grub complains (it cannot write
| there; I find that OK). The dynamic nature of the space used is
| great. It doesn't work with ostree-based distributions (Fedora
| Silverblue & its ilk); ostree cannot generate proper BLS and
| grub.cfg for subvolumes.
| AshamedCaptain wrote:
| I'm talking about the WinRE partition, which is required to
| boot a Bitlocker encrypted boot partition (and Bitlocker is
| enabled by default). Enabling Bitlocker without one results
| in an error message, and Windows happily recreates/resizes
| the WinRE partition on every (OS) upgrade by simply reducing
| the size of the main partition. For a long time now, the size
| of the ESP has not been enough for all the stuff that
| Windows wants to do on preboot.
|
| For the record, and showing again the unfairness of the
| entire MS monopoly situation, most commercial UEFI
| implementations out there happen to understand NTFS. This
| allows e.g. a Windows pendrive to boot no matter how the user
| formats it.
| boomboomsubban wrote:
| So is this the first work Microsoft set Poettering to do? Kinda
| supports my personal conspiracy theory that Microsoft's aiming to
| make secure boot only possible with systemd-boot.
|
| Or Microsoft is trying to make dual booting easier without using
| the simplest solution of making a larger ESP the default.
| adrian_b wrote:
| It has been many years since I stopped creating partitions on any
| SSD or HDD, because I believe that this serves no useful purpose
| and it just wastes a part of the SSD/HDD.
|
| I directly format the raw unpartitioned SSD/HDD with a file
| system that uses 100% of the capacity, with no wasted sectors. At
| least on Linux and FreeBSD, there is no need for partitions.
|
| For booting the computers, I either boot them from Ethernet or I
| boot them from a small USB memory that uses a FAT file system for
| storing the OS kernel, either in the format required by UEFI
| booting, or, when booting Linux in legacy BIOS mode, together
| with syslinux, which loads the kernel.
| cesarb wrote:
| That is risky, since without a partition table, some operating
| systems and disk management tools will treat the disk as empty,
| making it easy to accidentally overwrite data.
|
| > 100% of the capacity, with no wasted sectors.
|
| You will never have that. SSDs have a large amount of reserved
| space, and even on HDDs, there are some reserved tracks for
| defect management.
| adrian_b wrote:
| By "some operating systems and disk management tools" you
| mean MS Windows and Windows tools.
|
| Obviously, I do not use unpartitioned SSDs/HDDs with Windows.
| On the other hand, with Linux and *BSD systems they work
| perfectly fine, regardless of whether they are internal or
| removable.
|
| For interchange with Windows, I use only USB drives or SSDs
| that are partitioned and formatted as exFAT. On the
| unpartitioned SSDs/HDDs I use file systems like XFS, UFS or
| ZFS, which could not be used with Windows anyway.
|
| Any SSD/HDD that uses non-Windows file systems should never
| be inserted in a Windows computer, even when it is
| partitioned. When an SSD/HDD is partitioned, it may be hoped
| that Windows will not alter a partition marked as type 0x83
| (Linux), but Windows might still destroy the partition table
| and the boot sector of a FAT partition. It happens frequently
| that a bootable Linux USB drive is damaged when it is
| inserted in a Windows computer, so the boot loader must be
| reinstalled. So partitioning a USB drive or SSD does not
| protect them from Windows.
|
| >> 100% of the capacity, with no wasted sectors.
| > You will never have that.
|
| I thought it was obvious that I meant 100% of the
| capacity available for users, because there is no way to
| access the extra storage used by the drive controller and
| also no reason to want to access that, because it has a
| different purpose than storing user data, so your
| "correction" is pointless.
| SoftTalker wrote:
| I have taken this approach for secondary drives where I want to
| use the entire drive as a big filesystem for data.
|
| For the system disk I have always partitioned it though. I
| generally create at least /, /var, /home, and /usr. That way
| it's less likely that a runaway process can fill up the entire
| disk, at worst it might fill up /home or /var.
|
| And unless I'm really space-constrained, I'll leave some
| unpartitioned space as well, for later flexibility.
| candiddevmike wrote:
| You're talking about saving _at most_ 200MBish. That's a lot of
| work to maintain for little gain...
| adrian_b wrote:
| There is less work, not more work.
| Volundr wrote:
| This sounds like work to me
|
| > For booting the computers, I either boot them from
| Ethernet or I boot them from a small USB memory that uses a
| FAT file system for storing the OS kernel, either in the
| format required by UEFI booting, or, when booting Linux in
| legacy BIOS mode, together with syslinux, which loads the
| kernel.
|
| Creating boot USB drives (which I think need partitions,
| don't they?) or setting up a PXE boot server would take me
| a lot more effort than an extra minute with gdisk to create
| partitions before formatting the disk.
| adrian_b wrote:
| If the USB drives were bought formatted as FAT, which is
| always true for those smaller than 32 GB, they already
| have the required partition.
|
| For booting with UEFI, you just need to create the
| directories with the names expected by the firmware. For
| legacy booting, you just need to install syslinux, which
| takes a second.
|
| Then the USB drive can be used to boot any computer,
| without any other work, for many years.
|
| When you change the kernel, you just mount the USB drive
| (which is not mounted otherwise), then you copy the new
| kernel to the USB drive (possibly together with an initrd
| file), renaming it during the copy, you unmount the USB
| drive and that is all.
|
| You can keep around a few USB drives with different
| kernel versions, and if an update does not go well, you
| just replace the USB drive with one having an older
| version.
|
| Configuring a DHCP/TFTP server for Ethernet booting is
| done only once.
|
| Adding extra computers may need a directory copy in the
| directory of the TFTP server only when the new computers
| have different hardware that requires different OS
| kernels.
|
| Updating a kernel requires just a file copy towards the
| directory of the TFTP server, replacing the old kernel.
|
| None of these operations requires more work than when
| using a boot partition on the root device.
|
| There is less work because you make booting USB drives or
| a DHCP/TFTP server only once for many years or even
| decades, while you need to partition the SSD/HDD whenever
| you buy a new one that will be used as the root device.
| CameronNemo wrote:
| So your solution to not use partitions is to use multiple
| disks? You do understand that people invented partitions
| precisely because they wanted to use a single disk, right?
|
| I am glad this setup works for you, but many people will not
| want to need a USB drive to boot their desktop, laptop, tablet,
| phone, et cetera.
| yjftsjthsd-h wrote:
| > For booting the computers, I either boot them from Ethernet
| or I boot them from a small USB memory that uses a FAT file
| system for storing the OS kernel, either in the format required
| by UEFI booting, or, when booting Linux in legacy BIOS mode,
| together with syslinux, which loads the kernel.
|
| That certainly works, but I'm pretty sure that moving booting
| off of your main disk is the only reason you can go without
| partitions, and I'm also pretty sure that most people don't
| want to deal with that.
| dottedmag wrote:
| Great to see Lennart back working on what systemd does best:
| streamlining and cleaning existing grubby (pun intended) parts of
| Linux.
|
| /boot (and ESP) management always feels hacky at best.
| CameronNemo wrote:
| Can you explain what about the boot partition status quo seems
| hacky, and how this approach "cleans" it?
| admax88qqq wrote:
| Issues with the current setup are pretty well laid out in the
| article.
| CameronNemo wrote:
| The article does not characterize the status quo as hacky,
| and furthermore does not acknowledge the existence of
| mainstream setups that do not share the pitfalls of the
| "typical setup". I.E. the article kind of creates a
| strawman.
| hyperupcall wrote:
| This is excellent!
|
| Over the years, I've been pleased to see that more and more
| distributions are writing their disk images and the like to the
| ESP. (Previously, dd'd USB images for distro installing
| _required_ the creation of a /boot partition)
|
| The logical next step would be to standardize everything through
| systemd, and ensure all boot images are autodiscoverable and
| automatically bootable.
|
| It's been somewhat frustrating for distributions to install GRUB,
| hijacking the previous prioritized boot PE, and have entries for
| other installed Linux distributions missing.
| amarshall wrote:
| > through systemd, and ensure all boot images are
| autodiscoverable and automatically bootable.
|
| See systemd-boot and BootLoaderSpec, both mentioned in OP.
|
| https://www.freedesktop.org/wiki/Software/systemd/systemd-bo...
| https://systemd.io/BOOT_LOADER_SPECIFICATION/
| candiddevmike wrote:
| How to make one boot partition to rule them all (debian, secure
| boot disabled for UEFI due to weird bug with how files are laid
| out with removable flag):
|
|     parted -s "${diskpath}" mklabel gpt
|     # Partition 1: 1 MiB BIOS boot partition for GRUB's legacy boot code
|     parted -s "${diskpath}" mkpart primary 1MiB 2MiB
|     parted -s "${diskpath}" set 1 bios_grub on
|     # Partition 2: 200 MiB ESP, formatted as FAT32 and mounted at /boot
|     parted -s "${diskpath}" mkpart primary 2MiB 202MiB
|     parted -s "${diskpath}" set 2 esp on
|     sleep 1
|     mkfs.fat -F 32 -n "boot" "${diskpath}2"
|     mount "/dev/disk/by-label/boot" "/boot"
|     # Install GRUB for both legacy BIOS and UEFI, then write its config
|     grub-install --target=i386-pc "${diskpath}"
|     grub-install --target=x86_64-efi --efi-directory=/boot --removable --no-uefi-secure-boot
|     grub-mkconfig > "/boot/grub/grub.cfg"
|
| This makes two partitions, one for GRUB to inject legacy BIOS
| boot code into and one for the ESP. ESP gets mounted to /boot,
| grub gets set up to support both. Only bug is Debian complains
| about symlinking .bak files for initrd, no biggie.
|
| (This is part of a larger Debian imaging script I made)
| GauntletWizard wrote:
| The symlinking .bak files for initrd ended up completely
| bricking my Ubuntu installation a few months ago, so...
| take heed? I am unimpressed by the quality and robustness of that
| whole stuff, though, but ended up writing essentially this
| article into my notes for the next Arch installation I make,
| where I think there's a fair chance that EFI will live in /efi
| but be symlinked in /boot.
|
| There's a whole other problem, though, of my wanting to use
| secure boot. Ubuntu was the easy out for that. At the moment,
| it's just disabled on my machines, which is far from optimal.
| yjftsjthsd-h wrote:
| > grub-mkconfig > "/boot/grub/grub.cfg"
|
| Shell redirection is fine, but you could just use
| `grub-mkconfig -o` to set the output file.
| IgorPartola wrote:
| Why? One more flag to remember whereas redirection works for
| any program.
| gnu8 wrote:
| It doesn't matter which end of the egg.
| Dunedan wrote:
| > This makes two partitions, one for GRUB to inject legacy BIOS
| boot code into and one for the ESP.
|
| Why do you still need the legacy BIOS boot logic?
| candiddevmike wrote:
| For BIOS boot, the BIOS looks at the first couple of blocks
| on a hard drive for the boot code. This is the first GPT
| partition that gets created, and the future grub-install code
| injects the BIOS bootloader there. Thus, to support BIOS and
| UEFI, you need the BIOS bootloader at the beginning of the
| drive.
| Karliss wrote:
| Yes, but why would you want the BIOS boot assuming you have
| a motherboard made in the last 10 years and it has a UEFI
| implementation which isn't completely broken.
|
| I might understand doing that just in case when preparing a
| bootable flash drive. Why complicate things for permanent
| installations where you know your current hardware and your
| next system after 5 years is unlikely to be much worse than
| the current one?
| candiddevmike wrote:
| You wouldn't, this is more for cloud/virtual images where
| some providers support UEFI but most still only support
| BIOS.
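For images that have to boot on both kinds of provider, as described above, it can help to check at runtime which path the machine actually took. The following is a minimal sketch, not part of the script posted in the thread: it assumes a Linux system, where the kernel exposes the `/sys/firmware/efi` directory when it was booted through UEFI. The function name `boot_mode` and its optional sysfs-root argument are hypothetical, added only so the check can be exercised against a scratch directory, and reporting everything non-UEFI as "bios" is a simplifying assumption.

```shell
#!/bin/sh
# Print "uefi" if the kernel exposes EFI runtime data, otherwise "bios".
# The optional first argument is a hypothetical override of the sysfs root
# (defaults to /sys), which makes the check testable without real firmware.
# Treating every non-UEFI boot as "bios" is a simplification for illustration.
boot_mode() {
    sysfs_root="${1:-/sys}"
    if [ -d "${sysfs_root}/firmware/efi" ]; then
        echo "uefi"
    else
        echo "bios"
    fi
}
```

Calling `boot_mode` with no arguments inspects the running system; a provisioning script could use the result to decide which of the two `grub-install` targets is actually exercised on a given host.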
| WaitWaitWha wrote:
| To provide more to the "why", although end-user devices have
| moved away from BIOS-only boot, there are a large number of
| systems that have no alternative to BIOS, and the hardware
| cannot be upgraded, only the firmware and software.
|
| Most of the systems I have run into were in SCADA[0], PMS[1],
| and BMS[2].
|
| [0]:https://en.wikipedia.org/wiki/SCADA
|
| [1]:https://en.wikipedia.org/wiki/Power_management_system
|
| [2]:https://en.wikipedia.org/wiki/Building_management_system
| CameronNemo wrote:
| _"The Boot partition will also have to carry an empty "efi"
| directory that can be used as the inner mount point, and serves
| no other purpose."_
|
| You could substitute Boot for Root in this sentence and flip it
| around on Poettering.
| freedinosaur wrote:
| The root partition already contains a set of empty directories,
| and Lennart has been working on reducing those where possible
| (see usr-merge).
| CameronNemo wrote:
| It just feels like such a small thing that is not even worth
| mentioning or taking into account.
| cryptonector wrote:
| Using ZFS makes this all a lot simpler.
| iio7 wrote:
| No, it does not.
| [deleted]
| deathanatos wrote:
| I fail to see how? The ESP partition must be vFAT on GPT, in
| order for the BIOS to find it. Your BIOS doesn't speak ZFS.
|
| The main partition can be whatever, but that's not typically
| available until after the kernel & initramfs are loaded. (As it
| is typically initramfs that does the prompt for the password,
| to decrypt it.)
| 2OEH8eoCRo0 wrote:
| ZFS is the "crypto solves this" of filesystems.
|
| Adding out of tree ZFS to the boot mix sounds hella
| complicated.
| yjftsjthsd-h wrote:
| Interestingly, GRUB actually supports ZFS; it has the dubious
| distinction of being the only extant implementation of ZFS
| that's GPL licensed, but... probably _because_ of that... it's
| separate from the main OpenZFS implementation and is extremely
| feature-poor.
| This results in fun things like Ubuntu's root-on-ZFS layout
| creating 2 pools; a boot pool (bpool) that GRUB
| can read, and a root pool (rpool) with the OS. It's not
| _that_ complicated, but it's not _nice_.
| Volundr wrote:
| Interesting. I've never really wanted boot on ZFS, and I
| definitely don't see the point if I'd need a dedicated pool
| for it.
| yjftsjthsd-h wrote:
| It _could_ be really cool; it would let you snapshot your
| boot filesystem and roll back to a previous
| configuration.
|
| ...I say, as someone who does in fact leave my boot
| filesystems on VFAT :)
| vetinari wrote:
| Ubuntu _did_ snapshot the bpool; unfortunately, it did a
| poor job of garbage collecting the snapshots. Meaning
| that eventually you would have failing kernel updates due
| to lack of space, and having to manually clean it up.
|
| Since 22.04, zsys (the tool that did the snapshotting) is
| not installed by default.
| yjftsjthsd-h wrote:
| > Since 22.04, zsys (the tool that did the snapshotting)
| is not installed by default.
|
| Er, are they not snapshotting the root filesystem or ex.
| /home by default then?
| vetinari wrote:
| No, you have to install and enable zsys yourself.
| rektide wrote:
| AFAIK Debian still doesn't have any integration available for
| handling the integration of systemd-boot & kernel packages:
| there's nothing to maintain the loader/entries files that
| systemd-boot expects! It's really a shame because systemd-boot is
| 10x simpler and 100x more pleasant to work with than grub & its
| multiple overlapping but different obtuse config handling shell
| scripts. Bootctl is excellent & understandable, the entries are
| human readable/authorable.
|
| I've been on systemd-boot for a long while now. For a while I was
| just hand maintaining vmlinuzs & loader entries, copying &
| editing stuff on /boot/efi. Easy but inelegant & I'd forget the
| very simple steps.
|
| I've been copy pasting (well, it's in my ansible now)
| /etc/kernel/postinst.d/ hooks from stackoverflow, which write
| these files, & that greatly simplified life. It's a jank hurdle &
| I wish my OS would actually support this wonderful easy to use
| tool. Systemd-boot is so much less obtuse, such a breath of fresh
| air, after years of grub (and many of uboot as well, but that's a
| different sector).
|
| I made the jump ~two years ago to a single partition, just the
| ESP partition. There's a warning about not being able to set
| permissions properly but it's worked fine & been so much more
| pleasant to operate. Very strongly recommend. It just worked for
| me on Debian, no real fiddling.
|
| https://github.com/filakhtov/kernel-postinst-d/blob/master/9...
| yrro wrote:
| OTOH it's nice to have access to the boot loader via serial
| console. The system developers' judgement is that it's up to the
| firmware to provide serial console access if needed. Well I'll
| just get my chequebook out then...
|
| Also it would be nice to be able to interact with the boot
| loader on a modern laptop display without having to get out a
| magnifying glass. Another problem that is deemed to be the
| fault of the firmware.
|
| One of the great things about open source operating systems is
| that people step up to provide these sorts of improvements and
| I think it's a shame that systemd-boot will cause a regression
| here.
___________________________________________________________________
(page generated 2022-11-03 23:00 UTC)