[HN Gopher] Systemd service sandboxing and security hardening (2...
___________________________________________________________________
Systemd service sandboxing and security hardening (2020)

Author : capableweb
Score  : 242 points
Date   : 2022-01-18 10:31 UTC (1 day ago)

(HTM) web link (www.ctrl.blog)
(TXT) w3m dump (www.ctrl.blog)

| the8472 wrote:
| Alas, no whitelisting option. A service should start in an empty
| filesystem root without network access - and if we had something
| as convenient as pledge() also without any allowed syscalls - and
| then you could only add what is needed.
|
| firejail does this a bit better but it also started out with a
| blacklist approach and it's more geared towards desktop
| application use, not system services.
|
| Icathian wrote:
| One of my favorite podcasts, Risky Business[0] regularly plugs
| Airlock[1]. They seem like they might be the one out front, at
| least as a paid service.
|
| [0] https://risky.biz/netcasts/risky-business/
| [1] https://www.airlockdigital.com/
|
| 5e92cb50239222b wrote:
| What's the problem with firejail? Start with an empty profile,
| blacklist everything, and whitelist only the stuff you need. It
| works just fine for server applications, and unlike systemd
| isolation flags you can set up a proper separate firewall with
| the `netfilter` option.
|
| the8472 wrote:
| > blacklist everything,
|
| That isn't a whitelist approach.
|
| goodpoint wrote:
| That is exactly an allowlist approach.
|
| Someone wrote:
| > blacklist everything, and whitelist only the stuff you need
|
| That _is_ a whitelist approach.
|
| Seirdy wrote:
| Firejail has had multiple sandbox escape vulns in the past.
| Firejail is an SUID executable in which sandbox escapes can
| lead to privilege escalation. In contrast, Systemd allows you
| to run services as unprivileged users, and even create users
| on demand.
|
| Systemd also supports firewalling: it supports IP address
| allow/deny policies, ports, etc.
| For more advanced firewall policies you're probably better off
| using an actual firewall daemon like firewalld or ufw.
|
| Someone wrote:
| pledge is excellent, but it protects programmers against
| writing security bugs that have large impact; it doesn't
| protect you against the software they write. It's those
| programmers who restrict what their tools can do, and who
| decide when to throw the switch to enable those restrictions.
|
| If you trust those programmers, it's indeed way more convenient
| than other tools, if only because it removes the need for
| configuring things twice. For example, instead of configuring
| your web server to serve files from _/foo/bar/_ _and_ telling
| SELinux that your web server is allowed to read from
| _/foo/bar_, you only configure the web server, and it will tell
| the OS "I shouldn't read from anything but _/foo/bar_,
| starting ... now".
|
| You'll have to trust the web server to do that, though.
|
| the8472 wrote:
| That's what it is intended for. But pledge has nice
| properties beyond that which are also useful for external
| sandboxing, such as defining easy-to-understand syscall
| groups maintained by the kernel as new syscalls are
| introduced. If Linux had that we could, for example, grant
| stdio+rpath and not worry about the kernel introducing
| preadv3 and programs compiled with that getting broken or
| suffering suboptimal performance when isolated, and it would
| automatically apply to equivalent io_uring implementations
| and block the equivalent SQEs too.
|
| ape4 wrote:
| Apache needs to start as `root` but then drops to a
| non-privileged user. systemd's `User=<user>` can't really express
| that. Perhaps an option that says a unit needs to be root until
| the first fork when it has to be a specified user.
| `ForkUser=apache`
|
| staticassertion wrote:
| This is one of the main problems with "whole program
| sandboxes". Many times a program only needs permissions right
| at the start and then never again.
| From the outside though there's no way to signal "OK, I'm done,
| lock me down" for most sandboxing systems.
|
| One approach that _may_ work with systemd is to have two
| processes. One would be a broker, running as root. It would
| grab a port, for example. The other process would be spawned by
| the broker as a limited service and inherit that port from the
| parent, with no permissions of its own to open it, only to
| inherit.
|
| IDK how to express that in systemd-land though. At that point
| you might be better off just writing the code to sandbox things
| yourself.
|
| candiddevmike wrote:
| It only needs root to bind to privileged ports, I believe.
| You should be able to use a non-root user and give it
| CAP_NET_BIND_SERVICE:
|
|     [Service]
|     AmbientCapabilities=CAP_NET_BIND_SERVICE
|
| ape4 wrote:
| Cool! But then I suppose the forked processes could then bind
| to a low numbered port - something they can't do now. So
| Apache would have to make sure to revoke that capability when
| forking.
|
| 5e92cb50239222b wrote:
| You could combine it with something like this:
|
|     SocketBindDeny=any
|     SocketBindAllow=tcp:80
|     SocketBindAllow=tcp:443
|
| These ports should be denied by the kernel because they're
| already taken by httpd, and all others will be denied by BPF
| filters installed by systemd.
|
| It feels like plugging holes in a dam, but that's what you
| do with popular operating systems.
|
| 5e92cb50239222b wrote:
| I don't know about httpd specifically, but many applications
| want root only to be able to bind to a privileged port (like
| :80). This can be circumvented in one of a few ways:
|
| 1. add this to .service
|
|     AmbientCapabilities=CAP_NET_BIND_SERVICE
|
| 2. or listen on :8080 and use NAT:
|
|     iptables -t nat -I OUTPUT -p tcp -o lo --dport 80 -j REDIRECT --to-ports 8080
|
| 3. or make the port unprivileged
|
|     sysctl -w net.ipv4.ip_unprivileged_port_start=80
|
| It may work for httpd too, I haven't tested it.
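A minimal sketch of how the capability and bind-restriction suggestions above might fit together in a single unit. The unit name, the `www` account, the binary path and its `--listen` flag are all hypothetical; SocketBindAllow=/SocketBindDeny= need a fairly recent systemd (v249 or later, as far as I know) and the unified cgroup hierarchy.

```ini
# /etc/systemd/system/example-web.service (hypothetical unit)
[Unit]
Description=Example web server started without root

[Service]
# Hypothetical unprivileged account, binary path and flag
User=www
Group=www
ExecStart=/usr/local/bin/example-web --listen :80

# Grant only the capability needed to bind ports below 1024
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
NoNewPrivileges=yes

# Deny every bind() except the two web ports
SocketBindDeny=any
SocketBindAllow=tcp:80
SocketBindAllow=tcp:443

[Install]
WantedBy=multi-user.target
```

This only covers the "needs root to bind a port" case. A service that also reads root-owned keys or writes root-owned logs at startup would need those handled separately.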
|
| [deleted]
|
| Un1corn wrote:
| The correct Systemd solution would be to create a socket unit
| but your solutions work without modifying the service code.
|
| growse wrote:
| I think this requires support from the service, no?
|
| Not everything that wants to open up a port seems to
| support socket activation. I tried with 6tunnel and
| couldn't get it to work.
|
| Spivak wrote:
| I can't find any officially supported way for Apache or
| Nginx to use inetd/systemd socket activation, but it
| certainly would be nice.
|
| marcosdumay wrote:
| Apache also uses the start user to read stuff like TLS
| private keys that its normal user does not have access to.
|
| ape4 wrote:
| And I think it's common for the log files to be in
| /var/log/httpd owned by root but I suppose they could be
| moved and chown-ed.
|
| eliaspro wrote:
| Using systemd's LogDirectory= directive will fully take
| care of ensuring the required directory is present and
| permissions match the defined User=/Group= of the unit.
|
| VTimofeenko wrote:
| It's possible to remove the root requirement for this
| through systemd's credentials mechanisms:
|
| https://www.freedesktop.org/software/systemd/man/systemd.exe...
|
| eliaspro wrote:
| Many applications don't need to bind the port themselves but
| will happily accept one passed to them during process
| invocation.
|
| This lets systemd manage ports using socket units, which also
| stay up and buffer requests while a service restarts, and allows
| service activation on demand (on incoming requests) or
| per-connection service instances, e.g. for better
| per-connection/user isolation of sshd.
|
| a-dub wrote:
| can you limit outbound network access to specified
| masks/ports/devices on a per-service level?
|
| [deleted]
|
| 5e92cb50239222b wrote:
| This is a pretty lax policy IMHO, you can go much farther.
| These days I usually start with this, it's much more strict:
|
| https://news.ycombinator.com/item?id=29976096
|
| Or simply follow whatever `systemd-analyze security` recommends,
| just make sure you run it on a system with recent systemd.
|
| westurner wrote:
| Which distro has the best out-of-the-box output for:
|
|     systemd-analyze security
|
| Is there a tool like `audit2allow` for systemd units?
| selinux/python/audit2allow/audit2allow:
| https://github.com/SELinuxProject/selinux/blob/master/python...
|
| https://stopdisablingselinux.com/
|
| [deleted]
|
| goodpoint wrote:
| Debian does a lot of sandboxing.
|
| aidenn0 wrote:
| To the point where it breaks logind on NIS setups...
|
| 5e92cb50239222b wrote:
| > Which distro has the best out-of-the-box output
|
| I haven't seen any difference between distributions with the
| same systemd version. Anything with a recent one should do
| fine. More recent than RHEL8, mind you (which is on systemd
| 239): for example, the syscall allow/deny analysis is buggy
| there and asks you to enable some protections, and then
| disable them. The same unit is analyzed correctly on my
| desktop with v250 (I use the popular rolling release
| distribution).
|
| I haven't seen anything like audit2allow. It's probably not
| especially necessary because of the difference in
| philosophies: SELinux is deny by default, while in systemd
| you're playing whack-a-mole anyway, and are expected to add
| directives one by one until the application stops working.
| Unit logs usually make it obvious if something was denied.
|
| Arnavion wrote:
| The usual way I've seen (and do myself) is to just let the
| process be killed and have its coredump taken, then
| `coredumpctl gdb $process_name -A '-ex "print $rax" -ex
| "quit"'` to get the syscall number, then check
| `systemd-analyze syscall-filter` for whether I want to allow
| just that one syscall or the whole group it's in.
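A minimal sketch of the allowlist style of syscall filtering discussed above, written as a drop-in. The unit name and drop-in path are hypothetical. `@system-service` is one of systemd's predefined syscall groups (the groups can be listed with `systemd-analyze syscall-filter`); returning EPERM instead of killing the process is a choice here, not a requirement.

```ini
# /etc/systemd/system/example.service.d/syscalls.conf (hypothetical drop-in)
[Service]
# Start from the predefined allowlist for typical services...
SystemCallFilter=@system-service
# ...then subtract groups this service should never need
SystemCallFilter=~@privileged @resources
# Fail denied calls with EPERM rather than killing the process with SIGSYS
SystemCallErrorNumber=EPERM
# Only allow syscalls for the native architecture
SystemCallArchitectures=native
```

Leaving SystemCallErrorNumber= unset keeps the default kill action, which is what the coredump-based workflow above relies on to find the offending syscall.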
|
| growse wrote:
| > The usual way I've seen (and do myself) is to just let
| the process be killed and have its coredump taken, then
| `coredumpctl gdb $process_name -A '-ex "print $rax" -ex
| "quit"'` to get the syscall number, then check
| `systemd-analyze syscall-filter` for whether I want to allow
| just that one syscall or the whole group it's in.
|
| Another approach would be to set SystemCallLog= to be the
| opposite of SystemCallFilter= (negate each group with ~)
| and then you'll see the call (and caller) in the journal.
|
| d2wa wrote:
| This is a getting started/101 introduction; it also talks about
| and recommends systemd-analyze security. There's a link to part
| two at the bottom of the article that goes deeper into things.
|
| DyslexicAtheist wrote:
| any system that starts security by blacklisting instead of
| whitelisting tends to be doomed by upcoming changes.
|
| egberts1 wrote:
| Who's gonna write THE holy-grail analyzer of many executables
| to determine what Linux capabilities, cgroups, and syscalls are
| just being referenced?
|
| Caveat: it has to dig into ALL the linked libraries as well.
|
| kenniskrag wrote:
| if you want to test these settings I can recommend
| `sudo systemd-run -p "DynamicUser=yes" -p "ProtectSystem=yes" -p
| "ProtectHome=yes" --shell` but be in a readable directory like
| /tmp or you receive an error.
|
| 5e92cb50239222b wrote:
| This is a very handy command in day-to-day work, actually. For
| example, I use it to limit the total amount of memory available
| to an application, including page cache:
|
|     $ systemd-run --user --scope --property=MemoryHigh=1G qbittorrent
|
| It works just as you'd expect -- if qbittorrent's working set
| goes above 1024 MiB, it pushes the least recently used page out
| of the page cache. Doesn't really have any effect on upload or
| download speeds, while helping to keep more useful data in
| memory.
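The same soft memory ceiling could also be kept in a persistent user unit rather than passed on the command line each time. This is a sketch under assumptions: the unit file name and the binary path are hypothetical, and whether memory limits take effect under the user manager depends on cgroup v2 being in use with the memory controller delegated to it.

```ini
# ~/.config/systemd/user/qbittorrent.service (hypothetical user unit)
[Unit]
Description=qBittorrent with a soft memory ceiling

[Service]
# Hypothetical binary path
ExecStart=/usr/bin/qbittorrent
# Throttle and reclaim memory (page cache included) above this threshold
MemoryHigh=1G
```

MemoryHigh= is a throttling threshold, not a hard cap; MemoryMax= is the directive for a hard limit.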
|
| Many isolation flags are not available in `systemd-run --user`,
| though, so if you'd like to have some protection you either
| have to combine `sudo systemd-run` with `su -c`, or wrap the
| command in firejail.
|
| https://github.com/netblue30/firejail/
|
| wmanley wrote:
| I have a bash alias for `make` and `ninja` to do something
| similar. Just having all the spawned processes in a cgroup
| helps with system interactivity while building. This works
| because the kernel will then schedule the whole build as a
| single unit against the other work on the system, rather than
| scheduling each process that the build spawns against every
| other process that I'm running.
|
| t0astbread wrote:
| Interesting, a few months ago I tried using systemd-run to
| implement unprivileged memory limits for a process and I'm
| pretty sure it didn't work with the user manager. Is this a
| recent addition? (I'm not sure what version of systemd I had
| at the time.)
|
| pram wrote:
| Ooh, is this a good way to sandbox execs like ImageMagick or
| stuff like that?
|
| 5e92cb50239222b wrote:
| Use firejail, it's a "one click" solution with prepackaged
| profiles.
|
| https://github.com/netblue30/firejail/
|
| It uses the same kernel knobs as systemd does, but is more
| user-friendly and has more features.
|
| I use it for every application that handles data received
| from other machines: books, images, documents, whatever.
|
| YorickPeterse wrote:
| You can also use Bubblewrap, but getting it up and running
| requires a lot more fiddling around. For example, this is
| what I use to isolate Zoom from the rest of my system:
| https://gitlab.com/yorickpeterse/dotfiles/-/blob/0a0492c78b6...
|
| In my case I'm using Bubblewrap because Firejail was only
| used for Zoom, and this felt a bit of a waste considering
| Bubblewrap was already installed.
|
| max002 wrote:
| Great article :) thank you!
|
| [deleted]
|
| HowardStark wrote:
| Is there any advice for working with older systemd versions?
| Right off the bat, systemd 237 is out because there is no
| security feature in that version of systemd-analyze.
|
| 5e92cb50239222b wrote:
| Use the same config you'd use for the latest systemd version.
| It will ignore flags it doesn't know (and warn you in unit
| logs).
|
| bloopernova wrote:
| Not meant to be a snarky comment, but a serious question: how
| does this differ from SELinux?
|
| cpuguy83 wrote:
| They are completely different things, and where available
| should be used together.
|
| SELinux is a policy system where policy is enforced via labels.
|
| Labels are applied to processes which classify what the process
| is.
|
| Labels are applied to files which define what classification
| of process can access the file.
|
| The application of labels happens automatically based on
| policy. Such policy would include the location of the file or
| the label of the parent process.
|
| As an example, the default policy for httpd would prevent httpd
| from accessing /etc/passwd even though the process is running
| as (or can be) the root user. I believe you could also do
| interesting things like prevent httpd from opening a socket on
| a non-standard port if you wanted to.
|
| SELinux is very powerful but complicated. Ideally you use this
| with distro packages which should have policies already
| configured for you.
|
| Critically it is not one vs the other. Use both if you have it.
|
| tyingq wrote:
| It seems to be using mostly the Linux capabilities:
| https://man7.org/linux/man-pages/man7/capabilities.7.html
|
| So the overlap choice seems to be more around SELinux versus
| Capabilities, where SELinux is more fine-grained and tunable,
| but more complicated also.
|
| aseipp wrote:
| It's not just Linux capabilities; on their own Linux
| capabilities actually suck majorly and are very limited (AKA
| "crapabilities"). But systemd also makes extensive usage of
| cgroups and namespacing facilities to back it up e.g.
| preventing runaway memory/CPU quotas and stopping
| applications from accessing paths they shouldn't, restricting
| network access, stuff like that. Some of this overlaps with
| SELinux (e.g. restricting file access) but the mechanism is
| fairly different.
|
| The overlap/comparison between capabilities, systemd's
| features, and SELinux features isn't really well defined in
| any meaningful way IMO. It's really like 5 different features
| being used in various ways.
|
| PeterWhittaker wrote:
| I'm curious what you mean by SELinux features not being
| well-defined? While poorly documented, they are
| extraordinarily precisely defined, allowing fine-grained
| control of pretty much everything, all enforced by the
| kernel with no workarounds, at least in enforcing mode.
|
| staticassertion wrote:
| It's vastly simpler, for one thing. SELinux is basically a
| weird DSL/programming language for describing system
| interactions whereas systemd is providing a very basic
| interface for common restrictions.
|
| I would pretty much never ask a human being to write SELinux
| policies unless that was explicitly part of their job whereas I
| can pretty much point any developer to what systemd is
| providing and they'll be able to work with it.
|
| chasil wrote:
| SELinux is designed as "mandatory access control," meaning that
| it is not normally disabled.
|
| The normal filesystem permissions of read/write/execute for
| user/group/other are among those known as "discretionary access
| controls," meaning that they can be relaxed.
|
| The systemd unit security options are discretionary, at the
| control of the administrator.
|
| t0astbread wrote:
| Is SELinux not also in the administrator's control?
|
| candiddevmike wrote:
| These days, systemd is better/easier to sandbox _services_ than
| SELinux. SELinux/AppArmor is still the best way to protect
| individual GUI and user apps (anything not run from systemd
| basically).
|
| mbakke wrote:
| I don't have much experience with SELinux, but at least in my
| org the base policy is to run anything started interactively
| by the user (or root) in _unconfined_t_, i.e. with
| protections disabled.
|
| That is, the same command that gets denied by SELinux through
| systemd will run fine (and unprotected) when started from a
| shell.
|
| Do you write your own policies for individual end-user
| programs?
|
| p_l wrote:
| Easier, maybe. Better, nope. The breadth and detail available
| just don't compare, and not in the way where systemd can even
| touch the scope available to SELinux.
|
| candiddevmike wrote:
| Can you expand on that? In my opinion, systemd has far more
| controls for process security over SELinux (networking,
| cgroups, nspawn sandboxing, etc).
|
| p_l wrote:
| Out of those, the only things that aren't covered by
| SELinux are things that would be expected to be set by a
| wrapper/launcher process (modifying namespaces - which
| covers nspawn and setting cgroups). Everything else, i.e.
| actual run-time access decisions, is more fine grained
| and controllable through SELinux, including level of
| access control like whether a program can listen on a
| socket or bind a socket, while still permitting it to
| connect.
|
| mst wrote:
| SELinux is more capable in theory but so much less
| usable/discoverable in practice that I suspect anybody who
| isn't truly dedicated to doing SELinux right will end up
| averaging better security via the systemd route.
|
| (and I say this based on both observation and personal
| experience, I have some stuff to harden later this year and
| I'm really hoping I'll be able to involve somebody who
| -has- that level of SELinux knowledge but plan B is almost
| certainly going to be 'mst does his best with the unit
| configs')
|
| PeterWhittaker wrote:
| As someone who does a fair amount of SELinux
| professionally, I'd mostly agree with this: getting
| started can be daunting, so one could likely get far more
| value from a short time focusing on systemd security.
|
| But if one can spare the time, SELinux can secure
| everything, not just systemd services.
|
| It all depends on the threat vectors one faces.
|
| p_l wrote:
| That's why I won't even try to suggest SELinux is
| _easier_. It's definitely easier to apply _some_
| sandboxing through systemd, but it's pretty coarse
| grained and mostly seems to hit some relatively easy wins
| involving capabilities dropping and stuff that is often
| hidden deep inside PAM. Good start, but I wouldn't call
| it "better" ultimately.
|
| kaba0 wrote:
| Why not use both? They are not mutually exclusive.
|
| candiddevmike wrote:
| Why would you use SELinux along with systemd? Systemd can
| do filesystem permissions declaratively vs SELinux having
| to label the files individually, e.g.:
|
|     [Service]
|     ProtectSystem=strict
|     ReadWritePaths=/some/path
|     ReadOnlyPaths=/some/otherpath
|     InaccessiblePaths=/etc
|
| PeterWhittaker wrote:
| One can write extraordinarily short FC files using regexp
| to apply specific SELinux labels as desired, and control
| access to those labels with only a few rules.
|
| Unlike systemd, they then apply to everything.
|
| loudtieblahblah wrote:
| Does your box still touch local DNS before connecting to VPN? No?
|
| Then anything with systemd and security can stuff it.
|
| getcrunk wrote:
| What?
___________________________________________________________________
(page generated 2022-01-19 23:00 UTC)