[HN Gopher] Systemd service sandboxing and security hardening (2...
       ___________________________________________________________________
        
       Systemd service sandboxing and security hardening (2020)
        
       Author : capableweb
       Score  : 242 points
       Date   : 2022-01-18 10:31 UTC (1 day ago)
        
 (HTM) web link (www.ctrl.blog)
 (TXT) w3m dump (www.ctrl.blog)
        
       | the8472 wrote:
       | Alas, no whitelisting option. A service should start in an empty
       | filesystem root without network access - and if we had something
       | as convenient as pledge() also without any allowed syscalls - and
       | then you could only add what is needed.
       | 
       | firejail does this a bit better but it also started out with a
       | blacklist approach and it's more geared towards desktop
       | application use, not system services.
        
         | Icathian wrote:
         | One of my favorite podcasts, Risky Business[0] regularly plugs
         | Airlock[1]. They seem like they might be the one out front, at
         | least as a paid service.
         | 
         | [0] https://risky.biz/netcasts/risky-business/
         | [1] https://www.airlockdigital.com/
        
         | 5e92cb50239222b wrote:
         | What's the problem with firejail? Start with an empty profile,
         | blacklist everything, and whitelist only the stuff you need. It
         | works just fine for server applications, and unlike systemd
         | isolation flags you can set up a proper separate firewall with
         | the `netfilter` option.
        
           | the8472 wrote:
           | > blacklist everything,
           | 
           | That isn't a whitelist approach.
        
             | goodpoint wrote:
             | That is exactly an allowlist approach.
        
             | Someone wrote:
             | > blacklist everything, and whitelist only the stuff you
             | need
             | 
             | That _is_ a whitelist approach.
        
           | Seirdy wrote:
           | Firejail has had multiple sandbox escape vulns in the past.
           | Firejail is an SUID executable in which sandbox escapes can
           | lead to privilege escalation. In contrast, Systemd allows you
           | to run services as unprivileged users, and even create users
           | on demand.
           | 
           | Systemd also supports firewalling: it supports IP address
           | allow/deny policies, ports, etc. For more advanced firewall
           | policies you're probably better off using an actual firewall
           | daemon like firewalld or ufw.
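           | 
           | A minimal sketch of the allow/deny part, assuming a service
           | that should only talk to localhost and one private subnet
           | (the 10.0.0.0/8 range is a made-up example):
           | 
           | ```ini
           | [Service]
           | # Drop all IP traffic to and from the unit by default...
           | IPAddressDeny=any
           | # ...then allow only the addresses the service needs.
           | IPAddressAllow=localhost
           | IPAddressAllow=10.0.0.0/8
           | ```
           | 
           | These filters apply to both incoming and outgoing packets.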
        
         | Someone wrote:
         | pledge is excellent, but it protects programmers against
         | writing security bugs that have large impact, it doesn't
         | protect you against the software they write. It's those
         | programmers who restrict what their tools can do, and who
         | decide when to throw the switch to enable those restrictions.
         | 
         | If you trust those programmers, it's indeed way more convenient
         | than other tools, if only because it removes the need for
         | configuring things twice. For example, instead of configuring
         | your web server to serve files from _/foo/bar/_ _and_ telling
         | SELinux that your web server is allowed to read from
         | _/foo/bar_, you only configure the web server, and it will tell
         | the OS "I shouldn't read from anything but _/foo/bar_,
         | starting ... now".
         | 
         | You'll have to trust the web server to do that, though.
        
           | the8472 wrote:
           | That's what it is intended for. But pledge has nice
           | properties beyond that which are also useful for external
           | sandboxing. Such as defining easy to understand syscall
           | groups maintained by the kernel as new syscalls are
           | introduced. If Linux had that we could, for example, grant
           | stdio+rpath and not worry about the kernel introducing
           | preadv3 and programs compiled with that getting broken or
           | suffering suboptimal performance when isolated. It would
           | also automatically apply to equivalent io_uring operations
           | and block the equivalent SQEs too.
        
       | ape4 wrote:
       | Apache needs to start as `root` but then drops to a
       | non-privileged user. systemd's `User=<user>` can't really
       | express that. Perhaps an option that says a unit needs to be
       | root until the first fork, after which it has to be a specified
       | user: `ForkUser=apache`
        
         | staticassertion wrote:
         | This is one of the main problems with "whole program
         | sandboxes". Many times a program only needs permissions right
         | at the start and then never again. From the outside though
         | there's no way to signal "OK, I'm done, lock me down" for most
         | sandboxing systems.
         | 
         | One approach that _may_ work with systemd is to have two
         | processes. One would be a broker, running as root. It would
         | grab a port, for example. The other process would be spawned by
         | the broker as a limited service and inherit that port from the
         | parent, with no permissions of its own to open it, only to
         | inherit.
         | 
         | IDK how to express that in systemd-land though. At that point
         | you might be better off just writing the code to sandbox things
         | yourself.
        
         | candiddevmike wrote:
         | It only needs root to bind to privileged ports, I believe.
         | You should be able to use a non-root user and give it
         | CAP_NET_BIND_SERVICE:
         | 
         | [Service]
         | AmbientCapabilities=CAP_NET_BIND_SERVICE
        
           | ape4 wrote:
           | Cool! But I suppose the forked processes could then bind to a
           | low-numbered port, something they can't do now. So Apache
           | would have to make sure to revoke that capability when
           | forking.
        
             | 5e92cb50239222b wrote:
             | You could combine it with something like this:
             | 
             |     SocketBindDeny=any
             |     SocketBindAllow=tcp:80
             |     SocketBindAllow=tcp:443
             | 
             | These ports should be denied by the kernel because they're
             | already taken by httpd, and all others will be denied by BPF
             | filters installed by systemd.
             | 
             | It feels like plugging holes in a dam, but that's what you
             | do with popular operating systems.
        
         | 5e92cb50239222b wrote:
         | I don't know about httpd specifically, but many applications
         | want root only to be able to bind to a privileged port (like
         | :80). This can be circumvented in one of a few ways:
         | 
         | 1. add this to .service:
         | 
         |     AmbientCapabilities=CAP_NET_BIND_SERVICE
         | 
         | 2. or listen on :8080 and use NAT:
         | 
         |     iptables -t nat -I OUTPUT -p tcp -o lo --dport 80 \
         |         -j REDIRECT --to-ports 8080
         | 
         | 3. or make the port unprivileged:
         | 
         |     sysctl -w net.ipv4.ip_unprivileged_port_start=80
         | 
         | It may work for httpd too, I haven't tested it.
        
           | [deleted]
        
           | Un1corn wrote:
           | The correct Systemd solution would be to create a socket unit,
           | but your solutions work without modifying the service code.
        
             | growse wrote:
             | I think this requires support from the service, no?
             | 
             | Not everything that wants to open up a port seems to
             | support socket activation. I tried with 6tunnel and
             | couldn't get it to work.
        
             | Spivak wrote:
             | I can't find any officially supported way for Apache or
             | Nginx to use inetd/systemd socket activation, but it
             | certainly would be nice.
        
           | marcosdumay wrote:
           | Apache also uses the start user to read stuff like TLS
           | private keys, that its normal user does not have access to.
        
             | ape4 wrote:
             | And I think it's common for the log files to be in
             | /var/log/httpd owned by root but I suppose they could be
             | moved and chown-ed.
        
               | eliaspro wrote:
               | Using systemd's LogDirectory= directive will fully take
               | care of ensuring the required directory is present and
               | permissions match the defined User=/Group= of the unit.
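               | 
               | A minimal sketch, assuming the unit runs as a hypothetical
               | `apache` user and should log to /var/log/httpd:
               | 
               | ```ini
               | [Service]
               | User=apache
               | Group=apache
               | # Relative to /var/log for system units: systemd creates
               | # /var/log/httpd and gives it to User=/Group= above.
               | LogDirectory=httpd
               | ```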
        
             | VTimofeenko wrote:
             | It's possible to remove the root requirement for this
             | through systemd's credentials mechanisms:
             | 
             | https://www.freedesktop.org/software/systemd/man/systemd.exe...
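             | 
             | A rough sketch of what that could look like for a TLS key,
             | using `LoadCredential=` (systemd 247 or later). The
             | credential name and key path are made up, and I haven't
             | checked whether httpd can be pointed at the result:
             | 
             | ```ini
             | [Service]
             | User=apache
             | # systemd reads the root-owned key and exposes a copy to
             | # the service as $CREDENTIALS_DIRECTORY/tls.key.
             | LoadCredential=tls.key:/etc/ssl/private/example.key
             | ```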
        
         | eliaspro wrote:
         | Many applications don't need to bind the port themselves but
         | will happily accept one passed to them during process
         | invocation.
         | 
         | This lets systemd manage ports using socket units, which will
         | also stay up and buffer requests while a service restarts,
         | and allows service activation on demand (on incoming requests)
         | or per-connection service instances, e.g. for better isolation
         | of sshd per connection/user.
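         | 
         | A minimal sketch of such a pair. The unit names, user, and
         | binary path are hypothetical, and the program must be able to
         | use a listening socket passed in by systemd:
         | 
         | ```ini
         | # example.socket: systemd binds the privileged port itself.
         | [Socket]
         | ListenStream=80
         | 
         | [Install]
         | WantedBy=sockets.target
         | 
         | # example.service: started unprivileged with the socket
         | # already open (passed as file descriptor 3).
         | [Service]
         | User=www-data
         | ExecStart=/usr/local/bin/example-server
         | ```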
        
       | a-dub wrote:
       | can you limit outbound network access to specified
       | masks/ports/devices on a per-service level?
        
       | [deleted]
        
       | 5e92cb50239222b wrote:
       | This is a pretty lax policy IMHO, you can go much farther. These
       | days I usually start with this, it's much more strict:
       | 
       | https://news.ycombinator.com/item?id=29976096
       | 
       | Or simply follow whatever `systemd-analyze security` recommends,
       | just make sure you run it on a system with recent systemd.
        
         | westurner wrote:
         | Which distro has the best out-of-the-box output for
         | `systemd-analyze security`?
         | 
         | Is there a tool like `audit2allow` for systemd units?
         | selinux/python/audit2allow/audit2allow:
         | https://github.com/SELinuxProject/selinux/blob/master/python...
         | 
         | https://stopdisablingselinux.com/
        
           | [deleted]
        
           | goodpoint wrote:
           | Debian does a lot of sandboxing.
        
             | aidenn0 wrote:
             | To the point where it breaks logind on NIS setups...
        
           | 5e92cb50239222b wrote:
           | > Which distro has the best out-of-the-box output
           | 
           | I haven't seen any difference between distributions with the
           | same systemd version. Anything with a recent one should do
           | fine. More recent than RHEL8, mind you (which is on systemd
           | 239): for example, a syscall allow/deny analysis is buggy
           | there and asks you to enable some protections, and then
           | disable them. The same unit is analyzed correctly on my
           | desktop with v250 (I use the popular rolling release
           | distribution).
           | 
           | I haven't seen anything like audit2allow. It's probably not
           | especially necessary because of the difference in
           | philosophies: SELinux is deny by default, while in systemd
           | you're playing whack-a-mole anyway, and are expected to add
           | directives one by one until the application stops working.
           | Unit logs usually make it obvious if something was denied.
        
             | Arnavion wrote:
             | The usual way I've seen (and do myself) is to just let the
             | process be killed and have its coredump taken, then
             | `coredumpctl gdb $process_name -A '-ex "print $rax" -ex
             | "quit"'` to get the syscall number, then check `systemd-
             | analyze syscall-filter` for whether I want to allow just
             | that one syscall or the whole group it's in.
        
               | growse wrote:
               | > The usual way I've seen (and do myself) is to just let
               | the process be killed and have its coredump taken, then
               | `coredumpctl gdb $process_name -A '-ex "print $rax" -ex
               | "quit"'` to get the syscall number, then check `systemd-
               | analyze syscall-filter` for whether I want to allow just
               | that one syscall or the whole group it's in.
               | 
               | Another approach would be to set SystemCallLog= to be the
               | opposite of SystemCallFilter= (negate each group with ~)
               | and then you'll see the call (and caller) in the journal.
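               | 
               | A sketch of that pairing, assuming the unit allow-lists
               | the `@system-service` group and runs systemd 247 or later
               | (where `SystemCallLog=` was added):
               | 
               | ```ini
               | [Service]
               | # Allow only the syscalls in @system-service...
               | SystemCallFilter=@system-service
               | # ...and log every syscall outside that group.
               | SystemCallLog=~@system-service
               | ```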
        
         | d2wa wrote:
         | This is a getting started/101 introduction; it also talks about
         | and recommends systemd-analyze security. There's a link to part
         | two at the bottom of the article that goes deeper into things.
        
         | DyslexicAtheist wrote:
         | any system that starts security by blacklisting instead of
         | whitelisting tends to be doomed by upcoming changes.
        
       | egberts1 wrote:
       | Who's gonna write THE holy-grail analyzer that scans many
       | executables to determine which Linux capabilities, cgroups, and
       | syscalls are actually being referenced?
       | 
       | Caveat: it has to dig into ALL the linked libraries as well.
        
       | kenniskrag wrote:
       | if you want to test these settings I can recommend
       | 
       |     sudo systemd-run -p "DynamicUser=yes" -p "ProtectSystem=yes" \
       |         -p "ProtectHome=yes" --shell
       | 
       | but run it from a readable directory like /tmp or you will
       | receive an error.
        
         | 5e92cb50239222b wrote:
         | This is a very handy command in day-to-day work, actually. For
         | example, I use it to limit the total amount of memory available
         | to an application, including page cache:
         | 
         |     $ systemd-run --user --scope --property=MemoryHigh=1G qbittorrent
         | 
         | It works just as you'd expect -- if qbittorrent's working set
         | goes above 1024 MiB, it pushes the least recently used page out
         | of the page cache. Doesn't really have any effects on upload or
         | download speeds, while helping to keep more useful data in
         | memory.
         | 
         | Many isolation flags are not available in `systemd-run --user`,
         | though, so if you'd like to have some protection you either
         | have to combine `sudo systemd-run` with `su -c`, or wrap the
         | command in firejail.
         | 
         | https://github.com/netblue30/firejail/
        
           | wmanley wrote:
           | I have a bash alias for `make` and `ninja` to do something
           | similar. Just having all the spawned processes in a cgroup
           | helps with system interactivity while building. This works
           | because the kernel will then schedule the whole build as a
           | single unit against the other work on the system, rather than
           | scheduling each process that the build spawns against every
           | other process that I'm running.
        
           | t0astbread wrote:
           | Interesting, a few months ago I tried using systemd-run to
           | implement unprivileged memory limits for a process and I'm
           | pretty sure it didn't work with the user manager. Is this a
           | recent addition? (I'm not sure what version of systemd I had
           | at the time.)
        
         | pram wrote:
         | Ooh, is this a good way to sandbox execs like ImageMagick or
         | stuff like that?
        
           | 5e92cb50239222b wrote:
           | Use firejail, it's a "one click" solution with prepackaged
           | profiles.
           | 
           | https://github.com/netblue30/firejail/
           | 
           | It uses the same kernel knobs as systemd does, but is more
           | user-friendly and has more features.
           | 
           | I use it for every application that handles data received
           | from other machines: books, images, documents, whatever.
        
             | YorickPeterse wrote:
             | You can also use Bubblewrap, but getting it up and running
             | requires a lot more fiddling around. For example, this is
             | what I use to isolate Zoom from the rest of my system:
             | https://gitlab.com/yorickpeterse/dotfiles/-/blob/0a0492c78b6...
             | 
             | In my case I'm using Bubblewrap because Firejail was only
             | used for Zoom, and this felt a bit of a waste considering
             | Bubblewrap was already installed.
        
       | max002 wrote:
       | Great article :) thank you!
        
         | [deleted]
        
       | HowardStark wrote:
       | Is there any advice for working with older systemd versions?
       | Right off the bat, systemd 237 is out because there is no
       | security feature for that version of systemd-analyze.
        
         | 5e92cb50239222b wrote:
         | Use the same config you'd use for the latest systemd version.
         | It will ignore flags it doesn't know (and warn you in unit
         | logs).
        
       | bloopernova wrote:
       | Not meant to be a snarky comment, but a serious question: how
       | does this differ from SELinux?
        
         | cpuguy83 wrote:
         | They are completely different things, and where available
         | should be used together.
         | 
         | SELinux is a policy system where policy is enforced via labels.
         | 
         | Labels are applied to processes which classify what the process
         | is.
         | 
         | Labels are applied to files, defining what classification of
         | process can access the file.
         | 
         | The application of labels happens automatically based on
         | policy. Such policy would include the location of the file or
         | the label of the parent process.
         | 
         | As an example, the default policy for httpd would prevent httpd
         | from accessing /etc/passwd even though the process is running
         | as (or can be) the root user. I believe you could also do
         | interesting things like prevent httpd from opening a socket on
         | a non-standard port if you wanted to.
         | 
         | SELinux is very powerful but complicated. Ideally you use this
         | with distro packages which should have policies already
         | configured for you.
         | 
         | Critically it is not one vs the other. Use both if you have it.
        
         | tyingq wrote:
         | It seems to be using mostly the linux capabilities:
         | https://man7.org/linux/man-pages/man7/capabilities.7.html
         | 
         | So the overlap choice seems to be more around SELinux versus
         | Capabilities. Where SELinux is more fine-grained and tunable,
         | but more complicated also.
        
           | aseipp wrote:
           | It's not just Linux capabilities; on their own Linux
           | capabilities actually suck majorly and are very limited (AKA
           | "crapabilities"). But systemd also makes extensive usage of
           | cgroups and namespacing facilities to back it up e.g.
           | preventing runaway memory/CPU quotas and stopping
           | applications from accessing paths they shouldn't, restricting
           | network access, stuff like that. Some of this overlaps with
           | SELinux (e.g. restricting file access) but the mechanism is
           | fairly different.
           | 
           | The overlap/comparison between capabilities, systemd's
           | features, and SELinux features isn't really well defined in
           | any meaningful way IMO. It's really like 5 different features
           | being used in various ways.
        
             | PeterWhittaker wrote:
             | I'm curious what you mean by SELinux features not being
             | well-defined? While poorly documented, they are
             | extraordinarily precisely defined, allowing fine-grained
             | control of pretty much everything, all enforced by the
             | kernel with no workarounds, at least in enforcing mode.
        
         | staticassertion wrote:
         | It's vastly simpler, for one thing. SELinux is basically a
         | weird DSL/ programming language for describing system
         | interactions whereas systemd is providing a very basic
         | interface for common restrictions.
         | 
         | I would pretty much never ask a human being to write SELinux
         | policies unless that was explicitly part of their job whereas I
         | can pretty much point any developer to what systemd is
         | providing and they'll be able to work with it.
        
         | chasil wrote:
         | SELinux is designed as "mandatory access control," meaning that
         | it is not normally disabled.
         | 
         | The normal filesystem permissions of read/write/execute for
         | user/group/other are among those known as "discretionary access
         | controls," meaning that they can be relaxed.
         | 
         | The systemd unit security options are discretionary, at the
         | control of the administrator.
        
           | t0astbread wrote:
           | Is SELinux not also in the administrator's control?
        
         | candiddevmike wrote:
         | These days, systemd is better/easier to sandbox _services_ than
         | SELinux. SELinux/AppArmor is still the best way to protect
         | individual GUI and user apps (anything not run from systemd,
         | basically).
        
           | mbakke wrote:
           | I don't have much experience with SELinux, but at least in my
           | org the base policy is to run anything started interactively
           | by the user (or root) in _unconfined_t_, i.e. with
           | protections disabled.
           | 
           | That is, the same command that gets denied by SELinux through
           | systemd will run fine (and unprotected) when started from a
           | shell.
           | 
           | Do you write your own policies for individual end-user
           | programs?
        
           | p_l wrote:
           | Easier, maybe. Better, nope. The breadth and detail available
           | just don't compare, and not in the way where systemd can even
           | touch the scope available to SElinux
        
             | candiddevmike wrote:
             | Can you expand on that? In my opinion, systemd has far more
             | controls for process security over SELinux (networking,
             | cgroups, nspawn sandboxing, etc).
        
               | p_l wrote:
               | Out of those, the only things that aren't covered by
               | SELinux are things that would be expected to be set by
               | wrapper/launcher process (modifying namespaces - which
               | covers nspawn and setting cgroups). Everything else, i.e.
               | actual run-time access decisions, is more fine grained
               | and controllable through SELinux, including level of
               | access control like whether a program can listen on a
               | socket or bind a socket, while still permitting it to
               | connect.
        
             | mst wrote:
             | SElinux is more capable in theory but so much less
             | usable/discoverable in practice that I suspect anybody who
             | isn't truly dedicated to doing SElinux right will end up
             | averaging better security via the systemd route.
             | 
             | (and I say this based on both observation and personal
             | experience, I have some stuff to harden later this year and
             | I'm really hoping I'll be able to involve somebody who
             | -has- that level of SElinux knowledge but plan B is almost
             | certainly going to be 'mst does his best with the unit
             | configs')
        
               | PeterWhittaker wrote:
               | As someone who does a fair amount of SELinux
               | professionally, I'd mostly agree with this: getting
               | started can be daunting, so one could likely get far more
               | value from a short time focusing on systemd security.
               | 
               | But if one can spare the time, SELinux can secure
               | everything, not just systemd services.
               | 
               | It all depends on the threat vectors one faces.
        
               | p_l wrote:
               | That's why I won't even try to suggest SELinux is
               | _easier_. It's definitely easier to apply _some_
               | sandboxing through systemd, but it's pretty coarse
               | grained and mostly seems to hit some relatively easy wins
               | involving capabilities dropping and stuff that is often
               | hidden deep inside PAM. Good start, but I wouldn't call
               | it "better" ultimately.
        
           | kaba0 wrote:
           | Why not use both? They are not mutually exclusive.
        
             | candiddevmike wrote:
             | Why would you use SELinux along with systemd? Systemd can
             | do filesystem permissions declaratively vs SELinux having
             | to label the files individually, e.g.:
             | 
             | [Service]
             | ProtectSystem=strict
             | ReadWritePaths=/some/path
             | ReadOnlyPaths=/some/otherpath
             | InaccessiblePaths=/etc
        
               | PeterWhittaker wrote:
               | One can write extraordinarily short FC files using regexp
               | to apply specific SELinux labels as desired, and control
               | access to those labels with only a few rules.
               | 
               | Unlike systemd, they then apply to everything.
        
       | loudtieblahblah wrote:
       | Does your box still touch local dns before connecting to VPN? No?
       | 
       | Then anything with systemd and security can stuff it.
        
         | getcrunk wrote:
         | What?
        
       ___________________________________________________________________
       (page generated 2022-01-19 23:00 UTC)