[HN Gopher] Put an io_uring on it: Exploiting the Linux kernel
___________________________________________________________________
Put an io_uring on it: Exploiting the Linux kernel

Author : blopeur
Score  : 80 points
Date   : 2022-03-08 19:35 UTC (3 hours ago)

(HTM) web link (www.graplsecurity.com)
(TXT) w3m dump (www.graplsecurity.com)

| BigComrade wrote:
| tptacek wrote:
| This is one of the all-time great LPE writeups.
|
| A summary:
|
| 1. io_uring includes a feature that asks the kernel to manage
| groups of buffers for SQEs (the objects userland submits to tell
| uring what to do). If you enable this feature, the kernel
| overloads a field normally used to track a userland pointer with
| a kernel pointer.
|
| 2. The special-case code that handles I/O operations for files-
| that-are-not-files, like in procfs, missed the check for this
| "overloaded pointer" hack, and so can be tricked into advancing a
| kernel pointer arbitrarily, because it thinks it's working with a
| userland pointer.
|
| 3. The pointer you manipulate thusly is eventually freed, which
| lets you free kernel objects within a range of possible pointers.
|
| 4. io_uring allows you to control the CPU affinity of the kernel
| threads it generates on your behalf, because of course it does,
| so you can get your userland process and all your related
| io_uring kthreads onto the same CPU, and thus into the same SLUB
| cache area, which gives you enough control to target specific
| kernel objects (of a size bounded I think by the SQE?) reliably.
|
| 5. There's a well-known LPE trick for exploiting UAFs: the
| setxattr(2) syscall copies arbitrary extended attributes for
| files from userland to kernel buffers (that's its job), and the
| userfaultfd(2) syscall lets you defer page faults to userland;
| you can chain setxattr and userfaultfd to allocate and populate a
| kernel buffer of arbitrary size and contents and then block,
| keeping the object in memory.
|
| 6. Since that's a popular exploit technique, there's a default-
| yes setting in most distros to require root to use userfaultfd(2)
| --- but you can do the same thing with FUSE, where deferring I/O
| operations to userland is kind of the whole premise of the
| interface.
|
| 7. setxattr/userfaultfd can be transformed from a UAF primitive
| to an arbitrary kernel leak: if you have an arbitrary-free
| vulnerability (see step 3), you can do the setxattr-then-block
| thing, then trigger the free from another thread and target the
| xattr buffer, so setxattr's buffer is reclaimed out from under
| it, then trigger the allocation of a kernel structure you want to
| leak that is of the same size, which setxattr will copy into
| (another UAF); now you have a kernel structure that the kernel is
| treating like a file's extended attributes, which you can read
| back with getxattr. Neat!
|
| 8. At this point you can go hunting for kernel structures to
| whack, because you can use the arbitrary leak primitive to leak
| structs that in turn embed the (secret) addresses of other kernel
| structures.
|
| 9. Find a pointer to a socket's BPF filter and use the UAF to
| inject a BPF filter directly, bypassing the verifier, then
| trigger the BPF filter and do whatever you want, I guess.
|
| I'm sure I got a bunch of this wrong; corrections welcome. Again:
| really spectacular writeup: a good bug, some neat tricks, and a
| decent survey of Linux kernel LPE techniques.
| junon wrote:
| Yes, unfortunately I figured this might happen. People have been
| warning of some major issues with its design for a while now wrt
| security. Paired with the fact it's not much faster in practice
| than epoll in a large majority of usecases, I really worry it's
| going to footgun some people.
| FridgeSeal wrote:
| I'm confused by this, isn't one of the main points of uring
| that it's faster?
| frevib wrote:
| For disk IO it's faster, there are many benchmarks on the
| internet.
|
| For network IO, it depends. Only two things make it
| theoretically faster than epoll: io_uring supports batching of
| requests, and you can save one sys call compared to epoll in an
| event loop. There are some other things that could make it
| faster, like SQPOLL, but this could also hurt performance.
|
| Network IO discussion:
| https://github.com/axboe/liburing/issues/536
| dralley wrote:
| > Paired with the fact it's not much faster in practice than
| epoll in a large majority of usecases, I really worry it's
| going to footgun some people.
|
| "it's not faster than epoll" is somewhat dependent on your
| hardware and kernel. For one thing, Jens Axboe has been working
| on a lot of io-uring optimizations lately, but you probably
| won't see them unless you're using a kernel from the last few
| months. And by "a lot" I really mean 3x to 4x faster in the
| last year on the benchmarks he has been using.
|
| So if all your comparisons are on an enterprisey linux distro,
| you probably aren't getting a complete picture of epoll vs io-
| uring performance. epoll has been around a while, it's had more
| hours poured into optimization and probably regresses less
| frequently.
| egberts1 wrote:
| Whoa!
|
| One frickin' GIANT driver coherency setting, I/O Ring, that is.
___________________________________________________________________
(page generated 2022-03-08 23:00 UTC)
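
Editor's sketch, not part of the thread: step 4 of tptacek's summary depends on getting the userland process onto one specific CPU. The userland half of that is an ordinary scheduling call, and the snippet below shows it with Python's standard `os.sched_setaffinity` (Linux only). The helper name `pin_to_single_cpu` is made up for illustration, and choosing the lowest-numbered allowed CPU is an arbitrary assumption. How io_uring's own kernel threads are pinned is not shown here; see the linked writeup for that part.

```python
import os


def pin_to_single_cpu(pid: int = 0) -> int:
    """Pin `pid` (0 means the calling process) to one CPU.

    Hypothetical helper for illustration. It picks the lowest-numbered
    CPU the process is already allowed to run on (an arbitrary choice)
    and returns that CPU number.
    """
    allowed = os.sched_getaffinity(pid)   # set of CPU ids we may run on
    cpu = min(allowed)
    os.sched_setaffinity(pid, {cpu})      # restrict scheduling to that CPU
    return cpu


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    cpu = pin_to_single_cpu()
    print(f"pinned to CPU {cpu}; affinity is now {os.sched_getaffinity(0)}")
    os.sched_setaffinity(0, original)     # restore the original mask
```

Pinning by itself is harmless and is commonly used for benchmarking; per the summary, it matters in the exploit only because it puts the process and the io_uring kernel threads into the same per-CPU SLUB cache area.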