[HN Gopher] Attacking Firecracker: AWS' MicroVM Monitor Written ...
___________________________________________________________________
Attacking Firecracker: AWS' MicroVM Monitor Written in Rust

Author : pentestercrab
Score  : 119 points
Date   : 2022-09-08 16:20 UTC (6 hours ago)

(HTM) web link (www.graplsecurity.com)
(TXT) w3m dump (www.graplsecurity.com)

| Dunedan wrote:
| tl;dr: The article describes the details of Firecracker's architecture and CVE-2019-18960, which (as you can imagine) got fixed long ago.
|
| Dunedan wrote:
| > Firecracker is comparable to QEMU; they are both VMMs that utilize KVM, a hypervisor built into the Linux kernel.
|
| That's not accurate: While KVM is mandatory for Firecracker, it isn't for QEMU.
|
| TheDong wrote:
| It is accurate with a charitable reading, and not accurate with an uncharitable one.
|
| > Firecracker is comparable to QEMU
|
| This is charitably true. They can be used for similar high-level tasks, and thus they are reasonably comparable.
|
| > that utilize KVM
|
| Both qemu and firecracker use KVM, it happens that qemu supports other hypervisors/options.
|
| "utilize KVM" does not have to be read uncharitably as _only_ uses that.
|
| bogwog wrote:
| The internet would be a friendlier place if people were more willing to give each other the benefit of the doubt.
|
| pcwalton wrote:
| The fact that this doesn't seem exploitable shows the value of defense in depth: although numerous safety measures were defeated, exploitation was ultimately blocked by a guard page. If that guard page hadn't been there, the outcome could have been very bad. Still, it got closer to exploitable than anyone is comfortable with.
|
| staticassertion wrote:
| Definitely. It could have very easily gone the other way - AFAIK that guard page was not to defend against this sort of issue.
| What's great is that Firecracker _now_ does have explicit guard pages that they allocate in response to this, which to me indicates that they're not just a project that patches a vulnerability but thinks through how to protect against classes of vulnerability.
|
| tptacek wrote:
| They do all sorts of things for security, which is one of the "tenets" (an Amazon thing) of the project. For instance: the reason we haven't had easy access to GPUs is that they don't fit easily into the Firecracker architecture.
|
| fulafel wrote:
| > Currently, io_uring system calls are included in Firecracker's seccomp filter. Because it redefines how system calls are executed, io_uring offers a seccomp bypass for the supported system calls. This is because seccomp filtering occurs on system call entry after a thread context switch, but system calls executed via io_uring do not go through the normal system call entry. Therefore, Firecracker's seccomp policy should be treated as its union with all system calls supported by io_uring.
|
| ...
|
| > Because of the nature of system call filtering via seccomp, io_uring still presents a major security disruption in sandboxing.
|
| This is pretty interesting as io_uring has seen a lot of press as the hot new thing.
|
| raggi wrote:
| I'd love to see a move away from bpf hooks for security and ossify more of the key things as formal userspace API.
|
| twunde wrote:
| The author of this also did a writeup of an io_uring exploit a while back that you might find interesting https://www.graplsecurity.com/post/iou-ring-exploiting-the-l... (the September 8 date is definitely wrong, probably an artifact from moving blogging platforms)
|
| wyager wrote:
| Can't the kernel executor just check the seccomp rules when it pulls tasks off the iou queue?
|
| staticassertion wrote:
| Fundamentally, yeah sure, it can do whatever it wants. It just doesn't right now.
|
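The "union" point in the passage fulafel quotes can be made concrete with a toy model. This is only a sketch in Rust: `effective_policy` and the string syscall names are hypothetical stand-ins chosen for illustration, not Firecracker's or the kernel's actual data structures, and it assumes the only thing that matters is whether the filter lets the process submit work through `io_uring_enter`.

```rust
use std::collections::BTreeSet;

/// Toy model: the set of operations a sandboxed process can effectively reach.
///
/// `seccomp_allowed` is the allow-list enforced at normal syscall entry.
/// `io_uring_ops` is the set of operations the kernel will perform on the
/// process's behalf when they are submitted through an io_uring queue.
/// Both sets are illustrative assumptions, not real policy data.
fn effective_policy<'a>(
    seccomp_allowed: &BTreeSet<&'a str>,
    io_uring_ops: &BTreeSet<&'a str>,
) -> BTreeSet<&'a str> {
    // If the filter lets the process submit io_uring work at all, everything
    // io_uring can do is reachable, whether or not the allow-list names it.
    if seccomp_allowed.contains("io_uring_enter") {
        seccomp_allowed.union(io_uring_ops).copied().collect()
    } else {
        // Without io_uring the allow-list is the whole story.
        seccomp_allowed.clone()
    }
}
```

With an allow-list of `read`, `write`, and `io_uring_enter` and an io_uring op set of `openat` and `connect`, the result contains all five names, even though the filter never listed `openat` or `connect`. Drop `io_uring_enter` from the allow-list and the result is the allow-list unchanged.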
| tptacek wrote:
| This is a pretty good writeup of a long-fixed Firecracker bug (CVE-2019-18960).
|
| Firecracker is a KVM hypervisor, and so a Firecracker VM is a Linux process (running Firecracker). The guest OS sees "physical memory", but that memory is, of course, just mapped pages in the Firecracker process (the "host").
|
| Modern KVM guests talk to their hosts with virtio, which is a common abstraction for a bunch of different device types that consists of queues of shared buffers. Virtio queues are used for network devices, block devices, and, apropos this bug, for vsocks, which are a sort of generic host-guest socket interface (vsock : host/guest :: netlink : user/kernel, except that Netlink is much better specified, and people just do sort of random stuff with vsocks. They're handy.)
|
| The basic deal with managing virtio vsock messages is that the guest is going to fill in and queue buffers on its side expecting the host to read from them, which means that when the host receives them, it needs to dereference pointers into guest memory. Which is not that big of a deal; this is, like, some of the basic functioning of a hypervisor. A running guest has "regions" of physical memory that correspond to mapped pages in Firecracker on the host side; Firecracker just needs to keep tables of regions and their corresponding (host userland) memory ranges.
|
| This table is usually pretty simple; it's 1 entry long if the VM has less than 3.5G, and 2 entries if more. Unless you're on ARM, in which case it's always 1 entry, and the bug wasn't exploitable.
|
| The only tricky problem here for Firecracker is that we can't trust the guest --- that's the premise of a hypervisor! --- and a guest can try to create fucky messages with pointers into invalid memory, hoping that they'll correspond to invalid memory ranges in the host that Firecracker will dereference.
| And, indeed, in 2019, there was a case where that would happen: if you sent a vsock message, which is a tuple (header, base, size), where:
|
| 1. The guest had more than 3.5G of memory, so that Firecracker would have more than one region table entry
|
| 2. The base address landed in some valid entry in the table of regions
|
| 3. base+size lands in some other valid entry in the table of regions
|
| There are two bugs: first, a validity check on virtio buffers doesn't check to make sure that _both_ base _and_ base+size are in the same, valid region, and second, code that extracts the virtio vsock message does an address check on the buffer address with a size of 1 (in other words, just checking to see if the base address is valid, without respect to the size).
|
| At any rate, because the memory handling code here deals with raw pointers, this was done in Rust `unsafe{}` blocks, and so this bug combination would theoretically let a guest trick Firecracker into writing into host memory outside of a valid guest memory range.
|
| The hitch, which is as far as I know fatal: there's nothing mapped in between regions in x86 Firecracker that you can write to: between a memory region and the no-man's-land memory region outside it, there always happen to be PROT_NONE guard pages+, so an overwrite will simply kill the Firecracker process. Since the attacker here already controls the guest kernel, crashing the guest this way doesn't win you anything you didn't already have.
|
| + _And now, post-fix, there's deliberately PROT_NONE guard pages around regions_
|
| MariuszGalus wrote:
| I was expecting a demo of an exploit, but what I got was code analysis and verbal handwaving. Anyone else feel like something was missing here?
|
| Edit, I did learn cool new stuff tho, thanks.
|
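For readers who wanted something more concrete than prose, the two flawed checks tptacek describes can be sketched in a few lines of safe Rust. This is a toy model, not Firecracker's code: `Region`, `RegionTable`, `check_endpoints`, `check_base_only`, `translate`, and `two_region_table` are all hypothetical names, the addresses are tiny made-up values rather than the real 3.5G layout, and the "end" of a buffer is taken to be its last byte (base + size - 1), which is an assumption about a detail the thread does not spell out.

```rust
/// One guest-physical region and the host address it is mapped at.
/// Hypothetical layout for illustration only.
#[derive(Clone, Copy)]
struct Region {
    guest_base: u64,
    len: u64,
    host_base: u64,
}

struct RegionTable {
    regions: Vec<Region>,
}

impl RegionTable {
    /// Find the region that contains a single guest address, if any.
    fn find(&self, guest_addr: u64) -> Option<&Region> {
        self.regions
            .iter()
            .find(|r| guest_addr >= r.guest_base && guest_addr - r.guest_base < r.len)
    }

    /// Flawed check 1: both endpoints are valid, but nothing requires them
    /// to fall in the SAME region.
    fn check_endpoints(&self, base: u64, size: u64) -> bool {
        if size == 0 {
            return false;
        }
        match base.checked_add(size - 1) {
            Some(last) => self.find(base).is_some() && self.find(last).is_some(),
            None => false,
        }
    }

    /// Flawed check 2: validate the base with an implied size of 1,
    /// ignoring the real length entirely.
    fn check_base_only(&self, base: u64) -> bool {
        self.find(base).is_some()
    }

    /// Strict version: translate to a host address only if the whole buffer
    /// fits inside the single region that contains `base`.
    fn translate(&self, base: u64, size: u64) -> Option<u64> {
        let r = self.find(base)?;
        let offset = base - r.guest_base;
        if size == 0 || size > r.len - offset {
            return None;
        }
        Some(r.host_base + offset)
    }
}

/// Two regions with an unmapped hole between them (made-up addresses).
fn two_region_table() -> RegionTable {
    RegionTable {
        regions: vec![
            Region { guest_base: 0x0000, len: 0x1000, host_base: 0x10_0000 },
            Region { guest_base: 0x2000, len: 0x1000, host_base: 0x20_0000 },
        ],
    }
}
```

With this table, a buffer at base `0x0F00` with size `0x1200` starts in the first region and ends in the second, so `check_endpoints` and `check_base_only` both accept it while `translate` returns `None`. On the host side the two regions need not be adjacent, so a write that starts at the translated base and runs for `size` bytes would walk off the end of the first mapping, which is where the guard pages come in.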
| kibwen wrote:
| It looks like the author wasn't able to pull all the gadgets together into a working exploit, after finally being stymied by the fact that Rust surrounds the stack with guard pages (which are intended to catch accidental stack overflow, but fortuitously appear to also provide some protection against deliberate exploits as well). But it could have easily gone the other way, and exploits there might still be possible (though obviously the code in question is many years out of date by now). It still serves to demonstrate the importance of auditing your unsafe blocks, the value of unsafe blocks in the first place (which is, I suspect, how this exploit was discovered in the first place), the value of additional tools to verify unsafe code (e.g. Miri, Kani), and the reason why Rust still goes to all the trouble of implementing runtime mitigations despite its memory safety guarantees.
|
| chompie wrote:
| Hi, author here.
|
| I walk through the process of developing the exploit and primitives, and was upfront that I ran into a mitigation which thwarted my exploit strategy. Similar to other exploit writeups I've done, I try to focus on the big picture and illustrate the idea (through writing and diagrams) while still being technically rigorous. Exploit development is much more reading code than it is writing it.
|
| If you have any suggestions for improvement, or want to tell me which sections felt like handwaving to you, please let me know! Better yet, if you have an idea on how to defeat the mitigation so I can complete the exploit, I would love to discuss it.
|
| BTW: Failing to produce an exploit for a very powerful bug like this, despite my best efforts, was considered a giant win for the security review of Firecracker.
|
| staticassertion wrote:
| Well, we attacked Firecracker and this is what we got, haha. Not every attack is going to lead to a full end to end, reliable exploit, although we've posted those in the past too.
|
| The key here wasn't to produce an exploit. That would have been interesting, but ultimately not the entire goal. The key was to understand "how do we use Firecracker in the safest possible way for our use case?". To do that we picked one of the CVEs that looked like it could be exploitable and dug into it.
|
| We learned a ton about Firecracker and KVM and walked away with some mitigations we can implement such that even if the bug _had_ been exploitable the attacker would have more hurdles to jump through. Specifically, we'll be working to harden the guest operating system such that the untrusted code will have a difficult time escalating to root/kernel, which is a prerequisite for this sort of attack.
|
| [deleted]
___________________________________________________________________
(page generated 2022-09-08 23:00 UTC)