[HN Gopher] Attacking Firecracker: AWS' MicroVM Monitor Written ...
       ___________________________________________________________________
        
       Attacking Firecracker: AWS' MicroVM Monitor Written in Rust
        
       Author : pentestercrab
       Score  : 119 points
       Date   : 2022-09-08 16:20 UTC (6 hours ago)
        
 (HTM) web link (www.graplsecurity.com)
 (TXT) w3m dump (www.graplsecurity.com)
        
       | Dunedan wrote:
       | tl;dr: The article describes the details of Firecracker's
       | architecture and CVE-2019-18960, which (as you can imagine) got
       | fixed long ago.
        
       | Dunedan wrote:
       | > Firecracker is comparable to QEMU; they are both VMMs that
       | utilize KVM, a hypervisor built into the Linux kernel.
       | 
       | That's not accurate: While KVM is mandatory for Firecracker, it
       | isn't for QEMU.
        
         | TheDong wrote:
         | It is accurate with a charitable reading, and not accurate with
         | an uncharitable one.
         | 
         | > Firecracker is comparable to QEMU
         | 
         | This is charitably true. They can be used for similar high-
         | level tasks, and thus they are reasonably comparable.
         | 
         | > that utilize KVM
         | 
         | Both qemu and firecracker use KVM; it just happens that qemu
         | also supports other hypervisors/options.
         | 
         | "utilize KVM" does not have to be read uncharitably as _only_
         | uses that.
        
           | bogwog wrote:
           | The internet would be a friendlier place if people were more
           | willing to give each other the benefit of the doubt.
        
       | pcwalton wrote:
       | The fact that this doesn't seem exploitable shows the value of
       | defense in depth: although numerous safety measures were
       | defeated, exploitation was ultimately blocked by a guard page. If
       | that guard page hadn't been there, the outcome could have been
       | very bad. Still, it got closer to exploitable than anyone is
       | comfortable with.
        
         | staticassertion wrote:
         | Definitely. It could have very easily gone the other way -
         | AFAIK that guard page wasn't there to defend against this sort of
         | issue. What's great is that Firecracker _now_ does have
         | explicit guard pages that they allocate in response to this,
         | which to me indicates that they're not just a project that
         | patches a vulnerability but one that thinks through how to
         | protect against classes of vulnerability.
        
           | tptacek wrote:
           | They do all sorts of things for security, which is one of the
           | "tenets" (an Amazon thing) of the project. For instance: the
           | reason we haven't had easy access to GPUs is that they don't
           | fit easily into the Firecracker architecture.
        
       | fulafel wrote:
       | > Currently, io_uring system calls are included in Firecracker's
       | seccomp filter. Because it redefines how system calls are
       | executed, io_uring offers a seccomp bypass for the supported
       | system calls. This is because seccomp filtering occurs on system
       | call entry after a thread context switch, but system calls
       | executed via io_uring do not go through the normal system call
       | entry. Therefore, Firecracker's seccomp policy should be treated
       | as its union with all system calls supported by io_uring.
       | 
       | ...
       | 
       | > Because of the nature of system call filtering via seccomp,
       | io_uring still presents a major security disruption in
       | sandboxing.
       | 
       | This is pretty interesting as io_uring has seen a lot of
       | press as the hot new thing.
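
A toy model (not kernel code) of the bypass described in the quoted passage: the allowlist is consulted only on the ordinary "syscall entry" path, while operations queued in a ring are run later by a worker that never looks at the filter. The `Op` variants and the `Process` type are hypothetical stand-ins for real syscalls and io_uring opcodes.

```rust
use std::collections::{HashSet, VecDeque};

// Hypothetical operation names standing in for real syscalls / io_uring opcodes.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Op {
    Read,
    OpenAt,
    IoUringEnter,
}

struct Process {
    allowed: HashSet<Op>, // seccomp-style allowlist, consulted on syscall entry only
    ring: VecDeque<Op>,   // submission queue living in shared memory
    executed: Vec<Op>,    // every operation that actually ran
}

impl Process {
    fn new(allowed: &[Op]) -> Self {
        Process {
            allowed: allowed.iter().copied().collect(),
            ring: VecDeque::new(),
            executed: Vec::new(),
        }
    }

    /// Ordinary syscall path: the filter runs on entry.
    fn syscall(&mut self, op: Op) -> Result<(), Op> {
        if !self.allowed.contains(&op) {
            return Err(op); // a real filter might return EPERM or kill the thread
        }
        self.executed.push(op);
        Ok(())
    }

    /// Queueing work is just a store into shared memory: no syscall, no filter.
    fn submit(&mut self, op: Op) {
        self.ring.push_back(op);
    }

    /// Entering the ring is itself filtered, but the queued ops it runs are not.
    fn io_uring_enter(&mut self) -> Result<(), Op> {
        self.syscall(Op::IoUringEnter)?;
        while let Some(op) = self.ring.pop_front() {
            self.executed.push(op); // no allowlist check on this path
        }
        Ok(())
    }
}
```

With an allowlist of `{Read, IoUringEnter}`, a direct `OpenAt` is refused, but the same `OpenAt` submitted through the ring still runs, which is why the effective policy is the union of the filter and whatever the ring supports.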
        
         | raggi wrote:
         | I'd love to see a move away from bpf hooks for security,
         | toward ossifying more of the key things as formal userspace API.
        
         | twunde wrote:
         | The author of this also did a writeup of an io_uring exploit a
         | while back that you might find interesting
         | https://www.graplsecurity.com/post/iou-ring-exploiting-the-l...
         | (the September 8 date is definitely wrong, probably an artifact
         | from moving blogging platforms)
        
         | wyager wrote:
         | Can't the kernel executor just check the seccomp rules when it
         | pulls tasks off the iou queue?
        
           | staticassertion wrote:
           | Fundamentally, yeah sure, it can do whatever it wants. It
           | just doesn't right now.
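
A sketch of the design wyager suggests, under the assumption that the worker draining the ring has access to the submitter's allowlist. This is hypothetical and not how the kernel behaves today; `drain_checked` is an illustrative name.

```rust
use std::collections::{HashSet, VecDeque};

// Hypothetical operation names; not real io_uring opcodes.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Op {
    Read,
    OpenAt,
}

/// Drain the ring, running allowed ops and rejecting the rest.
/// Returns (ran, rejected). A real kernel would presumably post an error
/// completion for each rejected op rather than just collecting it.
fn drain_checked(ring: &mut VecDeque<Op>, allowed: &HashSet<Op>) -> (Vec<Op>, Vec<Op>) {
    let mut ran = Vec::new();
    let mut rejected = Vec::new();
    while let Some(op) = ring.pop_front() {
        if allowed.contains(&op) {
            ran.push(op);
        } else {
            rejected.push(op);
        }
    }
    (ran, rejected)
}
```

The check itself is cheap; the open question in practice is which task's filter applies, since the work may run in a different context from the one that submitted it.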
        
       | tptacek wrote:
       | This is a pretty good writeup of a long-fixed Firecracker bug
       | (CVE-2019-18960).
       | 
       | Firecracker is a KVM hypervisor, and so a Firecracker VM is a
       | Linux process (running Firecracker). The guest OS sees "physical
       | memory", but that memory is, of course, just mapped pages in the
       | Firecracker process (the "host").
       | 
       | Modern KVM guests talk to their hosts with virtio, which is a
       | common abstraction for a bunch of different device types that
       | consists of queues of shared buffers. Virtio queues are used for
       | network devices, block devices, and, apropos this bug, for
       | vsocks, which are a sort of generic host-guest socket interface
       | (vsock : host/guest :: netlink : user/kernel, except that Netlink
       | is much better specified, and people just do sort of random stuff
       | with vsocks. They're handy.)
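
For a concrete picture of the "queues of shared buffers": the struct below mirrors the split-virtqueue descriptor layout from the virtio specification (address, length, flags, next index). The chain-walking helper is an illustrative sketch, not Firecracker's code. The point is that every field is written by the guest and must be treated as untrusted.

```rust
/// Split-virtqueue descriptor, per the virtio spec (fields are little-endian).
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct VirtqDesc {
    addr: u64,  // guest-physical address of the buffer
    len: u32,   // buffer length in bytes
    flags: u16, // VIRTQ_DESC_F_* bits
    next: u16,  // index of the next descriptor when F_NEXT is set
}

const VIRTQ_DESC_F_NEXT: u16 = 1;
const VIRTQ_DESC_F_WRITE: u16 = 2;

/// Walk a descriptor chain from `head`, returning (addr, len, device_writable).
/// Bails out on an out-of-range index, and after `table.len()` hops so a
/// malicious guest can't make the host spin on a cyclic chain.
fn walk_chain(table: &[VirtqDesc], head: u16) -> Option<Vec<(u64, u32, bool)>> {
    let mut out = Vec::new();
    let mut idx = head as usize;
    for _ in 0..table.len() {
        let d = table.get(idx)?;
        out.push((d.addr, d.len, (d.flags & VIRTQ_DESC_F_WRITE) != 0));
        if (d.flags & VIRTQ_DESC_F_NEXT) == 0 {
            return Some(out);
        }
        idx = d.next as usize;
    }
    None // longer than the table itself: the chain must contain a loop
}
```

Note that `addr` and `len` come back unvalidated; checking them against guest memory is a separate step, and that step is where CVE-2019-18960 lived.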
       | 
       | The basic deal with managing virtio vsock messages is that the
       | guest is going to fill in and queue buffers on its side expecting
       | the host to read from them, which means that when the host
       | receives them, it needs to dereference pointers into guest
       | memory. Which is not that big of a deal; this is, like, some of
       | the basic functioning of a hypervisor. A running guest has
       | "regions" of physical memory that correspond to mapped pages in
       | Firecracker on the host side; Firecracker just needs to keep
       | tables of regions and their corresponding (host userland) memory
       | ranges.
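
A minimal sketch of the region table described above: each entry maps a guest-physical range to the host address where it is mapped in the VMM process, and translation is a lookup for the entry that contains the address. `Region`, `translate`, and the host addresses in the usage are hypothetical, not Firecracker's types.

```rust
/// One guest memory region: a guest-physical range backed by host memory.
#[derive(Clone, Copy, Debug)]
struct Region {
    guest_base: u64,  // first guest-physical address in the region
    size: u64,        // region length in bytes
    host_base: usize, // where the region is mapped in the VMM process
}

/// Translate a guest-physical address to a host address by finding the
/// region that contains it. Returns None for addresses no region backs.
fn translate(regions: &[Region], gpa: u64) -> Option<usize> {
    regions.iter().find_map(|r| {
        let off = gpa.checked_sub(r.guest_base)?; // below this region
        if off < r.size {
            Some(r.host_base + off as usize)
        } else {
            None // past the end of this region
        }
    })
}
```

This only validates a single address. A buffer is a range, so its end has to be checked too, and against the same entry.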
       | 
       | This table is usually pretty simple; it's 1 entry long if the VM
       | has less than 3.5G, and 2 entries if more. Unless you're on ARM,
       | in which case it's always 1 entry, and the bug wasn't
       | exploitable.
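
To show why the table grows to two entries on x86: RAM is split around a hole below 4 GiB. The sketch uses the 3.5 GiB figure from the comment above as the start of the hole and assumes RAM resumes at 4 GiB; Firecracker's real constants may differ by version, so treat both numbers as assumptions.

```rust
const GIB: u64 = 1 << 30;
/// Start of the hole below 4 GiB (the "3.5G" from the comment; an assumption).
const GAP_START: u64 = 7 * GIB / 2;
/// Assumed address where RAM resumes above the hole.
const GAP_END: u64 = 4 * GIB;

/// (guest_base, size) pairs for an x86 guest with `mem` bytes of RAM.
fn x86_regions(mem: u64) -> Vec<(u64, u64)> {
    if mem <= GAP_START {
        vec![(0, mem)] // everything fits below the hole: one entry
    } else {
        // Fill up to the hole, then put the remainder above 4 GiB.
        vec![(0, GAP_START), (GAP_END, mem - GAP_START)]
    }
}
```

Only the two-entry case gives a buffer somewhere to start in one valid region and end in another, which is why small guests (and ARM, per the comment) weren't exploitable.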
       | 
       | The only tricky problem here for Firecracker is that we can't
       | trust the guest --- that's the premise of a hypervisor! --- and a
       | guest can try to create fucky messages with pointers into invalid
       | memory, hoping that they'll correspond to invalid memory ranges
       | in the host that Firecracker will dereference. And, indeed, in
       | 2019, there was a case where that would happen: if you sent a
       | vsock message, which is a tuple (header, base, size), where:
       | 
       | 1. The guest had more than 3.5G of memory, so that Firecracker
       | would have more than one region table entry
       | 
       | 2. The base address landed in some valid entry in the table of
       | regions
       | 
       | 3. base+size lands in some other valid entry in the table of
       | regions
       | 
       | There are two bugs: first, a validity check on virtio buffers
       | doesn't check to make sure that _both_ base _and_ base+size are
       | in the same, valid region, and second, code that extracts the
       | virtio vsock message does an address check on the buffer address
       | with a size of 1 (in other words, just checking to see if the
       | base address is valid, without respect to the size).
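
A simplified model of the two checks as described above, set against a fixed check that requires the whole buffer to sit inside a single region. These functions are illustrative, not Firecracker's actual code; the endpoint check uses the buffer's last byte, `base + size - 1`.

```rust
/// Guest-physical region [start, start + len).
#[derive(Clone, Copy)]
struct Region {
    start: u64,
    len: u64,
}

impl Region {
    fn contains(&self, addr: u64) -> bool {
        addr >= self.start && addr - self.start < self.len
    }
    /// True if all of [base, base + size) lies inside this region.
    fn contains_range(&self, base: u64, size: u64) -> bool {
        match (base.checked_add(size), self.start.checked_add(self.len)) {
            (Some(end), Some(limit)) => base >= self.start && end <= limit,
            _ => false, // arithmetic overflow: reject
        }
    }
}

/// Like the "size of 1" check: only the first byte is validated.
fn check_base_only(regions: &[Region], base: u64) -> bool {
    regions.iter().any(|r| r.contains(base))
}

/// Like the first bug: both endpoints are valid, but maybe in different regions.
fn check_endpoints(regions: &[Region], base: u64, size: u64) -> bool {
    if size == 0 {
        return check_base_only(regions, base);
    }
    match base.checked_add(size - 1) {
        Some(last) => check_base_only(regions, base) && check_base_only(regions, last),
        None => false,
    }
}

/// Fixed: the entire buffer must fit within one region.
fn check_fixed(regions: &[Region], base: u64, size: u64) -> bool {
    regions.iter().any(|r| r.contains_range(base, size))
}
```

With regions at `[0, 0x1000)` and `[0x4000, 0x5000)`, a buffer at `0xF00` of size `0x3200` starts in the first region and ends in the second. Both flawed checks accept it, while `check_fixed` rejects it because the gap between the regions lies inside the buffer.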
       | 
       | At any rate, because the memory handling code here deals with raw
       | pointers, this was done in Rust `unsafe{}` blocks, and so this
       | bug combination would theoretically let a guest trick Firecracker
       | into writing into host memory outside of a valid guest memory
       | range.
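
The contrast with safe Rust, sketched: if guest memory is handled as a slice, `get_mut` performs the bounds check that a raw `ptr::copy_nonoverlapping` inside `unsafe {}` does not, and an out-of-range write becomes an error instead of a host memory overwrite. `write_region` is a hypothetical helper, and a `Vec<u8>` stands in for a mapped region.

```rust
/// Copy `data` into `mem` at `offset`, refusing any write that would run
/// past the end of the region instead of scribbling on adjacent memory.
fn write_region(mem: &mut [u8], offset: usize, data: &[u8]) -> Result<(), &'static str> {
    let end = offset.checked_add(data.len()).ok_or("offset overflow")?;
    // Slicing with get_mut returns None if end > mem.len().
    let dst = mem.get_mut(offset..end).ok_or("write outside region")?;
    dst.copy_from_slice(data);
    Ok(())
}
```

For example, on an 8-byte region, writing 4 bytes at offset 4 succeeds, while writing 3 bytes at offset 6 returns `Err("write outside region")`. Raw pointers skip exactly this check, which is why the validation code in front of them carries the whole burden.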
       | 
       | The hitch, which is as far as I know fatal: there's nothing
       | mapped in between regions in x86 Firecracker that you can write
       | to: between a memory region and the no-man's-land memory region
       | outside it, there always happen to be PROT_NONE guard pages+, so
       | an overwrite will simply kill the Firecracker process. Since the
       | attacker here already controls the guest kernel, crashing the
       | guest this way doesn't win you anything you didn't already have.
       | 
       | + _And now, post-fix, there are deliberately PROT_NONE guard pages
       | around regions_
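
A toy model (no real `mmap`) of why the guard pages were fatal to the exploit: a linear overwrite running off the end of a region hits a PROT_NONE page before it reaches anything useful, and the process faults. The one-page guard on each side, the 4 KiB page size, and the `Mapping` type are assumptions for illustration.

```rust
const PAGE: u64 = 4096; // assumed page size

#[derive(Debug, PartialEq)]
enum Access {
    Mapped,     // ordinary read/write memory
    GuardFault, // PROT_NONE guard page: the process would take SIGSEGV
    Unmapped,
}

/// A mapped region with one guard page immediately before and after it.
struct Mapping {
    base: u64,
    len: u64,
}

fn classify(maps: &[Mapping], addr: u64) -> Access {
    if maps.iter().any(|m| addr >= m.base && addr - m.base < m.len) {
        return Access::Mapped;
    }
    if maps
        .iter()
        .any(|m| addr >= m.base.saturating_sub(PAGE) && addr < m.base + m.len + PAGE)
    {
        return Access::GuardFault;
    }
    Access::Unmapped
}

/// Simulate a linear overflow writing `n` bytes from `start`; it stops at
/// the first byte that is not ordinary mapped memory.
fn linear_write(maps: &[Mapping], start: u64, n: u64) -> Access {
    for addr in start..start + n {
        let a = classify(maps, addr);
        if a != Access::Mapped {
            return a;
        }
    }
    Access::Mapped
}
```

A write that starts 16 bytes before the end of a region ends in `GuardFault` on the first byte past the region, which in the real process means a crash rather than a corrupted neighbor.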
        
       | MariuszGalus wrote:
       | I was expecting a demo of an exploit, but what I got was code
       | analysis and verbal handwaving. Anyone else feel like something
       | was missing here?
       | 
       | Edit: I did learn cool new stuff though, thanks.
        
         | kibwen wrote:
         | It looks like the author wasn't able to pull all the gadgets
         | together into a working exploit, after finally being stymied by
         | the fact that Rust surrounds the stack with guard pages (which
         | are intended to catch accidental stack overflow, but
         | fortuitously appear to provide some protection against
         | deliberate exploits too). But it could easily have gone the
         | other way, and exploits there might still be possible
         | (though obviously the code in question is many years out of
         | date by now). It still serves to demonstrate the importance of
         | auditing your unsafe blocks, the value of unsafe blocks in the
         | first place (which is, I suspect, how this bug was
         | discovered), the value of additional tools
         | to verify unsafe code (e.g. Miri, Kani), and the reason why
         | Rust still goes to all the trouble of implementing runtime
         | mitigations despite its memory safety guarantees.
        
         | chompie wrote:
         | Hi, author here.
         | 
         | I walk through the process of developing the exploit and
         | primitives, and was upfront that I ran into a mitigation which
         | thwarted my exploit strategy. Similar to other exploit writeups
         | I've done, I try to focus on the big picture and illustrate the
         | idea (through writing and diagrams) while still being
         | technically rigorous. Exploit development is much more reading
         | code than it is writing it.
         | 
         | If you have any suggestions for improvement, or want to tell me
         | which sections felt like handwaving to you, please let me know!
         | Better yet, if you have an idea on how to defeat the mitigation
         | so I can complete the exploit, I would love to discuss it.
         | 
         | BTW: Failing to produce an exploit for a very powerful bug like
         | this, despite my best efforts, was considered a giant win for
         | the security review of Firecracker.
        
         | staticassertion wrote:
         | Well, we attacked Firecracker and this is what we got, haha. Not
         | every attack is going to lead to a full end-to-end, reliable
         | exploit, although we've posted those in the past too.
         | 
         | The key here wasn't to produce an exploit. That would have been
         | interesting, but ultimately not the entire goal. The key was to
         | understand "how do we use Firecracker in the safest possible
         | way for our use case?". To do that we picked one of the CVEs
         | that looked like it could be exploitable and dug into it.
         | 
         | We learned a ton about Firecracker and KVM and walked away with
         | some mitigations we can implement such that even if the bug
         | _had_ been exploitable the attacker would have more hurdles to
         | jump through. Specifically, we'll be working to harden the
         | guest operating system such that the untrusted code will have a
         | difficult time escalating to root/kernel, which is a
         | prerequisite for this sort of attack.
        
       ___________________________________________________________________
       (page generated 2022-09-08 23:00 UTC)