[HN Gopher] FuzzOS - an operating system which is designed speci...
___________________________________________________________________
FuzzOS - an operating system which is designed specifically for fuzzing

Author : URfejk
Score  : 29 points
Date   : 2020-12-07 20:00 UTC (2 hours ago)

(HTM) web link (gamozolabs.github.io)
(TXT) w3m dump (gamozolabs.github.io)

| codetrotter wrote:
| I've been watching this guy stream on Twitch and I can tell you
| that he is legit.
|
| Also his streams are often insanely long, going 7 to 13 hours. So
| I only ever watch his streams live for a while and then I catch
| the remainder on VOD.
|
| He also has a YouTube page with an archive of past streams beyond
| the retention of Twitch.
|
| https://www.youtube.com/channel/UC17ewSS9f2EnkCyMztCdoKA
|
| Honestly the knowledge he shares is so interesting that I
| selfishly did not want other people to even know about it. But
| realistically speaking I am not going to have time to make real
| use of the knowledge myself anytime soon.
|
| He puts out quality content and he deserves all the attention he
| can get. And also, even though competition to find security bugs
| and earn bounties might become so hard that neither you nor I
| will ever get to find one and claim some money, the products that
| we all use will be more secure the more people in the world work
| on finding these bugs and reporting them.
| albntomat0 wrote:
| Do you have any thoughts on how to approach the videos? I have
| a good OS and fuzzing background, but an 8 hour video seems
| like an ordeal and harder to extract value from than something
| written.
| codetrotter wrote:
| I think my best advice would be that you tune in on one of
| the streams on Twitch when it is live and ask him about it.
| Then maybe the two of you can figure out what is most
| relevant to you of his content compared to what you already
| know?
| yjftsjthsd-h wrote:
| Huh.
| So my initial response was, "why on earth would you need a
| whole OS for that", but memory snapshotting and improved virtual
| memory performance might actually be a good justification. Linux
| does have CRIU which might be made to work for such a purpose,
| but I could see a reasonable person preferring to do it from a
| clean slate. On the other hand, if you need qemu to run
| applications (which I'm really unclear about; I can't tell if the
| plan is to run stuff natively on this OS or just to provide
| enough system to run qemu and then run apps on linux on qemu)
| then I'm surprised that it's not easier to just make qemu do what
| you want (again, I'm pretty sure qemu already has its own memory
| snapshotting features to build on).
|
| Of course, writing an OS can be its own reward, too :)
| gamozolabs wrote:
| Oooh, wasn't really expecting this to make it to HN cause it
| was meant to be more of an announcement than a description.
|
| But yes, I've done about 7 or 8 operating systems for fuzzing
| in the past and it's a massive performance (and cleanliness)
| cleanup. This one is going to be like an operating system I
| wrote 2-3 years ago for my vectorized emulation work.
|
| To answer your QEMU questions, the goal is to effectively build
| QEMU with MUSL (just to make it static so I don't need a
| dynamic loader), and modify MUSL to turn all syscalls into
| `call` instructions. This means a "syscall" is just a call to
| another area, which will be my Rust Linux emulator. I'll
| implement the bare minimum syscalls (and enum variants to those
| syscalls) to get QEMU to work, nothing more. The goal is not to
| run Linux applications, but to run a QEMU+MUSL combination which
| may be modified lightly if it means a lower emulation burden
| (eg. getting rid of threading in QEMU [if possible] so we can
| avoid fork()).
|
| The main point of this isn't performance, it's determinism, but
| performance is a side effect.
| A normal syscall instruction involves a
| context switch to the kernel, potentially cr3 swaps depending
| on CPU mitigation configuration, and the same to return back.
| This can easily be hundreds of cycles. A `call` instruction to
| something that handles the syscall is on the order of 1-4
| cycles.
|
| While for syscalls this isn't a huge deal, the difference is
| even more pronounced when it comes to KVM hypercalls.
| Transitions to a hypervisor are very expensive, and in this
| case, the kernel, the hypervisor, and QEMU (eg. device
| emulation) will all be running at the same privilege level and
| there won't be a weird
| QEMU -> OS -> KVM -> other guest OS device -> KVM -> OS -> QEMU
| transition on every device interaction.
|
| But then again, it's mainly for determinism. By emulating Linux
| deterministically (eg. not providing entropy through times or
| other syscall returns), we can ensure that QEMU has no source
| of external entropy, and thus, will always do the same thing.
| Even if it uses a random-seeded hash table, the seed would be
| derived from syscalls, and thus, will be the same every time.
| This determinism means the guest will always do the same thing,
| to the instruction. Interrupts happen on the same instructions,
| so do context switches, etc. This means any bug, regardless of
| how complex, will reproduce every time.
|
| All of this syscall emulation + determinism I have also done
| before, in a tool called tkofuzz that I wrote for Microsoft.
| That used Linux emulation + Bochs, and it was written in
| userspace. This has proven incredibly successful and it's what
| most researchers are using at Microsoft now. That being said,
| Bochs is about 100x slower than native execution, and now that
| people have gotten a good hold of snapshot fuzzing (there's a
| steep learning curve), it's time to get a more performant
| implementation.
| With QEMU we get this with a JIT, which at
| least gets us a 2-5x improvement over Bochs while still
| "emulating", but even more value could be found if we get the
| KVM emulation working and can use a hypervisor. That being
| said, I do plan to support a "mode" where guests which do not
| touch devices (or more specifically, snapshots which are taken
| after device I/O has occurred) will be able to run without QEMU
| at all. We're really only using QEMU for device emulation +
| interrupt control, thus, if you take a snapshot at a function
| that just parses everything in one thread, without process IPC
| or device access (it's rare, when you "read" from a disk,
| you're likely just hitting OS RAM caches, and thus not
| devices), we can cut out all the "bloat" of QEMU and run in a
| very very thin hypervisor instead.
|
| In fuzzing it's critical to have ways to quickly map and unmap
| memory as most fuzz cases last for hundreds of microseconds.
| This means after a few hundred microseconds, I want to restore
| all memory back to the state "before I handled user input" and
| continue again. This is extremely slow in every conventional
| operating system, and there's really no way around it. It's of
| course possible to make a driver or use CRIU, but these are
| still not exactly the solution that is needed here. I'd rather
| just make an OS that trivially runs in KVM/Hyper-V/Xen, and
| thus can run in a VM to get the cross-platform support, rather
| than writing a driver for every OS I plan to use this on.
|
| Stay cute, ~gamozo
| AgloeDreams wrote:
| Can someone tell me what the living heck is `Fuzzing`?
|
| I read this twice and I really don't have a single clue other
| than it having something to do with or requiring fast memory?
| lambda_obrien wrote:
| Fuzzing: give a program structured random garbage as input and
| see what happens, then fix the resulting bugs.
| Forge36 wrote:
| Originally: for each terminal program, pass every file as
| input.
| If a crash results: document it.
|
| Effectively: random inputs to achieve unexpected results. It's
| now come to mean "random data testing of an API".
| mehrdadn wrote:
| Wikipedia explains it: https://en.wikipedia.org/wiki/Fuzzing
| SAI_Peregrinus wrote:
| Testing code via semi-random inputs[1]. The most common
| fuzzers, AFL-Fuzz[2] and libFuzzer[3], are coverage-guided: they
| compile the program with special instrumentation to determine
| code coverage, then call the program repeatedly, changing the
| inputs via genetic algorithm to try to maximize the code paths
| executed. When unexpected behavior is observed (typically the
| test harness crashing) the fuzzer saves the test's input for
| future use.
|
| Basically automatic generation of test case inputs. It's
| non-deterministic, so it won't always find problems, but it can
| save a lot of manual effort.
|
| [1] https://en.wikipedia.org/wiki/Fuzzing
| [2] https://lcamtuf.coredump.cx/afl/
| [3] https://www.llvm.org/docs/LibFuzzer.html
| davidw wrote:
| For an interesting, similar idea, see also:
|
| https://en.wikipedia.org/wiki/QuickCheck
___________________________________________________________________
(page generated 2020-12-07 23:00 UTC)
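
For readers asking what fuzzing looks like in practice, here is a toy
sketch of the loop the thread describes: mutate an input, run the
target, keep inputs that reach new code, and reset state to a snapshot
before every case. It is not how AFL, libFuzzer, or FuzzOS work
internally. Everything in it is an assumption made for illustration:
the names target, run_with_coverage, mutate and fuzz are hypothetical,
line coverage comes from Python's sys.settrace rather than compile-time
instrumentation, mutation is simple random byte edits rather than a
genetic algorithm, and the "snapshot" is a copied dict standing in for
restoring guest memory. The fixed RNG seed plays the role of removing
external entropy, so the same run reproduces the same crash.

```python
import random
import sys


def target(data: bytes, state: dict) -> None:
    """Hypothetical code under test: mutates state, crashes on a b"FUZ" prefix."""
    state["calls"] += 1
    if len(data) > 0 and data[0] == ord("F"):
        if len(data) > 1 and data[1] == ord("U"):
            if len(data) > 2 and data[2] == ord("Z"):
                raise RuntimeError("simulated crash")


def run_with_coverage(func, data, state):
    """Run func once; return (line numbers executed inside func, crashed?)."""
    covered = set()

    def tracer(frame, event, arg):
        if frame.f_code is not func.__code__:
            return None  # do not trace any other function
        if event == "line":
            covered.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(data, state)
        crashed = False
    except RuntimeError:
        crashed = True
    finally:
        sys.settrace(previous)
    return covered, crashed


def mutate(data: bytes, rng: random.Random) -> bytes:
    """Apply one random edit: insert, overwrite, or delete a single byte."""
    buf = bytearray(data)
    choice = rng.randrange(3)
    if choice == 0 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif choice == 1:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(func, seed_input: bytes, max_iterations: int, rng_seed: int = 0):
    """Return the first crashing input found, or None if the budget runs out."""
    rng = random.Random(rng_seed)  # fixed seed: no external entropy
    snapshot = {"calls": 0}        # state "before I handled user input"
    corpus = [seed_input]
    # Assumes the seed input itself does not crash the target.
    seen, _ = run_with_coverage(func, seed_input, dict(snapshot))
    for _ in range(max_iterations):
        child = mutate(rng.choice(corpus), rng)
        state = dict(snapshot)     # restore the snapshot for every case
        covered, crashed = run_with_coverage(func, child, state)
        if crashed:
            return child
        if not covered <= seen:    # new lines reached: keep this input
            corpus.append(child)
            seen |= covered
    return None
```

Calling fuzz(target, b"A", 200_000) twice with the same rng_seed should
return the same crashing input both times, which is the toy version of
"any bug will reproduce every time". Copying a small dict is cheap; the
thread's point is that restoring real process or guest memory this often
is what conventional operating systems make slow.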