[HN Gopher] QEMU Internals
       ___________________________________________________________________
        
       QEMU Internals
        
       Author : Nusyne
       Score  : 258 points
       Date   : 2021-04-26 12:21 UTC (10 hours ago)
        
 (HTM) web link (airbus-seclab.github.io)
 (TXT) w3m dump (airbus-seclab.github.io)
        
       | pthreads wrote:
       | Thank you.
       | 
       | On the same subject can someone recommend a book or any other
       | resource to learn about virtual machine internals? My goal is to
       | try to build a toy clone of VirtualBox/VMWare.
       | 
       | So far I have found one -- Virtual Machines by James E. Smith and
       | Ravi Nair.
        
         | ahefner wrote:
         | "KVM host in a few lines of code"
         | (https://zserge.com/posts/kvm/) is a fun article to get started
         | with.
        
         | tkhattra wrote:
         | Hardware and Software Support for Virtualization Synthesis
         | Lectures on Computer Architecture (2017)
         | 
         | https://www.morganclaypool.com/doi/abs/10.2200/S00754ED1V01Y...
         | 
         | Bringing Virtualization to the x86 Architecture with the
         | Original VMware Workstation (2012)
         | 
         | https://dl.acm.org/doi/abs/10.1145/2382553.2382554
        
         | hag wrote:
         | I've always been intrigued by virtual machines and emulation as
         | well. I've always wanted to try and make an emulator of some
         | kind. I don't know much about the internals of VirtualBox, but
         | my suggestion would be to start "easy" with one CPU/Computer
         | System/Game Console and go from there. That's what I finally
         | did with the 6502 and Commodore 64.
        
           | pizza234 wrote:
           | Conventionally, one starts from the CHIP-8, which is indeed a
           | virtual machine rather than a system in a strict sense.
           | 
           | What I've found difficult is the step beyond that. NES and
           | GameBoy are typical steps, however, I've been very frustrated
           | by the confusing documentation of the GameBoy. There are 3 or 4
           | references, but one of them has significant mistakes, while
           | another is incomplete. On the other hand, the Pan Docs should
           | be complete and accurate.
           | 
           | I'm not sure if there is an easy middle ground, that, at the
           | same time, is also well documented.
           | 
           | The Atari 2600 is architecturally simpler but less
           | documented, and also requires very accurate timings. I've
           | read somebody suggesting systems like Channel F, Astrocade
           | and Odyssey2, but I'm not sure they're well documented.
           | 
           | I personally lost interest once I found that building an
           | emulator was essentially fighting specifications rather than
           | actually building something.
        
             | toast0 wrote:
             | I built about a third of a NES emulator. The nesdev wiki is
             | mostly decent, although there's a fair number of things
             | where it seems like the first people to figure things out
             | got stuff kind of backwards, and if you flip it, it's a lot
             | easier. That's the sort of fighting the specifications I
             | think you're talking about.
             | 
             | All that said, emulating the CPU was pretty fun. There's a
             | CPU test rom out there you can run with tracing and compare
             | to the published results. I also got the background tiling
             | from the PPU done, but the foreground processing has a lot
             | of steps, so I indefinitely paused for now. Also, I had
             | amazingly poor performance, so I wasn't super motivated to
             | continue.
             | 
             | The 2600 has a very similar cpu, but the very limited
             | Stella output chip means most games are very timing
             | dependent, which means you have to be super accurate, which
             | adds difficulty. I think you should try to be cycle
             | accurate anyway, but it's easy to mess that up, and having
             | some freedom would be nice.
        
               | bambataa wrote:
               | I did a GameBoy and similarly found the CPU enjoyable and
               | the PPU a huge pain. Perhaps if I understood graphics
               | better, I would have enjoyed it more, but like you say it
               | just felt like a lot of steps.
        
             | andrewf wrote:
             | A subset of CP/M calls is a pretty simple "rest of the
             | system" to implement on top of an 8080/Z80 CPU emulation.
             | (It's a bit of a cheat, like qemu's "Linux user mode
             | emulation" or early versions of DOSBox: because you restrict
             | software to interacting with a high-level software
             | interface, there are no lower-level details to aim for
             | fidelity with.)
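A minimal sketch of the approach described above, assuming the usual CP/M convention: a program CALLs address 0x0005 with the function number in register C and its argument in E (or DE). Function 2 writes the character in E, and function 9 writes the '$'-terminated string at DE. The `Cpu` class and `bdos_call` helper are hypothetical stand-ins for whatever 8080/Z80 core you have, not part of any real emulator.

```python
from dataclasses import dataclass, field

BDOS_ENTRY = 0x0005  # CP/M programs CALL this address to request a service


@dataclass
class Cpu:
    """Hypothetical stand-in for an 8080/Z80 core: only the state BDOS needs."""
    c: int = 0
    d: int = 0
    e: int = 0
    memory: bytearray = field(default_factory=lambda: bytearray(0x10000))


def bdos_call(cpu: Cpu, out: list) -> None:
    """Service a BDOS call on the host instead of emulating console hardware."""
    if cpu.c == 2:  # console output: write the character in E
        out.append(chr(cpu.e))
    elif cpu.c == 9:  # print string: write bytes at DE until '$'
        addr = (cpu.d << 8) | cpu.e
        while cpu.memory[addr] != ord("$"):
            out.append(chr(cpu.memory[addr]))
            addr = (addr + 1) & 0xFFFF
    else:
        raise NotImplementedError(f"BDOS function {cpu.c} not handled in this sketch")


def demo() -> str:
    cpu = Cpu()
    cpu.memory[0x0200:0x0206] = b"HELLO$"
    cpu.c, cpu.d, cpu.e = 9, 0x02, 0x00  # print the string stored at 0x0200
    out = []
    bdos_call(cpu, out)
    cpu.c, cpu.e = 2, ord("!")  # print a single character
    bdos_call(cpu, out)
    return "".join(out)
```

In a full emulator the main loop would check whether the program counter equals `BDOS_ENTRY` before executing the next instruction, call `bdos_call`, and then simulate a RET.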
        
         | teleforce wrote:
         | The sibling comment's book recommendation, "Hardware and
         | Software Support for Virtualization", is on point, and it's
         | written by one of the co-founders of VMware.
         | 
         | Another book on Libvirt will be handy since it is the de facto
         | API for most virtualization including VMs and containers[1].
         | 
         | [1]https://www.amazon.com/Foundations-Libvirt-Development-
         | Maint...
        
         | alert0 wrote:
         | Fuzz week shows how to make a snapshot / resettable
         | jitting hypervisor.
         | 
         | https://m.youtube.com/playlist?list=PLSkhUfcCXvqHsOy2VUxuoAf...
        
         | vitno wrote:
         | I work on virtual machines at Google. I usually suggest
         | "Hardware and Software Support for Virtualization" [1] to new
         | team members without a virtualization background.
         | 
         | [1] https://www.amazon.com/Hardware-Software-Virtualization-
         | Synt...
        
         | [deleted]
        
         | DarmokJalad1701 wrote:
         | For a really simple emulator project (not quite the level of
         | VirtualBox), check the "IntCode" challenges from AdventOfCode
         | 2019.
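To give a feel for how small such a project starts, here is a sketch of the first Intcode stage as introduced on day 2 of Advent of Code 2019: opcode 1 adds, opcode 2 multiplies, opcode 99 halts, and every operand is a memory position. The function name `run_intcode` is an illustrative choice; later puzzle days add parameter modes, I/O, and jumps that this sketch does not cover.

```python
def run_intcode(program):
    """Run a day-2 style Intcode program and return the final memory."""
    mem = list(program)  # work on a copy so the caller's list is untouched
    pc = 0
    while True:
        opcode = mem[pc]
        if opcode == 99:  # halt
            return mem
        a, b, dest = mem[pc + 1 : pc + 4]  # three positional operands
        if opcode == 1:
            mem[dest] = mem[a] + mem[b]
        elif opcode == 2:
            mem[dest] = mem[a] * mem[b]
        else:
            raise ValueError(f"unknown opcode {opcode} at position {pc}")
        pc += 4  # every day-2 instruction is four cells wide


if __name__ == "__main__":
    print(run_intcode([1, 9, 10, 3, 2, 3, 11, 0, 99, 30, 40, 50])[0])  # 3500
```

The whole fetch, decode, execute cycle is visible in one loop, which is the same skeleton a larger CPU emulator grows from.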
        
           | sammorrowdrums wrote:
           | Those were so fun! I loved my little VM as it progressed and
           | played pong, and commanded robots and rendered the output
           | etc.
           | 
           | It's a really great fun way to learn the key concepts.
        
       | junon wrote:
       | This is very well organized, wow.
        
       | whoisburbansky wrote:
       | I don't mean this to disparage Airbus in any way but after
       | Boeing's issues with the 737 MAX I'd assumed a fairly poor
       | culture of software at airplane manufacturers in general. Super
       | glad to see work like this coming out of Airbus, really makes me
       | rethink my earlier assumptions about software competence in the
       | field.
        
         | Glawen wrote:
         | Is "move fast and break things" a good culture for an airplane
         | manufacturer? Airbus is known for making good software, they
         | earned their reputation by releasing the first fly by wire
         | airliner (a320) in 84, which forced Boeing to go this route
         | with the 777.
         | 
         | Making safety critical software is a totally different world
         | than what is seen on HN. The culture needed is safety culture
         | and it is all about doing boring code, following strict coding
         | rules, doing tons of documentation and analysis prior to coding,
         | and doing tons of review of tests. I don't think it will
         | arouse interest here.
        
         | Veserv wrote:
         | That is such a bizarre viewpoint from my perspective. The
         | absolute deathtrap that is the 737 MAX had two software-related
         | critical failures in 400,000 flights. That constitutes a whole
         | system per-flight software reliability of 2 in ~400,000 or
         | ~99.9995%, 5 9s. Obviously that is still unacceptable as that
         | is far below the software standard amongst all commercial
         | airplanes where software has not been implicated in a crash for
         | at least the last 10 years except for the 737 MAX. Even if we
         | include the two 737 MAX crashes into the statistics, the whole
         | system per-flight software reliability of all commercial
         | airplanes over the last decade is at least 2 in ~100,000,000 or
         | ~99.999998% or 7 9s. The standard in airplane software is
         | literally 5000x more reliable than AWS SLA guarantees and 500x
         | the holy grail in server software of 5 9s. Even the 737 MAX is
         | 20x better than the AWS guarantee and 2x more reliable than 5
         | 9s. Airplane software is not bad, we just rightfully expect a
         | lot from systems that lives depend on, so even systems that are
         | better than best-in-class non-safety software are completely
         | unacceptable which may give the impression that they are bad in
         | absolute terms as they fail to live up to our expectations.
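The arithmetic in the comment above can be checked with a short sketch. The inputs (2 failures in ~400,000 flights, 2 in ~100,000,000 flights, a 99.99% SLA, and "five nines") are the comment's own rough figures; the function names are illustrative.

```python
import math


def reliability(failures, trials):
    """Fraction of trials that did not fail."""
    return 1.0 - failures / trials


def nines(failures, trials):
    """Reliability expressed as a (fractional) count of nines."""
    return -math.log10(failures / trials)


def times_more_reliable(baseline_failure_rate, failures, trials):
    """How many times lower the failure rate is than a baseline rate."""
    return baseline_failure_rate / (failures / trials)


if __name__ == "__main__":
    aws_sla = 1 - 0.9999      # failure rate allowed by a 99.99% SLA
    five_nines = 1 - 0.99999  # failure rate at "five nines"
    for name, f, n in [("737 MAX", 2, 400_000), ("all airliners", 2, 100_000_000)]:
        print(name, f"{reliability(f, n):.6%}", f"{nines(f, n):.1f} nines",
              f"{times_more_reliable(aws_sla, f, n):.0f}x vs SLA",
              f"{times_more_reliable(five_nines, f, n):.0f}x vs five nines")
```

With these inputs the ratios come out at about 20x and 2x for the 737 MAX and about 5000x and 500x for the fleet as a whole, matching the figures quoted in the comment.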
        
           | zaphirplane wrote:
           | That's an interesting way to look at uptime, no pun intended.
           | 
           | Though I wouldn't buy a Toyota that exploded every 400,000
           | trips worldwide, or bank with a bank that lost all my money
           | every 400,000 transactions worldwide.
        
             | Glawen wrote:
             | Well, Toyota had the sticking gas pedal issue 10 years ago:
             | they did not implement a brake override when the gas pedal
             | was stuck. This was a recommended feature by European
             | manufacturers when they introduced the electronic throttle,
             | apparently Toyota didn't get the memo.
             | 
             | Although I find the GM ignition key issue way worse than
             | Toyota's, which was an oversight.
        
             | Veserv wrote:
             | Indeed, a Toyota with a critical fatality-inducing safety
             | defect every 200,000 trips would be rightfully viewed as a
             | deathtrap. Given that the average trip is probably
             | somewhere around ~30 miles that would be a fatality per 6M
             | miles versus the standard of ~60M miles in the US, or about
             | 10x more dangerous. However, when comparing a car versus
             | airplanes, given that they both fulfill the niche of
             | transportation and are to some degree substitutable, a more
             | reasonable analysis would be fatalities/person-hour or
             | fatalities/person-mile. For fatalities/person-hour the
             | average flight is something like ~2 hours. In the same
             | amount of time 200,000 cars for 2 hours at an average of 40
             | mph would be ~16M miles, so the 737 MAX is ~4x more
             | dangerous on a person-hour basis than cars. If we go by
             | distance the average flight is ~500 miles, so the 737 MAX
             | had a fatality per 100M person-miles or is ~1.6x _safer_
             | than driving. That is just how high our standards are with
             | planes: a plane that is viewed as an absolute death
             | machine, totally unfit for use, is safer than its
             | primary alternative for an equivalent distance. A plane
             | that is 100x worse than any other commercial plane is still
             | better than the non-plane alternative on a per-distance
             | basis.
             | 
             | Obviously, this does not excuse their actions as they still
             | made a system at least 100x more dangerous than the
             | standard, but it should give perspective on the difficulty
             | of the problems actually being solved. It is not a bunch of
             | amateurs or below-average engineers who need to adopt basic
             | practices. It is a bunch of highly-skilled professionals
             | developing systems with a level of reliability far beyond
             | what most software developers even think is possible. Even
             | the abysmal processes of the 737 MAX that are far below the
             | standard in the airplane industry would, relative to most
             | software, be very good. It is just that the problems they
             | need to solve are very, very, very hard and very good does
             | not cut it when lives, not data, are at stake.
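The car versus plane comparison in the comment above reduces to a few multiplications. This sketch reruns it using only the comment's own estimates (one fatal event per 200,000 flights, ~60M road miles per fatality, 30-mile car trips, 2-hour and 500-mile flights, 40 mph average driving speed); none of these are independently sourced numbers, and the function names are illustrative.

```python
FLIGHTS_PER_FATAL_EVENT = 200_000  # 2 crashes in ~400,000 flights
CAR_MILES_PER_FATALITY = 60e6      # the comment's figure for US driving


def hypothetical_car_ratio(trip_miles):
    """How much worse a car failing once per 200,000 trips would be."""
    return CAR_MILES_PER_FATALITY / (FLIGHTS_PER_FATAL_EVENT * trip_miles)


def per_hour_ratio(flight_hours, avg_mph):
    """Plane danger relative to cars when matching time spent travelling."""
    car_miles_in_same_time = FLIGHTS_PER_FATAL_EVENT * flight_hours * avg_mph
    return CAR_MILES_PER_FATALITY / car_miles_in_same_time


def per_mile_safety_ratio(avg_flight_miles):
    """Plane safety relative to cars when matching distance travelled."""
    plane_miles_per_event = FLIGHTS_PER_FATAL_EVENT * avg_flight_miles
    return plane_miles_per_event / CAR_MILES_PER_FATALITY


if __name__ == "__main__":
    print(hypothetical_car_ratio(30))   # about 10x more dangerous
    print(per_hour_ratio(2, 40))        # about 4x more dangerous per hour
    print(per_mile_safety_ratio(500))   # about 1.7x safer per mile
```

The per-mile result is 100M / 60M, or roughly 1.67, which the comment rounds down to ~1.6x.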
        
           | elteto wrote:
           | Apples to oranges? The scale between AWS and 737s is several
           | orders of magnitude different. Boeing has a critical issue
           | every 200k flights, or let's say 3.8M hours of flight time
           | (assuming all flights are 19h, which they are not). Assume
           | AWS has 1M CPUs total (they have way more than that), if AWS
           | saw a critical CPU bug every 3.8M hours of CPU time they
           | would be having a 737 MAX crisis level every 3.8 hours.
        
             | Veserv wrote:
             | One failure per 3.8M hours would be once per 433 CPU-years,
             | so they probably actually do have somewhere between 10-100x
             | that failure rate for their CPUs given that expected CPU
             | lifetime is probably around 20-30 years. Even using a much
             | more reasonable 2 hours per flight that is still ~45 CPU-
             | years so still within the likely range of expected CPU
             | errors. Also that is a comparison against a system so
             | dangerous that it is unfit for use instead of the actual
             | standard which is once per 50,000,000 flights or ~250x
             | better.
             | 
             | Even ignoring that, I am discussing the uptime of a system
             | using AWS which only guarantees 99.99% uptime for AWS
             | service in any given AWS region and only a 10% refund
             | (which is less than their profit margin) as long as they
             | keep your system up more than 99% of the time. Downtime for
             | a system due to AWS downtime in a region constitutes a
             | critical failure of AWS to deliver expected service. That
             | their lack of service does not result in deaths unlike an
             | airplane is immaterial to a reliability analysis, it only
             | tells us if their critical failures matter and what level
             | of reliability we should require/demand when making
             | reliability-cost tradeoffs. In other words, the probability
             | and costs of failure are not actually related. It is just
             | that costly failures result in more effort being spent on
             | developing mitigations. In the case of airplanes, critical
             | failure in the form of a crash is very costly, so they take
             | great pains to minimize the whole-system risk of that
             | failure mode.
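The CPU-hour comparison in this sub-thread can be reproduced with the sketch below. All inputs are the commenters' assumptions (one critical failure per 200,000 flights, 19-hour or 2-hour flights, a fleet of 1M CPUs, and one failure per 50,000,000 flights as the industry standard); the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def cpu_years_per_failure(flights_per_failure, hours_per_flight):
    """Convert 'one failure per N flights' into years of continuous operation."""
    return flights_per_failure * hours_per_flight / HOURS_PER_YEAR


def hours_between_fleet_failures(unit_hours_per_failure, unit_count):
    """Wall-clock hours between failures across a fleet running in parallel."""
    return unit_hours_per_failure / unit_count


if __name__ == "__main__":
    print(cpu_years_per_failure(200_000, 19))            # about 434 years
    print(cpu_years_per_failure(200_000, 2))             # about 46 years
    print(hours_between_fleet_failures(3.8e6, 1_000_000))  # 3.8 hours
    print(50_000_000 / 200_000)                          # 250x better standard
```

The two views are consistent: a rate that sounds rare per unit (hundreds of CPU-years) still produces a failure every few hours once a million units run at the same time.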
        
         | pjerem wrote:
         | Airbus is known to be excellent in airplane software
         | development.
         | 
         | However, this is probably not about the airplane part of
         | Airbus. Like Boeing, Airbus also have huge defense and space
         | divisions.
        
         | hhh wrote:
         | Airbus also has the Airbus Defense and Space group, it's not
         | just all airplanes :)
        
           | [deleted]
        
       ___________________________________________________________________
       (page generated 2021-04-26 23:01 UTC)