This article is published with the permission of my friend Girish Venkatachalam. He has written quite a few articles, including some for the Linux Journal. Below are some links to his articles:

http://wiki.openbsd.org
http://linuxjournal.com/9004 (cover story of Aug 2006)
http://linuxjournal.com/8289 (first international article)
http://www.openbsd-wiki.org/index.php?title=Multiboot
http://www.ciol.com/content/search/showarticle.asp?arid=23466&way=search
http://www.mindtree.com/kc/view_art.php?wid=8
http://www.mindtree.com/kc/view_art.php?wid=9
http://mouthshut.com/user/girish1729/1.html

Sandboxing and Virtualization demystified II

"Zen and the art of motorcycle maintenance." "Xen and the art of virtualization." Ha!

Xen is a silent revolution taking place in the enterprise marketplace. In fact, it is impacting software, hardware, and the future of both like nothing else. Hyperbole aside, let us figure out what makes Xen so "Zen"ny.

Let us start from first principles. Virtualization was first postulated in a formal manner by Popek and Goldberg in the 1970s. Their paper clearly states what is required for achieving true virtualization. Nothing fancy, just a set of sufficient conditions. And the most popular hardware architecture on earth, the x86, simply does not abide by those principles. Which is to say that it is impossible to virtualize x86 hardware in the classical trap-and-emulate sense.

Now what? Should we change the whole world and ask people to buy better hardware that supports virtualization?

A programmer's genius is measured by how much he accomplishes under the given constraints, by how he solves the particular problem at hand within the framework of the constraining factors. And Xen's creators did the impossible: they virtualized the platform hardest to virtualize. What is more, they did it in such a manner that even hardware-assisted virtualization, in chips with Intel VT-x or AMD Pacifica, does not give a big performance boost over Xen's pure software virtualization.

The genius of Xen is apparent in the fact that decades-old problems and research areas, such as process migration, data center clustering and layer 7 load balancing, have been solved so elegantly and beautifully by Xen. It is a revolution, a disruptive technology that suddenly makes us step back and think, "What is going on here?"

Xen took a radically different approach to solving arguably the most pressing need of today: how to properly provision and utilize the many servers running in the huge data centers of the world. Remember, that is where all the business is. All the money is there. So isn't it logical that the smartest people embark on that task? The fact that Xen is completely open and open source is the surest evidence of the market sense that open source makes today. Hopefully you will not have to try hard to convince your pointy-haired boss that businesses can save money through open source.

Now, coming back to the point. Perhaps the most attractive feature of Xen is live migration. I don't think the media fully understands the power, flexibility and technical challenge involved in those two words, "live" and "migration."
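A short digression before we proceed: it is worth making the x86 problem mentioned earlier concrete. Popek and Goldberg demand that every "sensitive" instruction trap when executed without privilege, and x86 has a handful that quietly do not. The little program below (x86 only, and purely an illustration, nothing to do with Xen's code) runs one of the classic offenders, smsw, from ordinary user space:

    /* smsw reads the low bits of control register CR0, the "machine
     * status word", yet pre-VT-x x86 lets unprivileged code execute it
     * without trapping. */
    #include <stdio.h>

    int main(void)
    {
        unsigned short msw;

        __asm__ volatile ("smsw %0" : "=r" (msw));  /* no fault in ring 3 */
        printf("machine status word: 0x%04x\n", msw);
        return 0;
    }

Because smsw never traps, a classical trap-and-emulate hypervisor gets no chance to substitute the guest's virtual CR0, and the Popek and Goldberg equivalence requirement is broken. This is the wall Xen had to climb.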
Let us take a cue from the world of cellular telephony. I cannot think of any other analogy, though it somewhat under-represents the true challenge and "coolness" of Xen's live migration.

Cellular telephony would not have taken off but for the "handover" procedure, in which a user roams freely between multiple coverage areas and the conversation carries on without any interruption. The user is completely oblivious to what happens in the background: that a different RF frequency is now in use, that a completely different base station is now communicating with him. A fast-moving user may even switch between several base stations in rapid succession. The call is kept alive without interruption or inconvenience, in such a manner that the user perceives absolutely nothing of the handover. Imagine the amount of research, experimentation and hard work that went into making this possible. Without fundamental support for it in the design of cellular telephony, such a feat would be inconceivable.

I am not sure whether live migration was one of Xen's original goals; it must have been, though I cannot guess. Whatever the case, I know at least this much: it could not have been added as an afterthought.

Now for the real technical scoop. For that we need the context of the Xen problem and solution space. The fundamental difference between Xen's approach and everything proposed before it to solve the same problem is this:

"The goal of running multiple OSes simultaneously on x86 cannot be achieved without modifying the OSes involved."

No, this is not some postulate or rule carved in stone, but it is what lies at the root of all of Xen's capabilities. If Xen had taken a different approach, say that of VMware, which attempts to run unmodified OSes, then it would be very difficult indeed to do the sort of things Xen so easily accomplishes. By porting OSes to Xen instead of the reverse, a whole new world of possibilities opens up, the most important and relevant one being, of course, performance; performance so close to native that it stumps everyone.

VMware, on the other hand, dynamically translates unsafe operations, resorting to JIT techniques somewhat similar to those used by emulators like QEMU. A large portion of the instructions are still left untouched and execute directly on the bare hardware, which explains why VMware is able to perform so well. But since VMware attacks the same problem from a different direction, it takes a huge amount of effort in terms of lines of code and design challenges, especially since, as I mentioned before, x86 lacks an IOMMU, proper memory protection and other things required for proper virtualization. Simply put, VMware attempts the impossible while holding on to another high ideal: running unmodified OSes on top. Needless to say, it does extremely well.

But Xen also attempts the impossible. Except that by changing the ideal, by requiring modified OSes on top, it has accomplished feats like live migration, near-native performance, security, and protection, especially against denial-of-service attacks. This explains why the entire business community has aligned behind Xen to support its objective. I counted close to 15 top-notch companies, including Intel and AMD of course, that have embraced Xen and actively support it. It is very clear that Xen verily takes computing to the next level. And the nanokernel architecture of yore has been reborn in the form of Xen.
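But what does "porting an OS to Xen" mean in concrete terms? Here is a conceptual sketch in C. The names are mine and hypothetical, not the real Xen interfaces, but the mechanism, replacing a privileged instruction with a plain write to a page shared with the hypervisor, is exactly the trade described above:

    /* Conceptual sketch only, with made-up names.
     *
     * Unmodified guest (VMware's problem): the kernel disables
     * interrupts with the privileged instruction
     *
     *     __asm__ volatile ("cli");
     *
     * which the VMM must trap or binary-rewrite at run time.
     *
     * Ported guest (Xen's approach): the same operation becomes an
     * ordinary memory write to a structure shared with the hypervisor. */
    #include <stdio.h>

    struct shared_info {               /* stand-in for the shared page */
        unsigned char event_mask;      /* 1 = hold back virtual "IRQs" */
        unsigned char event_pending;   /* set by the hypervisor        */
    };

    static struct shared_info shared;  /* really mapped in by the hypervisor */

    static void pv_disable_events(void)
    {
        shared.event_mask = 1;         /* no trap, no translation */
    }

    static void pv_enable_events(void)
    {
        shared.event_mask = 0;
        if (shared.event_pending)      /* only now pay for a real transition */
            puts("event was pending: issue hypercall to deliver it");
    }

    int main(void)
    {
        pv_disable_events();
        shared.event_pending = 1;      /* the hypervisor would set this */
        pv_enable_events();
        return 0;
    }

The common case, toggling a flag, costs almost nothing; only the rare case of a pending event needs a genuine transition into the hypervisor. Multiply that saving over every interrupt-disable in a kernel and the near-native numbers stop looking like magic.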
Basically, it is quite simple. Right now, the call stack is this:

    User space application -> System call -> Kernel -> Hardware

With Xen, it changes to:

    User space application -> System call -> Kernel -> Nanokernel -> Hardware

This nanokernel is what is called Xen; it also goes by the name "hypervisor." User space applications execute system calls as before, but the kernel now executes hypercalls instead of directly accessing the hardware, and the operating systems (kernels) run as virtual machines on top of the Xen nanokernel, or hypervisor. I know the terms nanokernel and microkernel do not make much sense today, but I am using them for want of better terminology for you to relate to.

Each operating system instance runs as a "domain", in Xen terminology. All device drivers and management code run in domain 0 (dom0), and the guest OSes run in the unprivileged domains, the "domU"s.

What is critical, and perhaps most important, to understand is that although the Xen architecture shows multiple layers and looks as though a great deal of translation overhead is involved, in reality there is very little. By porting the OS to Xen, the most critical and commonly used operations call Xen's virtualized driver interface directly instead of the hardware, and the low-level interrupts, memory, disk and network devices are virtualized in an efficient, low-overhead manner. The architecture has an equally appealing security property: each executing domain is isolated and protected from the others. Even a driver failure will not crash the machine as it did before.

Now that we have a thorough context of Xen's design, let us get back to the most interesting question. How is live migration done?

One thing should be borne in mind: live migration is supported only with remote storage, such as NAS. Migrating an entire disk image is not even attempted by Xen, as it would not make sense in data center server environments, and today's network speeds do not facilitate it either. Perhaps in the future.

The key concept to understand here is the WWS, or Writable Working Set, of memory pages. At any given point in time, even a heavily loaded server, say a video streaming server or a multiplayer gaming machine, has only a small set of memory pages being constantly written to. Everything else is more or less static. So Xen resorts to a technique called "iterative pre-copy" to synchronize the dirty pages between the two physical machines. This is done so well that the typical downtime, though highly application-dependent, can be as low as 60 milliseconds (gasp!).

Remember, there is absolutely no service interruption; all TCP connections are preserved. How? Because the VM instances are usually bridged at layer 2 rather than routed at layer 3, the migrated VM keeps its IP address, and the new host simply sends out an unsolicited ARP reply so that traffic follows the VM to its new machine. This translates to immense convenience, since ssh, http and smtp connections are preserved across a live migration. Quite obviously, this cannot work between machines that are not on the same physical subnet. That can be improved upon, and solutions can be expected very soon.

Another marvelous aspect of the design is that amidst the busy synching of memory pages, the rest of the network traffic is not starved; a certain rate adaptation is applied to the migration traffic to take care of this.
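To see why the "iterative pre-copy" mentioned above converges, it helps to run the idea, so here is a toy simulation in C. It is pure illustration: "copying" a page just counts it, the guest's writable working set is modeled as random writes into a small hot region, and all sizes and thresholds are invented:

    /* Toy model of iterative pre-copy live migration. */
    #include <stdio.h>
    #include <stdlib.h>

    #define PAGES      4096   /* guest memory, in pages               */
    #define WWS         128   /* hot pages the guest keeps re-dirtying */
    #define STOP_LIMIT  160   /* pause the guest below this            */

    static int dirty[PAGES];

    static int count_dirty(void)
    {
        int n = 0;
        for (int i = 0; i < PAGES; i++)
            n += dirty[i];
        return n;
    }

    static int copy_dirty(void)      /* "send" dirty pages, clear the bits */
    {
        int sent = 0;
        for (int i = 0; i < PAGES; i++)
            if (dirty[i]) { dirty[i] = 0; sent++; }
        return sent;
    }

    static void guest_runs(void)     /* guest stays live, re-dirties its WWS */
    {
        for (int i = 0; i < WWS; i++)
            dirty[rand() % WWS] = 1;
    }

    int main(void)
    {
        int round = 0;

        for (int i = 0; i < PAGES; i++)   /* round 0 sends everything */
            dirty[i] = 1;

        while (count_dirty() > STOP_LIMIT) {
            int sent = copy_dirty();      /* copy while the guest runs */
            printf("round %d: sent %d pages, guest still live\n",
                   round++, sent);
            guest_runs();
        }
        printf("pause guest, stop-and-copy %d pages, resume on target\n",
               count_dirty());
        return 0;
    }

Notice how the rounds shrink from "all of memory" down to roughly the WWS; at that point the guest is paused only for as long as the last, small copy takes. From the administrator's side all of this hid behind a single command; in the Xen tools of the time it was, if memory serves, xm migrate --live <domain> <destination-host>.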
Let us now figure out how the introduction of the hypervisor into the scheme of things enables operating system kernels, and in particular device drivers, to talk to hardware.

The Xen hypervisor completely abstracts away the machine hardware, in spite of executing driver code natively, most often without any modification. How is this apparent contradiction handled?

First of all, what are the critical hardware facilities required by the kernel and its device drivers? The CPU, memory and interrupts, of course; programs cannot run without the timer interrupt, so at least these three are needed. Then, to drive the various physical devices, disk, network, serial port, display and so on, drivers need a few critical kinds of hardware access:

a) DMA
b) I/O ports
c) Device memory
d) Interrupts

How Xen handles all of these is not completely clear to me. DMA cannot be protected across domains, thanks to the lack of an IOMMU in the x86 architecture. Interrupts are mapped to "event channels" by Xen; essentially, interrupts are swallowed by Xen and pushed up through event channels. Event channels, incidentally, are also the notification mechanism between the various virtual machine instances, or domains, while shared memory carries the bulk of the inter-domain communication. Protecting against unauthorized memory access is a tricky issue for Xen on x86; the lack of proper hardware access control mechanisms means more work for Xen. Most of these problems should disappear with newer generations of hardware.

The disk and network, however, have a special status in Xen: they are virtualized through the xbd and xennet drivers. Since a lot of throughput is demanded of them, special care is taken to ensure both safety and performance (a toy sketch of the kind of shared ring these split drivers use appears after the resources below). The network layer allows a great degree of flexibility, as full-fledged firewalling, QoS, bridging and access control can all be enforced by the management domain.

Overall, my feeling is that Xen has got everything right for raising the resource utilization and efficiency of server-grade hardware, by properly distributing applications, performing real load balancing, fault isolation, consolidation and so on. It is a blessing for data center designers. You and I benefit by being able to run multiple OSes at the same time. Not too bad, is it, considering that it is open source?

Resources:

1) www.cl.cam.ac.uk/research/srg/netos/papers/2003-xensosp.pdf ("Xen and the Art of Virtualization")
2) www.xensource.com (XenSource, makers of Xen Enterprise)
3) http://www.cl.cam.ac.uk/research/srg/netos/xen/ (Xen home page)
4) http://www.netbsd.org/Ports/xen/ (NetBSD port to Xen)
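P.S. As promised above, here is a toy version of the producer/consumer ring that sits behind split drivers like xbd and xennet. It is deliberately simplified and the names are made up; the real rings carry request/response descriptors over granted shared pages and signal each other through event channels rather than polling:

    /* Toy single-producer/single-consumer ring, loosely in the spirit
     * of the shared-memory rings behind Xen's split drivers. */
    #include <stdio.h>

    #define RING_SIZE 8   /* must be a power of two */

    struct ring {
        unsigned prod;          /* advanced by the frontend (domU) */
        unsigned cons;          /* advanced by the backend (dom0)  */
        int req[RING_SIZE];     /* in reality: block/packet requests */
    };

    /* Frontend side: queue a request; returns 0 if the ring is full. */
    static int front_put(struct ring *r, int request)
    {
        if (r->prod - r->cons == RING_SIZE)
            return 0;
        r->req[r->prod++ & (RING_SIZE - 1)] = request;
        /* real code: notify the backend over an event channel here */
        return 1;
    }

    /* Backend side: take a request; returns 0 if the ring is empty. */
    static int back_get(struct ring *r, int *request)
    {
        if (r->cons == r->prod)
            return 0;
        *request = r->req[r->cons++ & (RING_SIZE - 1)];
        return 1;
    }

    int main(void)
    {
        struct ring r = {0};
        int v;

        for (int i = 0; i < 5; i++)
            front_put(&r, i);        /* domU queues disk requests */
        while (back_get(&r, &v))     /* dom0 services them        */
            printf("backend handled request %d\n", v);
        return 0;
    }

The point to notice is that moving a request from domU to dom0 is just a couple of ordinary memory writes on a page both domains can see, which is why the virtualized disk and network can stay fast while dom0 retains full control over what actually reaches the hardware.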