[HN Gopher] My VM is lighter (and safer) than your container (2017)
___________________________________________________________________
 
My VM is lighter (and safer) than your container (2017)
 
Author : gaocegege
Score  : 234 points
Date   : 2022-09-08 12:26 UTC (10 hours ago)
 
(HTM) web link (dl.acm.org)
(TXT) w3m dump (dl.acm.org)
 
| lamontcg wrote:
| Containers should really be viewed as an extension of packages
| (like RPM) with a bit of extra sauce: the layered filesystem, a
| chroot/jail, and cgroups for some isolation between different
| software running on the same server.
| 
| Back in 2003 or so we tried doing this with microservices that
| didn't need an entire server, with multiple different software
| teams running apps on the same physical image, to avoid giving
| entire servers to teams that would only use a few percent of the
| metal. This failed pretty quickly: software bugs would blow up
| the whole image, and the different software teams got really
| grouchy at each other. With containerization, the chroot means
| that the software carries along all its own deps and the
| underlying server/metal image can be managed separately, and the
| cgroups mean that software groups are less likely to stomp on
| each other due to bugs.
| 
| This isn't a cloud model of course; it was all on-prem. I don't
| know how Kubernetes works in the cloud, where you can
| conceivably be running containers on metal shared with other
| customers. I would tend to assume that under the covers those
| cloud vendors are using containers on VMs on metal to provide
| better security guarantees than containers alone can offer.
| 
| Containers really shouldn't be viewed as competing with VMs in a
| strict XOR sense.
| nikokrock wrote:
| I don't remember where I read it, but as far as I know, when
| using Fargate to run containers (with k8s or ECS) AWS will just
| allocate an EC2 instance for you. Your container will never run
| on the same VM as another customer's. This explains, I think,
| the latency you can see when starting a container. To improve on
| that you need to manage your own EC2 cluster with an autoscaling
| group.
| djhaskin987 wrote:
| Not surprising that VMs running unikernels are as nimble as
| containers, but not all that useful either, at least in general.
| It's much easier to just use a stock Docker image.
| ricardobeat wrote:
| How does LightVM compare to Firecracker VMs? Could it be used
| for on-demand cloud VMs?
| [deleted]
| r3mc0 wrote:
| Containers and VMs are not the same thing at all. They serve
| completely different purposes: multiple containers can be
| combined to create an application/service, while VMs always
| carry a complete OS, etc. Anyway, the internet is full of the
| true purpose of containers; they were never meant to be used as
| a "VM". And about security... meh, everything is insecure until
| proven otherwise.
| wongarsu wrote:
| VMs can have private networks between each other just as
| containers do. That's pretty much what EC2 is about.
| nijave wrote:
| VMs don't need a full OS. You can run a single process directly
| from the kernel with no init system or other userland.
| fnord123 wrote:
| Title is kinda clickbaity (wha-? how can a VM be lighter than a
| container?). It's about unikernels.
| JeanSebTr wrote:
| Exactly. Unikernels are great for performance and isolation, but
| they can't be compared to a full application stack running in a
| container or VM.
| throwaway894345 wrote:
| > how can a VM be lighter than a container
| 
| It's still clickbaity, but the title implies a comparison
| between a very lightweight VM and a heavyweight container
| (presumably a container based on a full Linux distro). You could
| imagine an analogous article about a tiny house titled "my house
| is smaller than your apartment".
| marcosdumay wrote:
| It is only lighter in memory. CPU is also a relevant dimension
| on which to compare them.
| turkishmonky wrote:
| Not to mention that in the paper, LightVM only had an advantage
| on boot times. Memory usage was marginally worse than Docker,
| even with the unikernel, and Debian on LightVM was drastically
| worse for CPU usage than Docker (the unikernel's CPU usage was
| neck and neck with the Debian Docker container).
| 
| I could see it being an improvement over other VM control
| planes, but Docker still wins on performance for any equivalent
| comparison.
| nailer wrote:
| Firecracker VMs are considered lighter than a container and are
| pretty old at this point.
| sidkshatriya wrote:
| I would say that Firecracker VMs are _not_ more lightweight than
| Linux containers.
| 
| Linux containers are essentially the separation of Linux
| processes via various namespaces, e.g. mount, cgroup, process,
| network, etc. Because this separation is done by Linux
| internally, there is not much overhead.
| 
| VMs provide a different kind of separation, one that is arguably
| more secure because it is backed by hardware - each VM thinks it
| has the whole machine to itself. When you switch between the VM
| and the host there is quite a heavyweight context switch
| (VMEXIT/VMENTER in Intel parlance). It can take a long time
| compared to the usual context switch from one Linux container
| (process) to another host process or another Linux container
| (process).
| 
| But coming back to your point: no, Firecracker VMs are not
| lighter than a Linux container. They are actually quite
| heavyweight. But the Firecracker VMM is probably the most nimble
| of all VMMs.
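As a concrete illustration of the namespace mechanism described
above, here is a minimal Python sketch that detaches the current
process into its own UTS and mount namespaces via unshare(2) -
roughly the kernel primitive container runtimes build on. It needs
root (or CAP_SYS_ADMIN) on Linux; the flag values are the standard
ones from <linux/sched.h>, and the hostname is an arbitrary example.

    import ctypes
    import os
    import subprocess

    # clone/unshare flags from <linux/sched.h>
    CLONE_NEWUTS = 0x04000000  # hostname/domainname namespace
    CLONE_NEWNS = 0x00020000   # mount namespace

    libc = ctypes.CDLL("libc.so.6", use_errno=True)

    # Detach this process into fresh UTS and mount namespaces.
    if libc.unshare(CLONE_NEWUTS | CLONE_NEWNS) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))

    # Changes made here are invisible to the rest of the system.
    name = b"container-demo"
    libc.sethostname(name, len(name))

    subprocess.run(["hostname"])  # prints "container-demo"; host unaffected

A real runtime layers on PID, network, and user namespaces plus
cgroups and seccomp, but each namespace is this cheap to enter,
which is why switching between containers remains an ordinary
process context switch.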
| [deleted]
| kasperni wrote:
| [2017]
| GekkePrutser wrote:
| Sometimes the less strict separation is a feature, not a bug.
| 
| Without folder sharing, for example, Docker would be pretty
| useless.
| 1MachineElf wrote:
| While a flawed comparison, WSL does use a VM in conjunction with
| the 9p protocol to achieve folder sharing.
| liftm wrote:
| 9p-based folder sharing is (used to be?) possible with qemu,
| too.
| ksbrooksjr wrote:
| It looks like it is still supported [1]. I noticed while reading
| the Lima documentation that they're planning on switching from
| SSHFS to 9p [2].
| 
| [1] https://wiki.qemu.org/Documentation/9psetup
| 
| [2] https://github.com/lima-vm/lima/blob/3401b97e602083cfc55b34e...
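To make the qemu 9p setup from [1] concrete: the host exports a
directory with -virtfs and the guest mounts it over virtio. A
minimal sketch with made-up paths, image name, and mount tag - the
qemu wiki page linked above has the authoritative options.

    import subprocess

    # Boot a guest and export /srv/shared from the host over 9p/virtio.
    # guest.img and the "hostshare" tag are hypothetical placeholders.
    subprocess.run([
        "qemu-system-x86_64",
        "-m", "1024",
        "-drive", "file=guest.img,format=qcow2",
        "-virtfs",
        "local,path=/srv/shared,mount_tag=hostshare,security_model=mapped-xattr",
    ], check=True)

    # Inside the guest, the export is then mounted with:
    #   mount -t 9p -o trans=virtio hostshare /mnt/shared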
| gavinray wrote:
| The issue with unikernels and things like Firecracker is that
| you can't run them on already-virtualized platforms.
| 
| I researched Firecracker when I was looking for an alternative
| to Docker for deploying FaaS functions on an OpenFaaS-like clone
| I was building.
| 
| It would have worked great if the target deployment was bare
| metal, but if you're asking a user to deploy on, e.g., EC2 or
| Fargate or whatnot, you can't use these things, so all points
| are moot.
| 
| This is relevant if you're self-hosting or you ARE a service
| provider, I guess.
| 
| (Yes, I know about Firecracker-in-Docker, but I mean real
| production use.)
| eyberg wrote:
| This is a very common misunderstanding of how these actually get
| deployed in real life.
| 
| Disclosure: I work on the OPS/Nanos toolchain, so I work with
| people that deploy unikernels in production.
| 
| When we deploy them to AWS/GCP/Azure/etc. we are _not_ managing
| the networking/storage/etc. like a k8s would do - we push all
| that responsibility back onto the cloud layer itself. So when
| you spin up a Nanos instance it spins up as its own EC2 instance
| with only your application - no Linux, no k8s, nothing. The
| networking used is the networking provided by the VPC. You can
| configure it all you want, but you aren't managing it. Now if
| you have your own infrastructure - knock yourselves out - but
| for those already in the public clouds this is the preferred
| route. We essentially treat the VM as the application and the
| cloud as the operating system.
| 
| This allows you to have much better performance/security and it
| removes a ton of devops/sysadmin work.
| gamegoblin wrote:
| This is a limitation of whatever virtualized instance you're
| running on, not Firecracker itself. Firecracker depends on KVM,
| and AWS EC2 virtualized instances don't enable KVM. But not all
| virtualized instance services disable KVM.
| 
| Obviously, Firecracker being developed by AWS and AWS disabling
| KVM is not ideal :)
| 
| Google Cloud, for instance, allows nested virtualization, IIRC.
| verdverm wrote:
| I've used GCP nested virtualization. You pay for that overhead
| in performance, so I wouldn't recommend it without further
| investigation. We were trying to simulate LUKS with physical key
| insertion/removal. We would have used it more if we could have
| gotten GPU passthrough working.
| shepherdjerred wrote:
| Azure and Digital Ocean allow nested virt as well!
| gavinray wrote:
| Yeah, but imagine trying to convince people to use an OSS tool
| where the catch is that you have to deploy it on special
| instances, only on providers that support nested virtualization.
| 
| Not a great DX, haha. I wound up using GraalVM's "Polyglot"
| abilities alongside its WASM stuff.
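Whether a given instance can host Firecracker (or any other
KVM-based VMM) comes down to whether /dev/kvm is usable, which is
easy to check. A small sketch; the sysfs path shown is for Intel
hosts, while AMD hosts expose the same knob under
/sys/module/kvm_amd/.

    import os

    # Can this machine host KVM guests at all?
    print("KVM device present:", os.path.exists("/dev/kvm"))

    # Is nested virtualization enabled in the kvm_intel module?
    nested = "/sys/module/kvm_intel/parameters/nested"
    if os.path.exists(nested):
        with open(nested) as f:
            # Prints "Y" or "1" when nesting is enabled.
            print("Nested virtualization:", f.read().strip())

On a stock EC2 virtualized instance the /dev/kvm check fails, which
is the limitation being described.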
| Sohcahtoa82 wrote:
| > We achieve lightweight VMs by using unikernels
| 
| When I attended Infiltrate a few years ago, there was a talk
| about unikernels. The speaker showed off how incredibly insecure
| many of them were, not even offering support for basic modern
| security features like DEP and ASLR.
| 
| Have they changed? Or did the speaker likely just cherry-pick
| some especially bad ones?
| eyberg wrote:
| You are probably talking about this:
| https://research.nccgroup.com/wp-content/uploads/2020/07/ncc...
| 
| In short: not a fundamental limitation, just that kernels (even
| small ones) take a ton of work. Nanos, for instance, has page
| protections, ASLR, virtio-rng (if on GCP), etc.
| sieabah wrote:
| The headline reads like a reddit post, so I'm going to assume
| the same still holds true.
| wyager wrote:
| It's not clear to me that VMs actually do offer better isolation
| than well-designed containers (i.e. not Docker).
| 
| It's basically a question of: do you trust the safety of
| kernel-mode drivers (for e.g. PV network devices or emulated
| hardware) for VMs, or do you trust the safety of userland APIs
| plus the limited set of kernel APIs available to containers?
| 
| On my FreeBSD server, I kind of trust jails with strict device
| rules (i.e. there are only about 5 things in /dev/) over a VM
| with virtualized graphics, networking, etc.
| nijave wrote:
| I think it gets even more complicated with something like
| Firecracker, where they recommend you run Firecracker itself in
| a jail (and provide a utility to set that up).
| [deleted]
| dirkg wrote:
| Why is a 5-year-old article being posted now? If this were going
| to catch on, it would have by now. I just don't see it being
| used anywhere.
| 
| Having a full Linux kernel available is a major benefit that you
| lose, right?
| faeriechangling wrote:
| What I see happening now in the cloud is containers from
| different companies and different security domains running on
| the same VM. I have to think this is fundamentally insecure and
| that VMs are underrated.
| 
| For my client machines, I hear people advocate QubesOS, which is
| based on Xen, for security. They say my banking should be done
| in a different VM than my email, for instance. Well, if that's
| the case, why do we run many containers doing different
| security-sensitive functions on the same VM, when containers are
| not really considered a very good security boundary?
| 
| From a security design perspective, I imagine hardware being
| exclusive to a person/organization, VMs being exclusive to some
| security function, and containers existing on top of that. That
| makes more sense from a security standpoint, but we seem to play
| things more loosely on the server side.
| bgm1975 wrote:
| Doesn't AWS use Firecracker for its Fargate container service
| (and Lambda too)?
| jupp0r wrote:
| (2017)
| jjtheblunt wrote:
| "orders of magnitude":
| 
| Why does anyone ever write "two orders of magnitude" when "100x"
| is shorter?
| 
| Of course, this presumes 10 as the base and the N orders to be
| the exponent, but I don't think I've ever, since the 90s, seen
| that stilted phrasing used for a base other than 10.
| IshKebab wrote:
| Because two orders of magnitude does not mean 100x. It means on
| the order of 100x.
| jjtheblunt wrote:
| Do you mean folks using the phrase know big-O, big-omega,
| big-theta, and are thinking along those lines?
| IshKebab wrote:
| It's nothing to do with big-O; it's about logarithms. But
| really, I think most people using it just think of it like:
| "which of these is it closest to? 10x, 100x or 1000x?"
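A quick worked example of the logarithm point: a ratio counts as
"two orders of magnitude" when its base-10 logarithm is roughly 2,
which covers much more ground than exactly 100x. The numbers below
are hypothetical, purely for illustration.

    import math

    ratio = 230 / 1.8         # e.g. two hypothetical boot times, in ms
    print(math.log10(ratio))  # ~2.11: "about two orders of magnitude",
                              # even though the ratio is ~128x, not 100x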
| xahrepap wrote:
| This reminds me: in 2015 I went to DockerCon, and one booth that
| was fun was VMware's. Basically, they had implemented the Docker
| APIs on top of VMware so that they could build and deploy VMs
| using Dockerfiles, etc.
| 
| I've casually searched for it in the past and it seems to not
| exist anymore. For me, one of the best parts of Docker is
| building an image (and sharing how it was done via git). It
| would be cool to be able to take the same Dockerfiles and pivot
| them to VMs easily.
| All4All wrote:
| Isn't that essentially what Vagrant and Vagrantfiles do?
| hinkley wrote:
| What is your theory for why Docker won and Vagrant didn't?
| 
| Mine is that all of the previous options were too Turing-
| complete, while the Dockerfile format more closely follows the
| Principle of Least Power.
| 
| Power users always complain about how their awesome tool gets
| ignored while 'lesser' tools become popular. And then they put
| so much energy into apologizing for problems with the tool, or
| deflecting by denigrating the people who complain. Maybe the
| problem isn't with 'everyone'. Maybe power users have control
| issues, and pandering to them is not a successful strategy.
| duskwuff wrote:
| What turned me off from Vagrant was that Vagrant machines were
| never fully reproducible.
| 
| Docker took the approach of specifying images in terms of how to
| create them from scratch. Vagrant, on the other hand, took the
| approach of specifying certain details about a machine, then
| trying to apply changes to an existing machine to get it into
| the desired state. Since the Vagrantfile didn't (and couldn't)
| specify everything about that state, you'd inevitably end up
| with some drift as you applied changes to a machine over time -
| a development team using Vagrant could often end up in
| situations where code behaved differently on two developers'
| machines because their respective Vagrant machines had gotten
| into different states.
| 
| It helped that Docker images can be used in production. Vagrant
| was only ever pitched as a solution for development; you'd be
| crazy to try to use it in production.
| mmcnl wrote:
| Docker is not fully reproducible either. Try building a Docker
| image on two different machines and then pushing it to a
| registry. It will always overwrite.
| xahrepap wrote:
| Yes, which is what I'm using now. But it doesn't use the Docker
| APIs to let you (mostly) reuse a Dockerfile to build a VM or a
| container.
| 
| Not sure if it would be better than Vagrant, but it was still
| very interesting.
| verdverm wrote:
| They might have built it into Google Anthos as part of their
| partnership. I recall seeing a demo where you could deploy and
| run any* VMware image on Kubernetes without any changes.
| mmcnl wrote:
| You are talking about declarative configuration of VMs. Vagrant
| offers that, right?
| P5fRxh5kUvp2th wrote:
| Eeeeeh...
| 
| Yes, but then again... no.
| 
| I mean, yes, Vagrant does offer that, but no, I would never
| consider Vagrant configuration anything approaching a
| replacement for Docker configuration.
| JStanton617 wrote:
| This paper consistently mischaracterizes AWS Lambda as a
| "Container as a Service" technology, when in fact it is exactly
| the sort of lightweight VM the authors are describing -
| https://aws.amazon.com/blogs/aws/firecracker-lightweight-vir...
| [deleted]
| Jtsummers wrote:
| In fairness to this paper, it was written and published before
| that Firecracker article (2017 vs 2018). From another paper on
| Firecracker, providing a bit of history:
| 
| > When we first built AWS Lambda, we chose to use Linux
| containers to isolate functions, and virtualization to isolate
| between customer accounts. In other words, multiple functions
| for the same customer would run inside a single VM, but
| workloads for different customers always run in different VMs.
| We were unsatisfied with this approach for several reasons,
| including the necessity of trading off between security and
| compatibility that containers represent, and the difficulties of
| efficiently packing workloads onto fixed-size VMs.
| 
| And a bit about the timeline:
| 
| > Firecracker has been used in production in Lambda since 2018,
| where it powers millions of workloads and trillions of requests
| per month.
| 
| https://www.usenix.org/system/files/nsdi20-paper-agache.pdf
| runnerup wrote:
| Thank you for this detail!
| xani_ wrote:
| AWS "just" runs Linux, but this paper is about unikernels,
| though?
| Jtsummers wrote:
| No, it's using a modified version of the Xen hypervisor, and the
| numbers they show are boot times and memory usage for both
| unikernels and pared-down Linux systems (via Tinyx). It's
| described in the abstract:
| 
| > We achieve lightweight VMs by using unikernels for specialized
| applications and with Tinyx, a tool that enables creating
| tailor-made, trimmed-down Linux virtual machines.
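For a sense of how small Firecracker's control surface is: the VMM
is driven over a REST API on a Unix socket, so configuring and
booting a microVM is a handful of PUT requests. A stdlib-only
sketch, assuming a firecracker process is already listening on the
socket; the kernel and rootfs paths are hypothetical placeholders,
and the endpoints follow Firecracker's published API.

    import http.client
    import json
    import socket

    class UnixHTTPConnection(http.client.HTTPConnection):
        """Stdlib HTTP client pointed at a Unix domain socket."""
        def __init__(self, socket_path):
            super().__init__("localhost")
            self.socket_path = socket_path

        def connect(self):
            self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            self.sock.connect(self.socket_path)

    def put(conn, endpoint, body):
        conn.request("PUT", endpoint, json.dumps(body),
                     {"Content-Type": "application/json"})
        resp = conn.getresponse()
        resp.read()  # drain so the connection can be reused
        assert resp.status in (200, 204), (endpoint, resp.status)

    conn = UnixHTTPConnection("/tmp/firecracker.sock")
    put(conn, "/machine-config", {"vcpu_count": 1, "mem_size_mib": 128})
    put(conn, "/boot-source", {
        "kernel_image_path": "vmlinux",                # placeholder
        "boot_args": "console=ttyS0 reboot=k panic=1",
    })
    put(conn, "/drives/rootfs", {
        "drive_id": "rootfs",
        "path_on_host": "rootfs.ext4",                 # placeholder
        "is_root_device": True,
        "is_read_only": False,
    })
    put(conn, "/actions", {"action_type": "InstanceStart"})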
| wodenokoto wrote:
| For what it's worth, Google's Cloud Functions are a container
| service. You can even download the final Docker container.
| raggi wrote:
| KVM gVisor is a hybrid model in this context. It shares
| properties with both containers and lightweight VMs.
| oxfordmale wrote:
| Kubernetes says no...
| 
| The article is light on detail. Containers and VMs have
| different use cases. If you self-host, lightweight VMs are
| likely the better path; however, once you're in the cloud, most
| managed services only provide support for containers.
| nailer wrote:
| > in the cloud most managed services only provide support for
| containers.
| 
| Respectfully, comments like these are the reason for Kubernetes
| becoming a meme.
| oxfordmale wrote:
| There is a huge difference between running on VMs that you have
| zero access to and actually owning your own VM infrastructure.
| Yes, AWS Lambda runs on Firecracker; however, it could just as
| well run on a FireCheese VM platform and you would be none the
| wiser, unless AWS published that somewhere.
| 
| I am also not running on Kubernetes because Kubernetes. AWS ECS
| and AWS Batch also only handle containerised applications. Even
| when deploying on EC2 I tend to use containers, as it ensures
| things keep working consistently when you apply patches to your
| EC2 environment.
| lrvick wrote:
| You can also use a Firecracker runner in k8s to wrap each
| container in a VM for high isolation and security.
| bongobingo1 wrote:
| I'm quite interested in seeing where slim VMs go. Personally I
| don't use Kubernetes; it just doesn't fit my client work, which
| is nearly all single-server, where it makes more sense to just
| run podman systemd units or docker-compose setups.
| 
| So from that perspective, when I've peeked at Firecracker, Kata
| Containers, etc., the "small dev DX" isn't quite there yet, or
| maybe never will be, since the players target other spaces (AWS,
| fly.io, etc.). Stuff like a way to share volumes isn't
| supported, etc. Personally I find Docker's architecture a bit
| distasteful, and Podman's tooling isn't _quite_ there yet (but
| very close).
| 
| Honestly, I don't really care about containers vs VMs, except
| that the VM allegedly offers better security, which is nice, and
| I guess I like poking at things, but they were a little too
| rough for weekend poking.
| 
| Is anyone doing "small scale" lightweight VM deployments - maybe
| just in your homelab or toy projects? Have you found the
| experience better than containers?
| NorwegianDude wrote:
| I've been using containers since 2007 for isolating workloads. I
| don't really like Docker for production either, because of the
| network overhead of the "Docker way" of doing things.
| 
| LXD is definitely my favorite container tool.
| pojzon wrote:
| How does LXD manage isolation differently from Docker?
| 
| I suppose both create netns, bridges, and interfaces?
| lstodd wrote:
| It's the same stuff - namespaces, etc. But it doesn't shove
| greasy fingers into the network config like Docker does. More a
| tooling question/approach than tech.
| antod wrote:
| LXC/LXD use the same kernel isolation/security features Docker
| does - namespaces, cgroups, capabilities, etc.
| 
| After all, it is the kernel functionality that lets you run
| something as a container. Docker and LXC/LXD are different
| management / FS packaging layers on top of that.
| staticassertion wrote:
| I assume it's not using seccomp, which Docker uses, although
| seccomp is not Docker-specific and you can go grab their policy.
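Those kernel features are all directly scriptable, which is the
point above: a "container" is mostly ordinary processes placed
under namespaces, cgroups, and a syscall policy. As a small
illustration, the sketch below caps memory and CPU for the current
process through the cgroup v2 interface (it assumes the unified
hierarchy is mounted at /sys/fs/cgroup, root privileges, and an
arbitrary group name).

    import os
    from pathlib import Path

    # Create a throwaway group in the cgroup v2 unified hierarchy.
    cg = Path("/sys/fs/cgroup/demo")
    cg.mkdir(exist_ok=True)

    (cg / "memory.max").write_text("134217728")  # hard cap: 128 MiB
    (cg / "cpu.max").write_text("50000 100000")  # 50ms of CPU per 100ms
                                                 # period, i.e. half a core

    # Move this process (and its future children) into the group.
    (cg / "cgroup.procs").write_text(str(os.getpid()))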
| xani_ wrote:
| They went in the trash because containers are more convenient to
| use, and saving a few MBs of disk/memory is not something most
| users care about.
| 
| The whole idea was pretty much either use a custom kernel (which
| inevitably has far less info available on how to debug anything
| in it) and redo all of the network and storage plumbing that
| containers already get via the OS they run on, OR use a very
| slim Linux, which at least people know how to use, but which is
| STILL more complexity than "just a blob with some namespaces in
| it" and STILL requires a bunch of config and data juggling
| between hypervisor and VM just to share some host files with the
| guest.
| 
| Either way, to get to the level of "just a slim layer of code
| between the hypervisor and your code" you need to do quite a lot
| of deep plumbing, and when anything goes wrong, debugging is
| harder. All to get some perceived security and no better
| performance than just... running the binary in a container.
| 
| It did percolate into the "slim containers" idea, where the
| container is just a statically compiled binary plus a few
| configs, and while that has the same debuggability problems, you
| _can_ just attach a sidecar to it.
| 
| I guess the next big hype will be "VM bUt YoU RuN WebAsSeMbLy In
| CuStOm KeRnEl".
| evol262 wrote:
| Virtualization is not just "perceived" security over
| containerization. From CPU rings on down, it offers dramatically
| more isolation for security than containerization does.
| 
| This isn't about what "most users care" about, either. Most
| users don't really care about 99% of what container
| orchestration platforms offer. The providers absolutely do care
| that malicious users cannot punch out to get a shell on an Azure
| AKS controller or go digging around inside /proc to figure out
| what other tenants are doing, unless the provider is on top of
| their configuration and regularly updates to match CVEs.
| 
| "Most users" will end up using one of the frameworks written by
| a "big boy" for their stuff, and they'll end up using what's
| convenient for cloud providers.
| 
| The goal of microVMs is ultimately to remove everything you're
| talking about from the equation. Kata and other microVM
| frameworks aim to be basically just another CRI, which removes
| the "deep plumbing" you're talking about. The onus is on them to
| make this work, but there's an enormous financial payoff, and
| you'll end up with this whether you think it's worthwhile or
| not.
| convolvatron wrote:
| In a related vein, most of the distinctions being brought up
| around containers vs VMs (pricing, debuggability, tooling,
| overhead) are nothing fundamental at all. They are both
| executable formats that cut at different layers, and there is
| really no reason why features of one can't be easily brought to
| the other.
| 
| Operating above these abstractions can save us time, but please
| stop confusing the artifacts of implementation with some kind of
| fundamental truth. It's really hindering our progress.
| evol262 wrote:
| Bringing the features of one to the other is exactly what
| microVMs means.
| pojzon wrote:
| With eBPF there is really not much to argue about in the
| security space.
| 
| You can do everything.
| 
| The new toolset for containers covers pretty much every possible
| use case you could imagine.
| 
| The trend will continue in favor of containers and k8s.
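For context on what that eBPF tooling looks like in practice:
frameworks such as bcc let userspace attach small verified programs
to kernel hooks, e.g. to watch every clone() on a box running
containers. A minimal sketch, assuming the bcc Python bindings and
root privileges; note that observing syscalls this way is
monitoring, not by itself a VM-grade isolation boundary, which is
the objection raised below.

    from bcc import BPF  # bcc bindings; needs root and kernel headers

    # eBPF program: log every entry into the clone() syscall.
    prog = r"""
    int trace_clone(struct pt_regs *ctx) {
        bpf_trace_printk("clone() called\n");
        return 0;
    }
    """

    b = BPF(text=prog)
    # get_syscall_fnname resolves the kernel-version-specific symbol,
    # e.g. __x64_sys_clone on many modern x86-64 kernels.
    b.attach_kprobe(event=b.get_syscall_fnname("clone"),
                    fn_name="trace_clone")
    b.trace_print()  # stream trace output until interrupted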
| tptacek wrote:
| It is pretty obviously not the case that eBPF means
| shared-kernel containers are comparably as secure as VMs; there
| have been recent Linux kernel LPEs that no syscall-scrubbing BPF
| code would have caught without specifically knowing about the
| bug first.
| evol262 wrote:
| Let me know when eBPF can probe into ring -1 hypercalls into a
| different kernel, other than generically watching timing from
| vm_enter and vm_exit.
| 
| Yes, there is a difference between "eBPF can probe what is
| happening in L0 of the host kernel" and "you can probe what is
| happening in other kernels' privileged ring -1 calls".
| 
| No, this is not what you think it is.
| staticassertion wrote:
| I'm not sure what you mean with regard to eBPF, but the
| difference between a container and a VM is massive with regard
| to security. Incidentally, my company just published a write-up
| about Firecracker:
| https://news.ycombinator.com/item?id=32767784
| depingus wrote:
| > So from that perspective, when I've peeked at Firecracker,
| Kata Containers, etc., the "small dev DX" isn't quite there yet,
| or maybe never will be, since the players target other spaces
| (AWS, fly.io, etc.). Stuff like a way to share volumes isn't
| supported, etc. Personally I find Docker's architecture a bit
| distasteful, and Podman's tooling isn't quite there yet (but
| very close).
| 
| This is pretty much me and my homelab. I haven't visited it in a
| while, but Weave Ignite might be of interest here.
| https://github.com/weaveworks/ignite
| opentokix wrote:
| Tell me you don't understand containers without telling me you
| don't understand containers.
| anthk wrote:
| You don't understand VMs either. Ever used virtual network
| interfaces?
| devmor wrote:
| Yes, when you custom-engineer a specific, complex solution for a
| specific use case, it generally performs better than a simple,
| general-purpose solution.
___________________________________________________________________
(page generated 2022-09-08 23:00 UTC)