[HN Gopher] Running e2e tests 10x faster using firecracker VMs
___________________________________________________________________
Author : samanthachai
Score  : 114 points
Date   : 2022-04-17 17:00 UTC (6 hours ago)

(HTM) web link (webapp.io)

| fideloper wrote:
| What do y'all run firecracker on? The metal servers on aws (the
| only servers you can run firecracker on in aws) are pretty
| expensive!
| neatze wrote:
| What do these e2e tests do in webapp?
|
| I don't understand why you need to rebuild the docker image on
| every app build; this seems really wasteful.
| n8ta wrote:
| If the app itself is part of the image you need to rebuild the
| image every time a dev wants to test their change.
| mtoddsmith wrote:
| Is that the same as redeploying your app to an existing
| container?
| goodpoint wrote:
| This has been done successfully using VMs for two decades.
| yewenjie wrote:
| Interesting. What other cool things are people doing with
| Firecracker?
| cpach wrote:
| Fly built a whole platform with Firecracker VMs:
| https://fly.io/
| sjosh003 wrote:
| I have been using Weave Ignite [1] recently to run Firecracker
| micro VMs instead of containers for a multitude of tasks!
|
| 1. https://github.com/weaveworks/ignite
| kaivalyagandhi wrote:
| Interesting, I wonder if you can use this with GitHub self-hosted
| runners?
| StreamBright wrote:
| Great article. Firecracker has been an amazing addition to my
| toolkit and it is good to see it succeeding in solving real-world
| problems.
| rossmohax wrote:
| They seem to be comparing a CI runner starting from scratch to an
| always-on VM with firecracker preconfigured.
| nicoburns wrote:
| Firecracker _is_ a CI runner starting a VM for each run in this
| case, just a more optimised one, no?
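
The snapshot-and-restore flow debated in this exchange can be sketched against Firecracker's HTTP API, which is served over a Unix domain socket. This is a minimal sketch, not the article's implementation: the helper names are invented for illustration, any socket and file paths passed in are hypothetical, and the JSON field names follow the Firecracker snapshot documentation for the 1.x releases as I understand it, so they may differ in other versions.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a Unix domain socket."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def snapshot_requests(snapshot_path, mem_path):
    """Requests that pause a running VM, then write a full snapshot."""
    return [
        ("PATCH", "/vm", {"state": "Paused"}),
        ("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": snapshot_path,
            "mem_file_path": mem_path,
        }),
    ]


def restore_requests(snapshot_path, mem_path):
    """Request that loads a snapshot into a fresh Firecracker process
    and resumes the guest immediately. Field names are an assumption
    based on the 1.x docs; older releases used different ones."""
    return [
        ("PUT", "/snapshot/load", {
            "snapshot_path": snapshot_path,
            "mem_backend": {"backend_type": "File", "backend_path": mem_path},
            "resume_vm": True,
        }),
    ]


def send(socket_path, requests):
    """Send each (method, path, body) tuple to the API socket in order."""
    conn = UnixHTTPConnection(socket_path)
    try:
        for method, path, body in requests:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json"})
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
            if resp.status >= 300:
                raise RuntimeError(f"{method} {path} failed: {resp.status}")
    finally:
        conn.close()
```

A CI runner along these lines would call `send(sock, snapshot_requests(...))` once after the slow setup steps, then start a new Firecracker process per test run and call `send(new_sock, restore_requests(...))` instead of booting from scratch.
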
| greatgib wrote:
| It always amazes me to see the new trend of DevOps that will be
| happily following such a tutorial, wget-ing and running random
| code from the internet in production...
| jrockway wrote:
| I don't think this is production, this is for running your
| tests. Your code in the "tests haven't run yet" state will
| probably leak all the secrets it has access to and destroy the
| machine it's running on, so you don't let it have any
| secrets and create a new machine each time. "curl | bash" here
| just injects potential flakiness (as does "npm install" when
| npm dies, etc.)
|
| Obviously a lot of people treat their CI system as their CD
| system, and do things like letting tests have highly privileged
| access to their production k8s cluster. That's a terrible idea
| even if you aren't installing software with "curl | bash".
|
| So overall, I don't think this is worth an HN comment to
| complain about. People are going to install software in
| non-auditable, non-reproducible ways.
| CraigJPerry wrote:
| There's an even faster strategy than this and it's easier to
| set up.
|
| You're going to deploy 4 CI pipelines (so make sure you're not
| manually putting together CI pipeline configs, use automation):
|
| Pipeline 1: A conveyor belt of environments. All this pipeline
| does is spin up fresh environments then run a short automated
| smoke test. Hydrate the env with the most recent mask from prod.
| The trigger condition is that there are fewer than <Threshold>
| environments available. I did 8 on a whim and never saw a need to
| change it.
|
| Pipeline 2: Normal garden-variety CI pipeline triggered on merges
| to main. Output of this will be two persisted artifacts: a built
| package and your unit test evidence.
|
| Pipeline 3: Test your automated deployment by deploying the
| package built in #2 into the first of the queue of free envs
| from #1, then trigger your end-to-end, integration, and contract
| tests. Don't run your security or operability tests here.
|
| Pipeline 4: Async pipeline triggered on a 6hr schedule. Do your
| long-running stuff like fuzz testing here, your security tests,
| etc. Do these outside of the dev cycle.
|
| Release candidates can only be signed after a successful run
| through 2, 3 & 4. That means prod deploys are on a predictable
| cadence, which users and ops usually appreciate more than
| "we drop it in when it's ready".
|
| The DevEx is pretty sweet - you don't see pipeline 1 or 4 in your
| build loop. Only the runtime of 3 would be comparable to the
| article - slightly faster than the article because there's no
| firecracker bringup overhead, no matter how small that is.
| drjasonharrison wrote:
| There are times when some corner of software development speaks
| a specialized language and this is an example.
|
| 1. Conveyor belt(?) of environments. Hydrate(?) the
| env(ironment). Mask(?) from prod(uction)
|
| 2. I think I got this. Typical "merge to main pipeline" with
| built product and test results as outputs.
|
| DevEx(?). And not sure why I wouldn't see pipeline #4 in my
| build loop because I can't deploy unless 2, 3 and 4 pass....
| Maybe you mean I don't wait to see it.
|
| Also not sure how it's faster because environments still need
| to be brought up. Unless you are trying to say that the
| environment is already running when the merge to master
| pipeline succeeds.
| forgotusername6 wrote:
| Used to do something similar with vsphere a while back. The
| servers took ages to get into the right state to test, so it was
| much easier to just revert to a snapshot to get a clean state.
| wyldfire wrote:
| Gee, why not just go straight to step 3 via fork/exec? Bound to
| shave off a few milliseconds beyond that 10x. And no firecracker
| required.
| melony wrote:
| If you are a cloud host, you need a way to sandbox hostile code.
| Firecracker allows you to do that (it is a configuration of the
| traditional KVM virtualization system except lighter and
| faster; instead of booting a VPS, which can take minutes, you
| can now spawn one in under a second).
| FooBarWidget wrote:
| Not just sandboxing, but also ensuring that each test runs in
| a clean environment, without interference from
| files/processes left behind by a previous or even concurrent
| test.
| wyldfire wrote:
| To clarify my post: I see the reason for Firecracker to exist
| in general, it's great. But does "e2e tests" include
| untrusted code? I think it really shouldn't.
|
| So why use firecracker here? Invoking your tests in a bare VM
| or container is great for making sure that you are
| controlling the environment and enumerating your system
| dependencies. But this post proposes discarding those things
| and instead using some saved state as the entry point into
| your Firecracker. So now you are booting from Your Image
| instead of a { Official Distro Image + Dependency Recipe }.
| It seems like a step backward.
| chrisseaton wrote:
| > But does "e2e tests" include untrusted code?
|
| What other possible way could CI work?
| melony wrote:
| The company that wrote the article is an e2e testing _cloud
| hosting_ company that runs your code in _their cloud_.
| colinchartier wrote:
| Author here!
|
| I think there's always been a push/pull of "fat base
| images" versus "install everything every time" - it's
| obviously subjective, but I think it's more important to
| run the tests on every commit than it is to start the
| environment from scratch.
|
| It's also not necessarily mutually exclusive, you could
| have a "staging branch" where you make something that looks
| a lot like production and then re-run end-to-end tests
| there, while running the per-branch tests with this method
| to avoid slowing down developers.
| legulere wrote:
| Because process isolation under unix is pretty lax.
| Processes by default have all the rights of the user. And you might
| end up with a system different from the initial state.
| ithkuil wrote:
| Firecracker is great and all, but the core idea described here
| also works with plain docker; i.e. there is nothing inherently
| firecracker-specific about the basic technique.
| colinchartier wrote:
| Author here!
|
| The three big differences are:
|
| 1. Docker doesn't deal with running processes (like postgres or
| redis), only the filesystem state
|
| 2. Docker doesn't have enough isolation, so you'd probably need
| to run it within qemu or firecracker for compliance in bigger
| teams
|
| 3. Docker-in-docker is still pretty painful, if you need to do
| anything nonstandard like change the size of /dev/shm, access
| /dev/kvm, or load kernel drivers, it'll take custom
| configuration.
| ignoramous wrote:
| Hi, offtopic but: is webapp.io a pivot from layerci, or just
| a rebranding?
|
| Interesting that you folks now use firecracker. I assume
| it now fills in adequately for the previously homegrown tech
| at layerci [0]?
|
| [0] https://news.ycombinator.com/item?id=25979941
| colinchartier wrote:
| Just a rebranding! (The technology's gotten better as well,
| of course - we didn't used to use firecracker at all)
|
| https://webapp.io/blog/layerci-has-rebranded-to-webapp-io/
| throwaway894345 wrote:
| I'm confused. Why do you need to snapshot live processes? Are
| we concerned about startup time of Postgres or whatever?
| Also, why is isolation needed for e2e tests? Lastly, why is
| docker-in-docker a requirement, and how is that easier than
| qemu in qemu or qemu in docker or whatever?
| colinchartier wrote:
| > Why do you need to snapshot live processes?
|
| Oftentimes there are long-lived processes which rarely
| change but take a long time to warm up. The Bazel [1] agent
| for C++ projects, the buildkit [2] state for docker, or the
| running Postgres or Redis server for a cloud native app, for
| example.
|
| It's why running "docker build" twice on your laptop is so
| fast, but running "docker build" in CI seems glacially
| slow.
|
| > why is docker-in-docker a requirement, and how is that
| easier than qemu in qemu or qemu in docker or whatever?
|
| The example given was running "docker-compose build", so
| you'd need either docker-in-firecracker (this post),
| docker-in-docker, or docker-in-qemu. You'd almost never run
| docker-compose build on bare metal in practice, because
| you'd immediately need to send the images you built
| somewhere else in order to use them.
|
| [1] https://bazel.build/
| [2] https://docs.docker.com/develop/develop-images/build_enhance...
| cpuguy83 wrote:
| But that's state on disk, not process state. It should
| not affect startup time in buildkit.
|
| I'm not experienced enough with Bazel to comment on that.
| cpuguy83 wrote:
| Docker does handle snapshots of running processes. It's
| called checkpoint/restore, and it uses the CRIU tooling to do
| this.
|
| In terms of doing this in a CI env like actions where you may
| have different types of machines serving you, it may be
| problematic as the machine specs need to match pretty closely.
| jitl wrote:
| Yeah, I don't like that the article itself treats building the
| DB seed data, etc, into the Firecracker VM image as if it were
| impossible to do in Docker. The techniques are good things to
| do -- but the connection between the techniques and
| Firecracker is very tenuous.
|
| I've done all of the above using multi-layered Dockerfiles and a
| cron CI job to rebuild the base integration test image every 6
| hours. Sure, if you need the isolation, Firecracker is the way
| to go. But if you invest primarily in container shenanigans to
| speed up CI with Docker, it's not too much extra work to wrap
| it in a Firecracker VM, plain QEMU, or whatever once you start
| wanting more isolation.
|
| Also, maybe I'm holding it wrong but Docker in Docker has not
| bitten us yet on our GitHub action runners.
| lgierth wrote:
| You don't need a management daemon running though, and get a
| complete virtualized kernel that can be customized if needed.
| bornfreddy wrote:
| Ok, so IIUC, the main difference with firecracker versus
| docker is that processes are better separated from each other
| ("micro VM" instead of namespaces) and that one can run a
| customized kernel. But for e2e tests I've written, neither of
| these advantages mattered.
|
| I do love the idea of taking a snapshot of a prebuilt
| database image and can see where this would really speed up
| the tests.
| tedunangst wrote:
| But why does it require firecracker and not qemu?
| colinchartier wrote:
| QEMU takes much longer to save/restore snapshots, and it's much
| harder to do via the API
| [deleted]
| n8ta wrote:
| Sounds like having an actual non-ephemeral computer with extra
| steps...
___________________________________________________________________
(page generated 2022-04-17 23:00 UTC)