[HN Gopher] Bottlerocket, an open source Linux distribution buil...
___________________________________________________________________
Bottlerocket, an open source Linux distribution built to run containers
Author : sshroot
Score : 151 points
Date : 2020-09-01 18:24 UTC (4 hours ago)
(HTM) web link (aws.amazon.com)
(TXT) w3m dump (aws.amazon.com)
| LeSaucy wrote:
| Look out Rancher!
| chromedev wrote:
| Rancher has been around a long time and isn't cloud-specific. Until this distro can prove itself useful outside of the AWS ecosystem, Rancher will still dominate.
| throwaway3neu94 wrote:
| Bottlerocket seems more like a competitor to RancherOS, which is abandonware. (https://github.com/rancher/os/issues/3000)
|
| I liked RancherOS's very-few-moving-parts approach (even your shell is a Docker container). Hope Bottlerocket will be somewhat similar.
| solatic wrote:
| As strong as the engineering behind Bottlerocket seems to be, I'm not entirely sure who they built it for, except as a foundational component for AWS's managed offerings.
|
| If you, as an AWS customer, decide to fully embrace AWS lock-in, then why would you run this yourself on an EC2 instance instead of running ECS or EKS? If you're trying to avoid AWS lock-in, why would you choose an OS that's locking you into AWS Systems Manager and Amazon Linux 2 for debugging needs?
| acdha wrote:
| There are varying levels of lock-in. A Linux distribution to run containers is a lot easier to replace than, say, a database with a proprietary query language and custom semantics.
| NathanKP wrote:
| Hi, I'm a developer advocate in the container engineering org at AWS. I think there are a few misunderstandings here that I may be able to explain better.
|
| First, Bottlerocket is not Amazon Linux 2; it is its own minimal operating system, with most components built from the ground up in Rust. This is totally different from the Amazon Linux 2 you may be familiar with (and most other operating systems, for that matter). Bottlerocket is optimized for running containers with high security isolation. The host OS is extremely minimal: it does not come with bash, an interpreter, ssh, or anything beyond the system basics needed to run containers. In fact, it uses an immutable root filesystem. You aren't intended to run or install things directly on the host at all.
|
| Everything that is installed and runs on Bottlerocket runs as containers, which are kept isolated from each other and the host with best-practice security features such as SELinux. For example, you can connect to a container on the host via AWS Systems Manager, or you can optionally enable a container that lets you connect to it via SSH. Once again, the thing you are connecting to is the container on the host, though, not the host directly.
|
| For this initial release of Bottlerocket we are focusing on providing image variants that are prepackaged with the requirements to serve as underlying container hosts for an ECS or EKS cluster. However, we also intend Bottlerocket to eventually be something that anyone can use, anywhere, even outside of AWS, if they want to benefit from the secure-by-default, container-first design of Bottlerocket.
|
| You can read more about the security features of Bottlerocket here: https://github.com/bottlerocket-os/bottlerocket/blob/develop...
|
| And you can find a bit more of the charter / goals for the project here: https://github.com/bottlerocket-os/bottlerocket/blob/develop...
| e12e wrote:
| Would it be safe to say that Bottlerocket is a container host for running containers under a hypervisor? Or is it intended to do full hw interfacing and run bare metal?
| NathanKP wrote:
| Bottlerocket runs containers using containerd, so containers are visible as processes on the host from its perspective; they are not currently isolated from each other via a hypervisor. Bottlerocket limits containers' ability to interact with the host or each other via SELinux, among other things.
|
| We do have firecracker-containerd (https://github.com/firecracker-microvm/firecracker-container...), which is designed to allow the containerd runtime to launch containers as microVMs via Firecracker, with that additional layer of isolation via the KVM hypervisor. This stack is not currently fully compatible with K8s or ECS though, so it is not implemented using that approach yet. Rather, Bottlerocket is built as a progressive improvement on the current state of container hosts, which is that many people are running all their containers on their hosts without any strong security hardening at all.
|
| I think from the similar naming scheme of Firecracker and Bottlerocket you can already see the pieces of the puzzle that are in progress and the future potential, though.
| e12e wrote:
| Thanks, I was more wondering what the relationship was between something like Debian GNU/Linux and Bottlerocket. From the gp description it sounds like there's no "GNU userland", just the Linux kernel and some utility functions in Rust - enough to launch containerd.
|
| So if I have a heterogeneous collection of servers - I could install Debian and run Docker on Debian. It _sounds_ like Bottlerocket would more comfortably run on top of a hypervisor abstracting away the actual hw a bit? E.g. on top of Xen, KVM or VMware?
|
| Obviously the Linux kernel can be made to run on a toaster, but maybe Bottlerocket isn't ideal for that purpose?
| jhaynes wrote:
| Answering your initial question and this one: Bottlerocket today only runs in EC2, but we've tried to make it flexible enough to run outside of a hypervisor on bare metal in the future (in fact, a few engineers on the team are really excited to get it running on their Raspberry Pis at home; toasters haven't been added to our roadmap yet ;) ).
|
| Bottlerocket has a GNU userland like many other distros. It is just one that is stripped down, with many things removed, including interpreters, shells, and package managers.
|
| If you want to explore more deeply, you can enable the admin container and jump into a shell on the host[1] to look at the filesystem and see what Bottlerocket's userspace looks like up close and personal. You can also see a bit more of this debugging/exploration tooling explained in an AWS Partner Blog[2].
|
| [1] https://github.com/bottlerocket-os/bottlerocket#admin-contai...
| [2] https://aws.amazon.com/blogs/apn/getting-started-with-bottle...
| mchusma wrote:
| I tried to find any performance characteristics of running Bottlerocket, just to understand what the expectations are there. I assume, since it was not mentioned, it is either similar to Amazon Linux 2 or worse (but with security advantages). Can you request a follow-on post on the AWS blog that describes the performance impact of Bottlerocket?
| If performance is better, it would be nice to know that as well, of course.
| mnd999 wrote:
| So you saw CoreOS and thought "Let's re:invent that wheel"?
| wmf wrote:
| So did every other vendor. We have two CoreOSes, Flatcar, Container-Optimized OS from Google, Bottlerocket, k3OS, etc. Fortunately, these aren't just different in name; there's a lot of experimentation going on around different ways to do updates, security, etc. I hope we'll eventually see some convergence after a few years.
| 013a wrote:
| You mean, the CoreOS that was end-of-lifed four months ago?
| k__ wrote:
| At least "Fedora CoreOS" still seems to be a thing.
| freedomben wrote:
| And Red Hat CoreOS (RHCOS) for Red Hat-flavored shops.
| rosywoozlechan wrote:
| Red Hat bought the company that created CoreOS, so it's not surprising they have Red Hat/Fedora-supported versions, right?
| blixtra wrote:
| CoreOS lives on as Flatcar Container Linux. https://twitter.com/kelseyhightower/status/12831024012520980... It's a fully-compatible, drop-in replacement.
| srameshc wrote:
| Isn't it somewhat similar in concept to the Container-Optimized OS that is being offered by GCP?
| WrtCdEvrydy wrote:
| Question: Will this be an AMI that can be deployed as an ECS cluster recipient on EC2? End Question.
| NathanKP wrote:
| Yes! It already is, in fact. The ECS documentation provides the list of Bottlerocket AMIs you can launch into your ECS clusters: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/...
|
| I'd also recommend reading the ECS quickstart in the Bottlerocket repo: https://github.com/bottlerocket-os/bottlerocket/blob/develop...
| WrtCdEvrydy wrote:
| Well... I'll be modifying some launch configurations then.
| azinman2 wrote:
| I just installed Proxmox on a home server, and I'm using its CT containers (LXC) to run various services. Could I use this as a replacement for Proxmox?
| geoffeg wrote:
| I read that this is more intended as the VM you run inside Proxmox, inside which you then run your (Docker/k8s) containers.
| cle wrote:
| I wish the blog post was just this. So much clearer. Thank you.
| simonebrunozzi wrote:
| Amazon's PR would not have allowed that.
| captn3m0 wrote:
| > sure who they built it for
|
| Anytime AWS finds many of its customers using something outside of AWS, they want to build it in-house. AWS customers were using CoreOS a lot, hence this.
| coder543 wrote:
| If you're using an ECS or EKS Cluster, you still have to run _some operating system_ on the ECS Container Instances. The containers have to run somewhere.
|
| There are a variety of options currently, including one from Amazon, but Bottlerocket seems designed to be a next-generation OS for container instances. It's extremely minimal, and designed to be as secure as possible, with transactional automatic updates.
|
| Ideally, I would just deploy Bottlerocket as the underlying OS on my ECS Cluster, and then never have to actually _do_ any management of the Instance OSes whatsoever. I would deploy containers on top of my ECS Cluster, Bottlerocket would keep itself up to date and secure, and my containers would live happily ever after. It would hopefully feel more similar to using Fargate than not, except without paying the higher price for Fargate, and having access to the wider variety of hardware configurations available to regular ECS.
|
| Amazon has specifically said this in the Bottlerocket repo:
|
| > Bottlerocket is architected such that different cloud environments and container orchestrators can be supported in the future.
|
| It's mainly useful in concert with ECS or EKS right now, but it is architected to be useful in other places as well.
|
| I'm excited about Bottlerocket as a project, and I'm glad it's open source instead of just an opaque AMI that you can use on ECS or EKS.
| solatic wrote:
| > It's mainly useful in concert with ECS or EKS right now, but it is architected to be useful in other places as well.
|
| Right, this is the big question. If a community forms to support Bottlerocket off of AWS, then that's one thing, but until then, it basically just seems like a better option for ECS or EKS.
| coder543 wrote:
| >>> I'm not entirely sure who they built it for, except as a foundational component for AWS's managed offerings.
|
| >> <snip>
|
| > If a community forms to support Bottlerocket off of AWS, then that's one thing, but until then, it basically just seems like a better option for ECS or EKS.
|
| I'm failing to see why that's a problem or a source of confusion. There's a clearly defined market of "who they built it for", and there's a community option to expand Bottlerocket's target market.
|
| The code is open source, so anyone could fork Bottlerocket and modify it to suit their needs immediately. Will Bottlerocket see success beyond ECS and EKS? Only time will tell... literally no one knows yet.
| zokier wrote:
| > If you're using an ECS or EKS Cluster, you still have to run some operating system on the ECS Container Instances. The containers have to run somewhere.
|
| Forgetting Fargate there?
| mtndew4brkfst wrote:
| Like most of the rest of AWS product people and AWS users do, IME.
| coder543 wrote:
| I literally mentioned Fargate in the same comment:
|
| >> It would hopefully feel more similar to using Fargate than not, except without paying the higher price for Fargate, and having access to the wider variety of hardware configurations available to regular ECS.
|
| If you're using an ECS Fargate Cluster or EKS Fargate Cluster, I don't consider those the same as an ECS Cluster or EKS Cluster. Unfortunately, there's no specific term commonly used for non-Fargate Clusters that I know of. AWS offers ECS Clusters and ECS Fargate Clusters.
|
| If you were legitimately confused by my comment, I'm sorry. I could have said "non-Fargate" repeatedly, if it would have helped, but I thought the context made things clear, especially with the additional explicit mention of Fargate as a separate thing.
| [deleted]
| spicyusername wrote:
| So basically Amazon CoreOS.
| senthilnayagam wrote:
| I remember reading last week that the Linux Plumbers Conference agreed to allow Rust in the Linux kernel.
|
| But these guys have built an OS in Rust.
| chromedev wrote:
| Did you even read the blog post? This is running Linux, and only the AWS software components running on it are largely written in Rust. It is still Linux: the OS is not written in Rust, nor is the container daemon, and probably neither are the non-AWS software packages.
| haunter wrote:
| Wasn't Amazon Linux 2 something similar? Or am I mixing it up? https://aws.amazon.com/amazon-linux-2/
| tootie wrote:
| Seems like a similar goal, but the difference is that Bottlerocket is targeted at containers and Amazon Linux is targeted at EC2.
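For anyone planning the launch-configuration route mentioned above, the usual flow is to look up the current Bottlerocket ECS AMI from AWS's public SSM parameters and launch it with TOML user data that points at your cluster. Below is a minimal boto3 sketch; the SSM parameter path and settings keys are my reading of the Bottlerocket ECS quickstart and should be treated as assumptions, and the region, cluster name, instance type, and instance profile are placeholders.

    import textwrap

    import boto3

    REGION = "us-west-2"
    CLUSTER = "my-ecs-cluster"  # placeholder cluster name

    ssm = boto3.client("ssm", region_name=REGION)
    ec2 = boto3.client("ec2", region_name=REGION)

    # Public SSM parameter published for the Bottlerocket ECS variant
    # (path assumed from the quickstart docs).
    ami_id = ssm.get_parameter(
        Name="/aws/service/bottlerocket/aws-ecs-1/x86_64/latest/image_id"
    )["Parameter"]["Value"]

    # Bottlerocket takes TOML user data: settings.ecs.cluster joins the
    # instance to the cluster, and enabling the admin host-container is
    # optional (it provides the shell access discussed earlier).
    user_data = textwrap.dedent(f"""\
        [settings.ecs]
        cluster = "{CLUSTER}"

        [settings.host-containers.admin]
        enabled = true
    """)

    ec2.run_instances(
        ImageId=ami_id,
        InstanceType="m5.large",
        MinCount=1,
        MaxCount=1,
        IamInstanceProfile={"Name": "ecsInstanceRole"},  # placeholder profile
        UserData=user_data,
    )

In practice you would put the same AMI and user data into a launch template behind an auto-scaling group rather than calling run_instances directly.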
| acdha wrote:
| Amazon Linux 2 is an AWS-optimized Linux distribution, but it's a full distribution of the kind you're used to -- RPM-based, generally compatible with RHEL/CentOS, etc. -- and that has the usual benefits and costs. You can customize it as much as any other Linux distribution, but you're taking on the corresponding level of effort to secure and operate it. If all you run are containers, that's overhead for things you're not really using, and it also means that you're going to be slower to update things like the kernel and system libraries, since there's more to test and backwards compatibility is a big concern.
| moondev wrote:
| Cluster API does both! The operator rolls out immutable machine images that make up clusters. It's badass.
| daxfohl wrote:
| So the difference between this and Firecracker would be that the latter is boot-speed and overhead optimized, and this one is a bit heavier but more capable?
|
| If choosing between this and, say, Kata Containers plus Firecracker, the latter would be more secure because of VM isolation, but this would be more efficient because multiple pods could go in a single VM?
|
| Is Bottlerocket secure enough to host multi-tenant workloads within the same VM?
| chromedev wrote:
| Firecracker and Kata Containers are ways to run containers inside lightweight VMs that boot directly into the kernel; they are not Linux distros in themselves. You are trying to compare apples to oranges. You could run Firecracker or Kata Containers on top of something like Bottlerocket; however, Bottlerocket is geared more towards the container-sandbox and isolation crowd, while Kata/Firecracker is for those who think you can only get isolation using VMs.
| DyslexicAtheist wrote:
| While _"Amazon drivers are hanging smartphones in trees to get more work"[0]_ and while _"Amazon is Hiring an Intelligence Analyst to Track 'Labor Organizing Threats'"[1]_ and while _"Amazon deletes job listings for analysts to track 'labor organizing threats'"[2]_ ... in this thread we are celebrating these lizards for their tech innovation. What's more scary than the damage to society that the employees hiding in these companies do is the cognitive dissonance we experience here of highly skilled individuals (who should know better) telling themselves "technology is neutral".
|
| [0] https://news.ycombinator.com/item?id=24342540
|
| [1] https://news.ycombinator.com/item?id=24343361
|
| [2] https://news.ycombinator.com/item?id=24345259
| dpryden wrote:
| I'm confused about how the documentation recommends using a Kubernetes operator to manage OS updates. That seems weird and backwards to me. I would rather see an immutable OS AMI in an auto-scaled group, and just replace the node instance whenever there is an update.
|
| I can see a place for managing OS updates on an instance, but that seems more like "pets" than "cattle"... and I've always treated Kubernetes nodes like cattle, not pets. Isn't that the most common approach anyway?
| jjtheblunt wrote:
| Your description made me think of...
|
| https://www.merriam-webster.com/dictionary/fungible
| dingo_aussie wrote:
| Why? I view being able to do this as a huge advantage. I don't want to lose instance store state just for an OS update, and I love the Kubernetes operator interface for being able to do this. The Kubernetes operator also operates at the cluster level, which means we don't need to write scripts to churn ASGs.
| It is eyebrow-raising that they have only enabled this for Kubernetes and not ECS. I suspect that this is one of many signs that the inferior and lock-in-prone ECS service will be deprecated soon.
| paxys wrote:
| Constantly rotating nodes in and out of the cluster and restarting/relocating pods, even if mostly automated, causes a lot of needless infrastructure strain. It is IMO one of the most overlooked parts of Kubernetes, and I wish there were a better solution to maintain stable, long-running processes when needed.
| dharmab wrote:
| If your pods need to be long-running, you can annotate them as such and they will not be autoscaled.
|
| https://github.com/kubernetes/autoscaler/blob/master/cluster...
| rumanator wrote:
| > I would rather see an immutable OS AMI in an auto-scaled group, and just replace the node instance whenever there is an update.
|
| It sounds like you're trying hard to reinvent Kubernetes while doing your best to avoid mentioning Kubernetes features like Kubernetes operators.
| yahooligan2230 wrote:
| We use 5000+ CoreOS nodes in production and never want to go back to replacing VMs with new images for each update again. In-place immutable updates are more efficient and faster. Unlike RPM-based OSes that are hard to patch, transactional updates provide a safe way to perform in-place updates instead of wasteful operations such as replacing full VMs for small OS updates.
| abhiyerra wrote:
| I agree with you. This seems more complex than just having an auto-scaling group that auto-rotates nodes after a certain amount of time and just picks up a new update when the node launches.
| NathanKP wrote:
| I can provide a little background on this. In general, yes, I would recommend that you just use an ASG and roll out a new AMI. However, that approach can be very expensive and time-consuming at truly massive scale (thousands or even tens of thousands of machines).
|
| Bottlerocket is built in part based on our experiences operating AWS Fargate, which obviously has as one of its needs the ability to patch a colossal number of hosts which are running people's containers, without downtime or disrupting their containers. Bottlerocket is designed to ensure that this is both efficient and safe. We aren't the only ones with this need. Many large orgs also have tremendous fleets, and it's unacceptable to cause significant disruption by rotating at the host level.
|
| Another aspect to consider is stateful workloads that are using the local disks. Bottlerocket lets you safely update your host if you are running something like a database or other stateful system where you don't really want to move your data around.
|
| Not everyone will need to use this updating mechanism, but I think it will be very attractive to many of the larger organizations with a lot of infrastructure.
| GauntletWizard wrote:
| It seems to me, not to be combative, that if Fargate can't afford the "noschedule: node is old" overhead and customers of Fargate can't handle their containers restarting on a regular basis, there's something wrong with your management engine or with their design and implementation. Much of the point of containerization is that you can roll containers often and run enough of them that you never have a single point of failure. What part of that assumption is broken that destroying machines regularly doesn't work?
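On dharmab's point about long-running pods: the cluster autoscaler honors a per-pod annotation that tells it not to remove the node the pod is running on. A small sketch using the official Kubernetes Python client (the pod and namespace names are placeholders, and in practice you would usually set the annotation in the pod template rather than patching a live pod):

    from kubernetes import client, config

    config.load_kube_config()  # or config.load_incluster_config() in-cluster
    v1 = client.CoreV1Api()

    # Ask the cluster autoscaler not to evict this pod during scale-down
    # (annotation from the cluster-autoscaler docs dharmab links above).
    patch = {
        "metadata": {
            "annotations": {
                "cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
            }
        }
    }

    v1.patch_namespaced_pod(name="render-job-0", namespace="default", body=patch)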
| NathanKP wrote:
| There are any number of reasons to avoid restarting things. Some customers are running code that has a cold start and needs some time to warm up its cache if it restarts. Some customers are running jobs (video rendering, machine learning training, etc.) that might take literally days to complete. Interrupting these jobs and causing them to restart wastes the customer's time and causes them to lose progress. Other containers may be hosting multiplayer game servers, and forcing them to restart would cause all people logged into the game instance to get disconnected or otherwise dropped from their game.
|
| All of the above are use cases that AWS Fargate is used for. Beyond this, many folks simply don't like it when things happen unexpectedly outside of their control. We have Fargate Spot for workloads that can tolerate interruption, and we discount the price if you choose this launch strategy. However, Fargate on-demand seeks to avoid interrupting your containers. You are in control of when your containers start and stop or autoscale.
| IMTDb wrote:
| This stuff is probably waaaay over my head, but isn't that what SIGTERM was made for? To notify a running process that the host needs to be shut down/restarted, to let the running process finish its current task (current frame encoding / current multiplayer game / current request / ...), and to signal that the state / cache / progress / ... needs to be saved.
|
| The process on the AWS side would then be: send SIGTERM to all workloads. Wait for a [configurable] amount of time (maxed at xx hours) _or_ until all workloads have exited (whichever comes first). Shut down the node. Update the node. Start the node. Restart the workloads.
| NathanKP wrote:
| Yep, you are right about SIGTERM, but let's think back to the original reason why we wanted to update the node: because of a patch, probably a security patch for a CVE?
|
| What is the better option here? Implement a SIGTERM-based process that allows the user to block the patch for a critical, possibly zero-day CVE for xx hours, remaining in a vulnerable state the entire time? Or implement a system that just patches the underlying host without interrupting the workloads on the box?
|
| You aren't wrong; what you described is a possibility, but it is not the best possibility.
| tedivm wrote:
| This makes a ton of sense and I appreciate the response. I think what people aren't recognizing is that cloud services make you pay for performance, so doing things like relaunching containers which have slow warmup times literally costs extra money. While it's certainly important to design systems such that the containers can be tossed aside easily, that doesn't mean there isn't value in reducing how often that tossing aside occurs.
| nuclearnice1 wrote:
| Forgive my hijack.
|
| Any plans to reduce the minimum bill time for Fargate to accommodate short tasks?
|
| With 1-minute minimum billing you have to turn to Lambda for very short tasks or have a long-running Fargate service consuming tasks from some message bus.
|
| If you choose Lambda, your containers don't work, so you need to rebuild your runtime with Lambda layers or EBS or squeeze into the Lambda env.
|
| If you choose messaging, say SQS from a Lambda called by API Gateway, you've complicated your architecture and your Fargate instance is potentially hanging out billing, idle, and waiting for messages.
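As a side note on the SIGTERM-based draining IMTDb describes a few comments up: inside the workload it usually amounts to trapping the signal, finishing the in-flight unit of work, and saving state before exiting, while the orchestrator waits out a stop timeout before escalating to SIGKILL. A minimal Python sketch; the work loop and checkpoint function are illustrative stand-ins:

    import signal
    import sys
    import time

    shutting_down = False

    def handle_sigterm(signum, frame):
        # The orchestrator sends SIGTERM first; flag the loop to drain
        # instead of exiting mid-task.
        global shutting_down
        shutting_down = True

    signal.signal(signal.SIGTERM, handle_sigterm)

    def next_unit_of_work():
        # Stand-in for "encode the next frame" / "serve the next request".
        time.sleep(1)

    def checkpoint_progress():
        # Stand-in for persisting cache/progress so a restart loses nothing.
        pass

    while not shutting_down:
        next_unit_of_work()

    checkpoint_progress()
    sys.exit(0)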
|
| Fargate Spot removed the last reason to consider AWS Batch. Short tasks could largely replace Lambda.
|
| It would be nice to Fargate all the things.
| john-shaffer wrote:
| Nothing is "broken" about it. It's just that when you have tens of thousands of machines that might need an urgent security update, it's very inefficient and costly to destroy all of them at once instead of patching. Destroying machines regularly is not the same thing as frequently destroying all of them at once.
| freedomben wrote:
| This is how OpenShift 4 does things. I too thought it was strange at first, but now, with some experience, it's quite pleasant.
|
| It can be a beast to debug, though, if you haven't done it before.
| robszumski wrote:
| Aside from being faster than replacing all of the hosts, the reason OpenShift does it this way is that you can't just burn down and replace a fleet of bare-metal machines. While re-PXEing is possible, it takes a ton of time and stresses that infrastructure.
|
| Doing the same on cloud, metal, OpenStack, VMware, etc. means that your cluster's operational experience remains the same and in most cases is less disruptive.
|
| edit: Having your nodes controlled by your cluster has a number of other benefits aside from patching, like the Node Tuning Operator, which can tweak settings based on the types of workloads running on that set of machines.
| dan_quixote wrote:
| I can assure you that OpenShift doesn't take this path because it is "better". It does so because bare metal is a significant part of their market and there isn't a better option to automate the process currently.
|
| I once worked on a competing product (before the OS update operator was available) and the update-in-place model was always a disaster. Various problems like DNS, service discovery, timeouts, breaking changes to dependency pkgs, etc. make for a problematic process. Combine that with the frantic pace of k8s development, a short node compatibility window (2-3 minor k8s releases), and various CVEs - you end up debugging a lot of machines in unknown states that fail to rejoin clusters after reboots.
| spicyusername wrote:
| This has definitely not been my experience running many hundreds of Red Hat CoreOS nodes in production.
|
| So far, aside from a few small flaky issues, having the cluster nodes _and_ the OpenShift cluster update in lockstep has been dramatically simpler to manage.
| adolph wrote:
| Firecracker, Bottlerocket... starting to see a trend here.
|
| https://aws.amazon.com/blogs/aws/firecracker-lightweight-vir...
| statictype wrote:
| Both are written in Rust. Looks like Amazon is investing in the language quite a bit.
| trishankdatadog wrote:
| Little-known fact: like Google Fuchsia, Bottlerocket uses The Update Framework (TUF)[1][2] to securely update itself!
|
| [1] https://theupdateframework.io/
|
| [2] https://github.com/awslabs/tough
| aex wrote:
| Free project idea: a Qubes OS alternative built on Bottlerocket.
| chromedev wrote:
| Why Bottlerocket? Why not just use Alpine, Void, NixOS, Arch, etc. instead?
| geek_at wrote:
| I've fallen in love with Alpine. I'm using it on my bare-bones servers running from a RAM disk (USB drive), hosting Docker containers and even QEMU VMs. All on encrypted ZFS volumes. Totally love it.
___________________________________________________________________
(page generated 2020-09-01 23:00 UTC)