[HN Gopher] Accidental complexity, essential complexity, and Kub...
___________________________________________________________________

Accidental complexity, essential complexity, and Kubernetes

Author : paulgb
Score  : 83 points
Date   : 2022-09-05 19:47 UTC (3 hours ago)

(HTM) web link (driftingin.space)
(TXT) w3m dump (driftingin.space)

| theteapot wrote:
| > Through Brooks' accidental-vs-essential lens, a lot of discussion around when to use Kubernetes boils down to the idea that essential complexity can become accidental complexity when you use the wrong tool for the job. The number of states my microwave can enter is essential complexity if I want to heat food, but accidental complexity if I just want a timer. With that in mind ...
|
| I'm twisting my mind trying to grasp this interpretation in Brooks' complexity paradigm. I'm sure Brooks would be interested to learn there exist so-called wrong tools that can reduce essential complexity to accidental :). I think Brooks would put it that the essential complexity is the same and irreducible, but the accidental complexity is increased when the wrong tool is used.
| [deleted]
| threeseed wrote:
| > The fact that you need a special distribution like minikube, Kind, k3s, microk8s, or k0s to run on a single-node instance. (The fact that I named five such distributions is a canary of its own.)
|
| They serve completely different purposes though.
|
| Some are Docker-based and designed to be lightweight for testing, e.g. Kind; others are designed to scale to full clusters, e.g. k3s.
|
| And I think having the choice is a good thing, as it proves that Kubernetes is vendor-agnostic.
| bob1029 wrote:
| Also see the closely related paper:
|
| http://curtclifton.net/papers/MoseleyMarks06a.pdf
|
| This one inspired our most recent system architecture.
| candiddevmike wrote:
| Thank you for sharing this, much more insightful than the article.
| orf wrote:
| > One way to think of Kubernetes is as a distributed framework for control loops. Broadly speaking, control loops create a declarative layer on top of an imperative system.
|
| Finally: a post about Kubernetes that actually understands what it fundamentally is.
|
| > Kubernetes abstracts away the decision of which computer a pod runs on, but reality has a way of poking through. For example, if you want multiple pods to access the same persistent storage volume, whether or not they are running on the same node suddenly becomes your concern again.
|
| I'm not sure I agree that this itself constitutes a leaky abstraction. Kubernetes still abstracts away the decision of which computer a pod runs on; the persistent storage volume is just another declarative _constraint_ on where a pod can be scheduled. It's no different from specifying that you need "40" CPU cores, and thus only being able to be scheduled on nodes with >40 cores.
|
| > There's no fundamental reason that I should need anything except my compiler to produce a universally deployable unit of code.
|
| I agree - but you can do that with 'containers'. You just create an empty "FROM scratch" container and then copy your binary in. The end result is that your OCI image is just a single .tar.gz file containing your binary alongside a JSON document specifying things like the entrypoint and any labels. Just like a shipping container, this plugs into anything that ships things shaped like that.
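A minimal sketch of the "FROM scratch" pattern orf describes, for illustration (the binary name and entrypoint are made up, and a statically linked binary is assumed to already exist next to the Dockerfile):

    # The resulting image contains nothing but the binary and the OCI
    # metadata (entrypoint, labels) orf mentions.
    FROM scratch
    COPY foo-app /foo-app
    ENTRYPOINT ["/foo-app"]

Building this with docker build (or buildah, kaniko, etc.) yields the single-layer image described above.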
| You get a _bunch_ of nice stuff for free with this (tagged + content-addressable remote storage with garbage collection! inter-image layer caching!); even if you're slinging about WASM binaries I'd still package them as OCI images.
|
| > The use of YAML as the primary interface, which is notoriously full of foot-guns.
|
| There's a lot more to say here, and it's more of a legitimate criticism of Kubernetes than most of the "hurr dur k8s complex" criticisms you commonly see. The ecosystem has kind of centered around Helm as a way of templating resources, but it's... horrible. It's all so horrible. A huge step up from ktml or other rubbish from the past, but Go's template language isn't fun.
|
| But I'm not sure how it could have gone any differently. JSON is as standard and simple as it gets and is simple to auto-generate, but it's not user-friendly. So either the Kubernetes getting-started guide starts with "first go auto-generate your JSON using some tool we don't distribute", or you provide a more user-friendly way for people to onboard.
|
| YAML is a good middle ground here between giving users the ability to auto-generate resources from other systems (i.e. spitting out JSON and shovelling it into a k8s API) and something user-friendly for people to interact with and view using kubectl/k9s/whatever.
| soco wrote:
| I'm confused, I thought JSON was created because XML wasn't user-friendly. I, on the other hand, see no user-friendliness in either YAML, JSON or XML - just formatted text with or without tabs. Users don't like standards, so whatever you choose somebody will complain.
| dinosaurdynasty wrote:
| I often wonder how many footguns could've been removed if projects like k8s/ansible used TOML instead of YAML.
| bvrmn wrote:
| TOML is an awful substitution in the ansible context. Let's play a game: how about I give you a real-life playbook and you translate it to TOML?
| [deleted]
| mati365 wrote:
| Kubernetes and its complexity again..
| bvrmn wrote:
| > The number of states my microwave can enter is essential complexity
|
| Two dials to set power and clock-timer are enough. Microwaves with a digital panel and LED display are a perfect example of accidental complexity.
| hbrn wrote:
| The more I think about it, the more I realize that the generic declarative style, despite sounding very promising, might not be a good fit for modern deployment - mainly due to technology fragmentation.
|
| There's no one true way to deploy an abstract app; each tech stack is fairly unique. Two apps can look exactly the same from the desired-state perspective, but have two extremely different deployment processes. They might even have exactly the same tech stack, but different reliability requirements.
|
| Somehow you need to be able to accommodate those differences in your declarative framework. So you'll pay the abstraction costs (complexity). But you will only reap the benefits if you're constantly switching the components of your architecture. And that typically doesn't happen: by the time you get to a point where you need a deployment framework, your architecture is fairly rigid.
|
| Maybe k8s makes a lot of sense if you're Google. But 99.99% of companies are not Google.
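A small illustration of the YAML foot-guns orf and dinosaurdynasty allude to above (these are YAML 1.1 behaviours, which much of the tooling in this space historically followed; the keys are made up):

    country: NO        # the "Norway problem": parsed as the boolean false
    version: 1.10      # parsed as the float 1.1, not the string "1.10"
    mode: 0755         # parsed as the octal integer 493
    start_time: 22:30  # parsed as the sexagesimal integer 1350

Quoting the values avoids all of these, but forgetting to quote is exactly the kind of mistake that surfaces at deploy time.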
| jrockway wrote:
| I agree very much about the accidental complexity of containers. Ignoring the runtime concerns (cgroups, namespaces, networking, etc.), the main problem seems to have been "we can't figure out how to get Python code to production", and the solution was "just ship a disk image of some developer's workstation to production". To do this, they created a new file format ("OCI image format", though not called that at the time), a new protocol for exchanging files between computers ("OCI distribution spec"), a new way to run executables on Linux ("OCI runtime spec"), and various proprietary versions of Make, with a caching mechanism that isn't aware of the actual dependencies that go into your final product. The result is a staggering level of complexity, all to work around the fact that nobody even bothered trying to add a build system to Python.
|
| Like the author, I tend to write software in a language that produces a single statically linked binary, so I don't need any of this distribution stuff. I don't need layers, I don't need a Make-alike, I don't need a build cache, but I still have to go out of my way to wrap the generated binary and push it to a super special server. Imagine a world where we just skipped all of this, and your k8s manifest looks like:
|
|     containers:
|       - name: foo-app
|         image:
|           - architecture: amd64
|             os: linux
|             binary: https://github.com/whatever/download/foo-app/foo-app-1.0.0-linux-amd64
|             checksum: sha256@1234567890abcdef
|           - architecture: arm64
|             os: linux
|             binary: https://github.com/whatever/download/foo-app/foo-app-1.0.0-linux-arm64
|             checksum: sha256@abdcef1234567890
|
| Meanwhile, if you don't want to orchestrate your deployment and just want to run your app:
|
|     wget https://github.com/whatever/download/foo-app/foo-app-1.0.0-linux-amd64
|     chmod a+x foo-app-1.0.0-linux-amd64
|     ./foo-app-1.0.0-linux-amd64
|
| I dunno. I feel like, as an industry, we spent billions of dollars, founded brand new companies, created a new job title ("DevOps", don't get me started), all so that we could avoid making Python output a single binary. I'm not sure that random chaos did the right thing, and you're right to be bitter about it.
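The build side of the single-binary workflow jrockway is imagining is roughly this (a sketch assuming a Go project; the package path and file names are illustrative):

    # Produce a fully static Linux binary, no container tooling involved
    CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o foo-app-1.0.0-linux-amd64 ./cmd/foo-app
    # Publish a checksum alongside the binary so consumers can verify the download
    sha256sum foo-app-1.0.0-linux-amd64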
| tsimionescu wrote:
| The article is pretty light, though I agree with most of the points.
|
| However, I think that, especially in the context of Kubernetes, this part is completely wrong:
|
| > Containers were a great advance, but much of their complexity falls into the "accidental" category. There's no fundamental reason that I should need anything except my compiler to produce a universally deployable unit of code.
|
| Containers are not used in Kubernetes or other similar orchestrators because of their support for bundling dependencies - that is a bonus at best.
|
| Instead, they are used because they are a standard way of using cgroups to tightly control what pieces of the system a process has access to, so that multiple processes running on the same system can't accidentally affect each other, and so that a process can't accidentally depend on the system it is running on (including things like open ports). These are key properties for a system that seeks to efficiently distribute workloads across a number of computers without having to understand the specifics of the workload it's running.
|
| They are also used because container registries are a ready-made, secure, Linux-distribution-agnostic way of retrieving software and referring to it with a unique name. quay.io/python:3.7 will work on Ubuntu, SuSE, RedHat or any other base system, unlike relying on apt/yum/etc.
| paulgb wrote:
| > Containers are not used in Kubernetes or other similar orchestrators because of their support for bundling dependencies - that is a bonus at best.
|
| > Instead, they are used because they are a standard way of using cgroups to tightly control what pieces of the system a process has access to
|
| Well, sure, you get both. But the point is that needing a post-build step just to get that _is_ accidental complexity. I'm by no means a Java fan, but the way the JVM gives you jars as a compile target and lets you run them with some measure of isolation is an example of how things could be.
| threeseed wrote:
| With Maven, Gradle etc. you can output a Docker container from a single package command.
|
| And I don't understand this idea of using code bundles as the deployment artefact.
|
| They don't specify _how_ the code should be run, e.g. Java version, set of SSL certificates in your keystore etc.
|
| Do you really want Kubernetes to have hundreds of options for each language? Or is it better to just leave that up to the user?
| anonymous_sorry wrote:
| I think the suggestion is to use native Linux binaries, with static linking of any libraries and resources into the executable.
|
| Java programs would need to be compiled as a standard ELF. I think GraalVM can do this.
| threeseed wrote:
| Only some Java/Scala programs can be compiled as a single binary.
|
| It's a concept that has been around for years but hasn't progressed to the point where it comes close to negating the need for the JVM in a production environment.
|
| In the meantime, containers work today and are significantly more powerful and flexible.
| jayd16 wrote:
| I just don't follow the argument. You can also use Java tooling to wrap up a container image. Why is it _accidental_ complexity that we settled on a language-agnostic target that also wraps up a lot of the process-isolation metadata?
| paulgb wrote:
| Java is an outlier here. For most languages the process of dockerizing a codebase involves learning a separate tool (Docker). As someone who knows Docker, I get the temptation to say "who cares, just write a few lines of Dockerfile", but after talking to a bunch of developers about our container-based tool, having to leave their toolchain to build an image is a bigger sticking point than you might think.
| jayd16 wrote:
| But this is begging the question. If the work were more integrated with the compiler, it would still need to be learned. If your compiler of choice spat out a deployable unit akin to an image, you'd still need something akin to the Dockerfile to specify exposed ports and volumes and such, no?
| paulgb wrote:
| The Dockerfile doesn't specify volumes; those are configured at runtime. In the case of Kubernetes, they're configured in the pod spec.
|
| As for ports, EXPOSE in a Dockerfile is basically just documentation/metadata. In practice ports are exposed at runtime (if using Docker) or by configuring services (if using Kubernetes), and these are unaffected by whether a port is exposed in the Dockerfile.
|
| IMHO this is how it should be -- if I'm running a containerized database or web server, I want to be able to specify the port that the rest of the world sees it as; I don't want the container creator to decide that.
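To make paulgb's point concrete: the port the rest of the cluster sees is declared in runtime objects like a Service, not in the image. A minimal sketch, with illustrative names and ports:

    apiVersion: v1
    kind: Service
    metadata:
      name: postgres
    spec:
      selector:
        app: postgres
      ports:
        - port: 5432        # what other workloads connect to
          targetPort: 5432  # what the containerized process actually listens on

Nothing here depends on whether the image's Dockerfile contained an EXPOSE line.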
| candiddevmike wrote:
| Containers are a temporary solution while we wait for everyone to realize self-contained binaries with static linking are the real solution.
| rocmcd wrote:
| The executable piece is nice, but not the whole picture. Configuration and, more importantly, isolation of the runtime are also huge benefits that come with containers.
| tsimionescu wrote:
| No, containers give you things that static binaries just don't. How do you specify the maximum allowed memory for a static binary? The ports it opens? The amount of CPU usage? The locations it will access from the host file system?
|
| Also, how will you distribute this static binary? How do you check that the result of a download is the specified version, and the same thing that others are downloading? How will you specify its name and version?
|
| By the time you have addressed all of these, you will have re-implemented the vast majority of what standard container definitions and registries do.
| anonymous_sorry wrote:
| The orchestrator can still use cgroups for resource constraints and isolation. Or it could use virtualization - it would be an implementation detail. But devs would not have to build a container.
|
| Binary distribution, versioning and checksumming shouldn't need to be coupled to a particular format.
|
| Obviously Docker solves a bunch of disparate problems. That's kind of the objection.
| threeseed wrote:
| That is only a real solution if everything is running on the same operating system in the same environment.
|
| What is the point of Kubernetes at all in your situation?
| candiddevmike wrote:
| Everything already is running on the same operating system with Kubernetes where it matters (kernel, runc/crun, etc). Containers are a band-aid to wrap an app with additional files/libraries/whatever. In Go, for instance, I can include all this stuff at compile time (even conditionally!).
|
| I'd love to see Kubernetes be able to schedule executables directly, without containers, by way of systemd-nspawn or similar. You could have the "container feel" without the complexity of the toolchain required to build/deploy/run/validate containers.
| theteapot wrote:
| Sounds pretty much like containers are the solution to containers.
| [deleted]
| zamalek wrote:
| I think they mean different things to different people. From the problems I have faced with customer-controlled OS installations, the biggest thing that they offer is configuration isolation (or rather independence). I have seen some truly crazy shit done by customers' administrators, and even crazier shit done by management software that they install. Taking away that autonomy is huge.
| tsimionescu wrote:
| Absolutely, but the context of the article was specifically Kubernetes.
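For reference, the constraints tsimionescu asks about (memory, CPU) and the kind of declared environment check jfoutz describes below are expressed per container in the pod spec. A minimal sketch, with all names and values illustrative:

    apiVersion: v1
    kind: Pod
    metadata:
      name: foo-app
    spec:
      containers:
        - name: foo-app
          image: registry.example.com/foo-app:1.0.0
          resources:
            requests:
              cpu: 250m
              memory: 128Mi
            limits:
              cpu: "1"
              memory: 512Mi
          readinessProbe:      # a declared assumption about the environment
            httpGet:
              path: /healthz
              port: 8080

In principle none of this requires the workload to be an OCI image - which is roughly anonymous_sorry's point - but today the container is the unit these declarations attach to.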
| jfoutz wrote:
| After thinking about this for a few minutes, the author might be on the wrong track connecting this with containers.
|
| But I think there's something really there about "environmental linting". I know deep in my bones I need write access to make log files, but I don't know how many times I've debugged systems lacking this permission.
|
| I know the log path won't be known until runtime, I know the port can be specified at runtime, but I think there's a ton of room for improvement around a tool that says - hey, you're making this set of assumptions about your environment, and you should have these health checks or tests or whatever.
|
| I agree with you that this is what containers give, but I think the author is really on to something about the dev tooling and environment warning about what sorts of permissions are needed to, like, work.
___________________________________________________________________
(page generated 2022-09-05 23:00 UTC)