[HN Gopher] Common mistakes using Kubernetes
___________________________________________________________________
Common mistakes using Kubernetes
Author : marekaf
Score  : 275 points
Date   : 2020-05-17 11:48 UTC (11 hours ago)
(HTM) web link (blog.pipetail.io)
(TXT) w3m dump (blog.pipetail.io)
| fergonco wrote:
| Shameless plug but on topic: I recently wrote about readiness and
| liveness probes with Kubernetes. If you're looking for an
| educational perspective, you can check:
| https://medium.com/aiincube-engineering/kubernetes-liveness-...
| hinkley wrote:
| > You can't expect kubernetes scheduler to enforce anti-affinities
| for your pods. You have to define them explicitly.
|
| Why isn't this the default behavior? Why don't I have to go in and
| tell it that it's okay to have multiple instances on the same
| node? Why? So that I somehow feel like I've contributed to the
| whole process by fixing something that never should break in the
| first place?
|
| I know of a few pieces of code where I definitely want to run N
| copies on one machine, but for all of the rest? Why am I even
| running 2 copies if they're just going to compete for resources?
  | nielsole wrote:
  | Pod anti-affinities historically increased scheduling times
  | dramatically. Not sure that's the primary reason, but it's
  | probably one of them.
  | jeffbee wrote:
  | It's quite possible that you have a machine with 192 CPU cores
  | in it, but it's very unlikely that you are able to write a
  | service that scales to that level ... and if you write it in Go,
  | it's really unlikely that you can scale even to 8 CPUs. There's
  | nothing weird about having multiple replicas of the same job on
  | the same node. If you look through the Borg traces that Google
  | recently published, you can find lots of jobs with multiple
  | replicas per node.
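
For reference, the explicit anti-affinity the comments above are
discussing is declared per workload. A minimal sketch; the "app: web"
label, image, and replica count are illustrative:

    # Require replicas of this Deployment to land on distinct nodes.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchLabels:
                    app: web
                # one replica per node; a zone label also works here
                topologyKey: kubernetes.io/hostname
          containers:
          - name: web
            image: example/web:1.0

A softer variant, preferredDuringSchedulingIgnoredDuringExecution,
lets the scheduler co-locate replicas when no other node fits.
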
| kirstenbirgit wrote:
| Lots of good advice in this article.
| zomglings wrote:
| Really great article.
|
| I have used Kubernetes pretty heavily in the past, and didn't know
| about PodDisruptionBudget.
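
For anyone else discovering it here: a PodDisruptionBudget caps how
many replicas voluntary disruptions (node drains, cluster upgrades)
may take down at once. A minimal sketch; names are illustrative, and
on clusters newer than this thread the apiVersion is policy/v1:

    apiVersion: policy/v1beta1
    kind: PodDisruptionBudget
    metadata:
      name: web-pdb
    spec:
      minAvailable: 2        # or: maxUnavailable: 1
      selector:
        matchLabels:
          app: web           # must match the pods it protects
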
| gridlockd wrote:
| Common mistake: using Kubernetes
| mlthoughts2018 wrote:
| I think there is more to the story for some of these points, and
| it can be dangerous to just take them at face value as best
| practices.
|
| For example, on the liveness/readiness probe item, the article
| says:
|
| > The other one is to tell if during a pod's life the pod becomes
| too hot handling too much traffic (or an expensive computation) so
| that we don't send her more work to do and let her cool down, then
| the readiness probe succeeds and we start sending in more traffic
| again.
|
| But this is often a very bad idea and masks long-term errors in
| underprovisioning a service.
|
| If the contention between readiness/liveness checks and real
| traffic ever results in congestion, you need the failure of the
| checks to surface it so you can increase resources. If you set
| things up so this failure won't surface, like allowing the
| readiness check to take that pod out of service until the
| congestion subsides, you're only hurting yourself by masking the
| issue. It basically means your readiness check acts like a latency
| exception handler outside the application, which is a very bad
| idea.
|
| The other item that is way more complicated than it seems is the
| one about IAM roles / service accounts instead of single shared
| credentials.
|
| In cases where your company has an enterprise security team that
| creates extremely low-friction tools to generate and inject
| service account credentials, then sure, I would agree it's a best
| practice to ruthlessly split credentialing so that every
| application has its own credentials to a shared resource, so you
| can isolate access and revocation.
|
| But if you are on some application team and your company doesn't
| have a mature enough security tooling setup managed by a separate
| security team, this can become a bad idea.
|
| It can lead to superlinear growth in secrets management, as there
| will be manual service account creation and credential propagation
| overhead for every separate application. Non-security engineers
| will store things in a password manager, copy/paste into some
| CI/CD tool, embed credentials as ENV permanently in a container,
| etc., all because they can't create and maintain end-to-end
| service account credential tools in addition to their job as
| application team engineers. It's something they think about twice
| per year and need off their plate immediately to move on to other
| work.
|
| Across teams, it means you end up with 20 different team-specific
| ways to cope with the rapid growth of service accounts, leading to
| an even worse security surface area, risk of credential-based
| outages, omission of important testing because ensuring the
| ability to impersonate the right service account at the right
| place is too hard, etc.
|
| Very often it is a _real_ trade-off to consider that one single
| service account credential, with just one way to be injected for
| every service, is _safer_ in the bigger picture.
|
| Yes, it means a credential issue for any service becomes an issue
| for all, and this is a risk you want automated tooling to
| mitigate, but it very often will be less of a risk than insisting
| on a parochial best practice of individual service account
| credentials, resulting in much worse and less auditable secrets
| workflows overall, _unless_ it is completely owned and operated by
| a central security team in such a way that it doesn't create any
| approval delays or workflow friction for application teams.
  | jeffbee wrote:
  | You should, of course, monitor the rate of liveness flapping for
  | your services. The need to monitor it does not imply that it's a
  | bad feature.
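
One concrete form of the per-service pattern debated above, for
readers who haven't seen it: on EKS, IAM Roles for Service Accounts
(IRSA) binds a Kubernetes ServiceAccount to an IAM role via an
annotation, so each workload gets its own scoped, auto-rotated
credentials. A minimal sketch; the names and ARN are placeholders:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: billing-api
      namespace: prod
      annotations:
        # IRSA: pods using this ServiceAccount assume this role
        eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/billing-api

A pod opts in with serviceAccountName: billing-api in its spec; on
GKE, Workload Identity plays the analogous role.
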
| zegl wrote:
| Great post! If you're in the Kubernetes space long enough, you'll
| see all of these configuration mistakes happening over and over
| again.
|
| I've created a static code analyzer for Kubernetes objects, called
| kube-score, that can identify and prevent many of these issues. It
| checks for resource limits, probes, podAntiAffinities, and much
| more.
|
| 1: https://github.com/zegl/kube-score
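
For reference, kube-score runs against rendered manifests; per its
README, the basic invocation looks like:

    $ kube-score score my-app/*.yaml

Templated charts need to be rendered first (e.g. with helm template)
before being scored.
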
| otterley wrote:
| I actually disagree with the first recommendation as written,
| specifically the advice against setting a CPU resource request to
| a small amount. It's not always as harmful as it might sound to
| the novice.
|
| It's important to understand that CPU resource requests are used
| for scheduling, not for limiting. As the author suggests, this can
| be an issue when there is CPU contention, but on the other hand,
| it might not be. That's because memory limits are even more
| important than CPU requests when scheduling: most applications use
| far more memory as a proportion of overall host resources than
| CPU.
|
| Let's take an example. Suppose we have a 64GB worker node with 8
| CPUs in it. Now suppose we have a number of pods to schedule on
| it, each with a memory limit of 2GB and a CPU request of 1
| millicore (0.001 CPU). On this node, we will be able to
| accommodate 32 such pods.
|
| Now suppose one of the pods gets busy. This pod can have all the
| idle CPU it wants! That's because it's a request, not a limit.
|
| Now suppose _all_ of the pods become fully CPU-contended. The way
| the Linux scheduler works, it will use the CPU request as a
| _relative weight_ with respect to the other processes in the
| parent cgroup. It doesn't matter that the requests are small as an
| absolute value; what matters is their relative proportion. So if
| they're all 1 millicore, they will all get equal time. In this
| example, we have 32 pods and 8 CPUs, so under full contention each
| will get 0.25 of a CPU.
|
| So when I talk to customers about resource planning, I actually
| usually recommend that they start with a low CPU reservation and
| optimize for memory consumption _until_ their workloads dictate
| otherwise. It does happen that particularly greedy pods are out
| there, but that's not the typical case. Those that are will often
| allocate all of a worker's CPUs, in which case you might as well
| dedicate nodes to them and forget about micromanaging the
| situation.
  | jeffbee wrote:
  | If you ask for 0.001 CPU, you might get exactly that, so I would
  | advise caution. If that pod gets scheduled on a node with
  | another pod that asks for 4 CPUs and 100MB of memory, it's not
  | going to get any time.
    | otterley wrote:
    | It depends. If the second pod requests 4 CPUs, it doesn't
    | necessarily mean that the first pod can't use all the CPUs in
    | the uncontended case.
    |
    | A lot of this depends on policy and cooperation, which is true
    | for any multitenant system. If the policy is that nobody
    | requests CPU, then the behavior will be like an ordinary
    | shared Linux server under load: the scheduler will manage it
    | as fairly as possible. OTOH, if there are pods that are greedy
    | and pods that are parsimonious in their requests, the greedy
    | pods will get the lion's share of the resources if they need
    | them.
    |
    | The flip side of overallocating CPU requests is cost. The
    | requested value is subtracted from the node's available
    | resources, making them unavailable for other useful work. Most
    | of the time I see customers making the opposite mistake:
    | overallocating CPU requests so much that their overall CPU
    | utilization is well under 25% during peak periods.
      | jeffbee wrote:
      | Most people would be thrilled to get anything close to 25%
      | CPU utilization. I guess one of the big missing pieces from
      | Borg that hasn't landed in k8s is node resource estimation.
      | If you have a functional estimator, setting requests and
      | limits becomes a bit less critical.
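
In manifest terms, the pattern otterley describes is a container
resources block with a tiny CPU request, a memory limit, and no CPU
limit. A sketch with the arithmetic from the example above; all
values are illustrative:

    # Per-container resources for the 64GB / 8-CPU node example.
    resources:
      requests:
        cpu: 1m         # scheduling hint; also the cgroup weight
        memory: 2Gi     # 64GB node / 2GB per pod -> 32 pods fit
      limits:
        memory: 2Gi
        # no cpu limit: a busy pod may use any idle CPU
    # Under full contention, CPU is shared by *relative* request:
    # 32 equal-weight pods on 8 CPUs -> 8 / 32 = 0.25 CPU each.
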
| xrd wrote:
| I wish there were a way to upvote something 10x once a month here.
| This would be the post I'd use it on.
|
| When I was writing my book, my editor asked me to remove any
| writing about mistakes and changes I made in the project for each
| chapter. I had a bug that appeared and I wanted to write about how
| I tracked it down and fixed it. They said the reader wants to see
| an expert talking, as if experts never make mistakes or need to
| shift from one tact to another.
|
| But I find I learn the most from explanations that share how your
| mental model was wrong initially, how you figured that out, and
| how you did it "more right" the next time.
|
| That's really how people build things.
  | mgkimsal wrote:
  | Personally... I'd like to see that sort of info, but typically
  | not in the middle of a chapter/section. Make a note or callout
  | in the text pointing to a section on why/how you got to the
  | 'correct' position. That info is often helpful, but it can
  | disrupt the flow of the 'good' information.
  | barrkel wrote:
  | s/tact/tack/
  |
  | Tact means skill in dealing with other people, particularly in
  | sensitive situations.
  |
  | Tack has to do with sailing boats into the wind, and crucially
  | "changing tack" means changing direction.
    | xrd wrote:
    | Thanks, I'm glad I had you as an editor this time! I never
    | noticed that before.
  | fnord123 wrote:
  | > They said the reader wants to see an expert talking, as if
  | experts never make mistakes or need to shift from one tact to
  | another.
  |
  | Your editor was very fucking wrong.
    | gridlockd wrote:
    | > Your editor was very fucking wrong.
    |
    | The editor is completely right in what they were saying. You
    | just want them to be wrong, because you'd prefer to live in
    | the fantasy world where they are wrong.
    |
    | Let's say you go to get a surgery. You _don't_ want the doctor
    | to tell you about all the times they fucked up and what the
    | awful consequences were. It doesn't matter that they're
    | probably a better surgeon now, having learned from their
    | mistakes. Psychologically, you need that person with the sharp
    | tool poking around inside your body to be a _superhuman_.
    |
    | To a lesser degree, the same is true for any expert. Of course
    | _everybody_ makes mistakes. Notice the de-personalization in
    | the word "everybody". You can talk about the mistakes
    | _everybody_ makes, or those that _many people_ make. If you
    | talk about _your own mistakes_ however, you lose the
    | superhuman status. There may be a few situations where that
    | somehow helps you, but not when you _want to sell books_.
      | fnord123 wrote:
      | I wonder if we're on the same page. Should a book on a
      | programming language discuss compilation errors and how to
      | interpret them? Should a book like Effective C++ (afaik the
      | most popular series of books on C++) exist?
      | sokoloff wrote:
      | I sure do want that surgeon to have been presented with and
      | instructed on the common ways surgeons before them have made
      | mistakes, and how to avoid or overcome them.
      |
      | That they won't tell me (the patient) is quite a different
      | question from whether they got the material from someone
      | more experienced.
      | justinclift wrote:
      | > Let's say you go to get a surgery.
      |
      | Shouldn't that be more like "Let's say you're learning to be
      | a surgeon."?
      |
      | For that situation, the person they're learning from
      | discussing problems they hit and how they solved them does
      | sound like it would be very useful.
    | natefox wrote:
    | So, so wrong. How did (s)he think experts get so good? Isn't
    | the phrase `the master has failed more times than the
    | apprentice has even tried` well known for a reason?
      | emerongi wrote:
      | It obviously depends on the type of the book and the
      | reader's expectations. It just might not have been the type
      | of book where you write about things like that.
        | ohyeshedid wrote:
        | > It just might not have been the type of book where you
        | write about things like that.
        |
        | I think that'd be more of the author's choice, instead of
        | the editor's.
      | a1369209993 wrote:
      | > the master has failed more times than the apprentice has
      | even tried
      |
      | I've never actually heard that one before, but it's so very,
      | very true.
    | xrd wrote:
    | Obviously, I agree with you!
    |
    | I also think this is the way the majority of tech books are
    | written.
    |
    | Can you think of another where the author goes from mistake to
    | mistake and then finally gets it right?
    |
    | I believe there is a space in tech writing for this kind of
    | writing, but it is not something traditional book publishers
    | believe in.
    |
    | This was an O'Reilly book, by the way, with really good
    | editors and a really good editorial process.
    |
    | That editor was right most of the time, IMHO.
| lolkube wrote:
| #1: Using Kubernetes
  | dewey wrote:
  | You spent the time signing up just for this?
    | toomuchtodo wrote:
    | It's an important point for those who may unknowingly
    | over-engineer.
      | dewey wrote:
      | And they are probably not going to change their mind by
      | reading an anonymous sarcastic comment on HN.
        | [deleted]
        | Nzen wrote:
        | For someone with a short time scale, only trawling this
        | thread, of course not.
        |
        | For someone young in this space, this comment and a
        | hundred others, for and against, sift into his or her
        | consciousness as part of the perceived zeitgeist of
        | Kubernetes within the larger community. Perhaps after
        | several years, this person may have an intuition to avoid
        | Kubernetes in favor of separate Docker or LXD containers.
        |
        | That association of Kubernetes as a FAANG-level tool
        | builds stronger with the linked article: this hypothetical
        | person can compare the struggles against perceived
        | resources and so on. But not everyone has time to read the
        | article, or any given Kubernetes article we come across.
        | Some of those times, it's enough to take the temperature
        | and move on. So, over years, that may build an aversion
        | that would not have otherwise formed had commenters
        | avoided denouncing (or endorsing) Kubernetes with less
        | than full commitment toward convincing others.
        |
        | Also, in this toneless medium, I can't intuit much of the
        | emotional weight lolkube conveyed the sentence with. Was
        | this person rueful, or playful?
          | andrewflnr wrote:
          | But there are several other actually useful comments in
          | this thread warning about using Kubernetes when it's not
          | needed, that cite actual reasons. We didn't need one by
          | "lolkube", not at any timescale. By the way, that name
          | is the clue you need about the tone.
        | [deleted]
  | politelemon wrote:
  | You might be joking, but I'd say using k8s when you don't need
  | to is definitely a mistake.
| config_yml wrote:
| What would a good liveness and readiness probe do for a Rails app?
| What kind of work and metrics would these two endpoints involve in
| my app?
  | Nextgrid wrote:
  | For the readiness probe, a simple endpoint that returns 200 is
  | enough. This tests your service's ability to respond to requests
  | without depending on any other dependencies (sessions which
  | might use Redis, or a user auth service which might use a
  | database).
  |
  | For the liveness probe, I guess you could check whether your
  | service is accepting TCP connections? I don't think there should
  | ever be a reason for your service to outright refuse connections
  | unless the main service process has crashed (in which case it's
  | best to let Kubernetes restart the container instead of having a
  | recovery mechanism inside the container itself, like supervisord
  | or daemontools).
    | kevindong wrote:
    | > For the readiness probe, a simple endpoint that returns 200
    | is enough. This tests your service's ability to respond to
    | requests without depending on any other dependencies (sessions
    | which might use Redis, or a user auth service which might use
    | a database).
    |
    | If the underlying dependencies aren't working, can a pod
    | actually be considered ready and able to serve traffic? For
    | example, if database calls are essential to a pod being
    | functional and the pod can't communicate with the database,
    | should the pod actually be eligible for traffic?
      | Nextgrid wrote:
      | The article explicitly warns against that:
      |
      | > Do not fail either of the probes if any of your shared
      | dependencies is down, it would cause cascading failure of
      | all the pods.
      |
      | The idea would be that the downstream dependencies have
      | their own probes, and if they fail they will get restarted
      | in isolation without touching the services that depend on
      | them (which are only temporarily degraded because of the
      | dependency failure and will recover as soon as the
      | dependency is fixed).
  | UK-Al05 wrote:
  | If it has network problems, Kubernetes can take that instance
  | out of serving traffic.
  |
  | When you're doing rolling upgrades, it can signal that your app
  | is ready to take traffic.
  |
  | Those are the main uses.
  | pmahoney wrote:
  | This is a good question, and I think the article doesn't cover
  | this topic well.
  |
  | From the article:
  |
  | > The other one is to tell if during a pod's life the pod
  | becomes too hot handling too much traffic (or an expensive
  | computation) so that we don't send her more work to do and let
  | her cool down, then the readiness probe succeeds and we start
  | sending in more traffic again.
  |
  | Well... maybe. Is it a routine occurrence that an individual Pod
  | becomes "too hot"? If your load balancer can retry a request on,
  | say, a 503 Service Unavailable, you may be better off relying on
  | that retry combined with CPU-based autoscaling to add another
  | Pod (it's simpler; the tradeoff is that the load balancer may
  | spend too much time retrying).
  |
  | If you can't or don't want to add additional Pods, then your
  | client is going to see that 503 (or similar) anyway. I'd say,
  | then, that the point of a Pod claiming it's "not ready" to get
  | itself removed from the load-balanced pool is to allow the load
  | balancer to more quickly find an available Pod, but this adds
  | complexity and may be irrelevant if you run enough Pods to have
  | some overhead capacity.
  |
  | A Rails app is a bit different from a Node/Go/Java app in that
  | (typically at least, if you're using Unicorn or other forking
  | servers) each individual Pod can only handle a limited number of
  | concurrent requests (8, 16, whatever it is). It's more likely,
  | then, that any given Pod is at capacity.
  |
  | But liveness/readiness are not so simple. If these probes go
  | through the main application stack, they tie up one of the
  | precious few worker processes, even if only momentarily. I
  | haven't worked with Ruby in a number of years, but I remember
  | running a WEBrick server in the Unicorn master process, separate
  | from the main app stack, to respond to these checks. But I did
  | not implement a readiness check that tracked the number of
  | requests and reported "not ready" if all the workers were busy.
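
Wiring-wise, the probes discussed in this thread are two fields on
the container spec. A minimal sketch for an HTTP app; paths, port,
and timings are illustrative:

    livenessProbe:
      httpGet:
        path: /healthz   # cheap "process is alive" check; failure
        port: 3000       # makes the kubelet restart the container
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready     # gates traffic only; per the thread, do
        port: 3000       # not fail it when shared deps are down
      periodSeconds: 5
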
"The 5 Platonic Solids."_ | | The submitted title was "10 most common mistakes using | kubernetes", HN's software correctly chopped off the "10", but | then you added it back. Submitters are welcome to edit titles | when the software gets things wrong, but not to reverse things it | got right. | chx wrote: | It misses the biggest one: using it. I ranted about the cloud a | decade ago http://drupal4hu.com/node/305 and there's nothing new | under the Sun. Still most companies doing cloud and Kubernetes | doesn't need it... practice YAGNI ferociously. | speedgoose wrote: | In my opinion, the most common mistake is not in the article : | using kubernetes when you don't need to. | | Kubernetes has a lot of pros or the papers but in practice it's | not worth it for most small and medium companies. | balfirevic wrote: | Do you also include managed kubernetes offerings, such as from | Digital Ocean, in that assessment? | blinkingled wrote: | Not the OP but yes if cost is a factor. As far as I know no | managed K8S offerings are cheap. | watermelon0 wrote: | - DigitalOcean offers free clusters. | | - Azure (still?) offers free clusters. | | - GCP offers one free single-zone cluster. | | If you need more than DO can offer in terms of compute | instances, you can probably afford GKE/EKS, which is around | 75$/month. | | --- | | Running highly-available control plane (K8s masters & etcd) | by yourself is NOT cheaper than using EKS. | | To achive high availability, EKS runs 3 masters and 3 etcd | instances, in different availability zones. Provisioning 3 | t3.medium instances (4 GB of memory and 2 CPUs) would cost | the same as a completely managed EKS. | | Not to mention the manual work you need to setup, maintain | and upgrade such instances. | [deleted] | jcrawfordor wrote: | Honestly my experience has been that managed k8s is often | _more_ complicated from a developer perspective than just k8s | - sure, you don 't have to deal with setting it up, but you | have to figure out how all the 'management' and provider- | specific features work, and they often seem pretty clumsily | integrated. | geerlingguy wrote: | And in many cases features that would help your use case | aren't enabled on that platform, or are in a release that's | still a year or two from being supported on that | platform... I'm looking at you, EKS. | speedgoose wrote: | Oh yes. Managed kubernetes is full of various issues. Some | major cloud providers sell very poor managed kubernetes. | | If someone knows about a reliable managed kubernetes, please | let me know. | politelemon wrote: | Can you elaborate on the cloud providers selling poorly | managed k8s - are they all problematic? I have no | experience with cloud provided k8s. | nyi wrote: | Are there good resources for making that decision according to | good criteria? | therein wrote: | A simple rule of thumb is the number of services you have. | For instance, my current employer has been working on getting | everything on Kubernetes for the last year or so and we have | two services... the frontend server-side renderer, and API. | rwoll wrote: | This was my thought exactly. The article is great assuming you | need to use k8s, but does leave out the important question: | does your project or product require k8s and all the overhead | it unavoidably entails? | | Amazon's Elastic Container Service (ECS) on Fargate deployment | type is probably a better option much of the time. Until you | maintain your own k8s cluster (including the hosted variants on | AWS, GCP, etc.) 
| apple4ever wrote:
| > more tenants or envs in shared cluster
|
| This is the one I'm trying to convince my current company about.
| They want everything in a single cluster (prod, test, stage, qa).
|
| Of course, self-hosting makes this harder to justify, since it
| means additional expense for more machines.
  | LambdaB wrote:
  | Have you considered using OpenShift instead of Kubernetes? It
  | comes with vastly improved multitenancy features, among other
  | things, compared to plain Kubernetes. OKD, the open-source
  | distribution of OpenShift, allows full self-hosting:
  | https://www.okd.io
  | moondev wrote:
  | That sounds like a disaster waiting to happen!
  |
  | Seems like a perfect use case for Cluster API:
  | https://cluster-api.sigs.k8s.io/user/quick-start.html
  |
  | Have one global "mgmt cluster" with several workload clusters.
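
For contrast, the shared-cluster setup being argued against above
typically separates environments with namespaces plus resource
quotas rather than separate clusters. A minimal sketch; names and
numbers are illustrative:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: staging
    ---
    # caps what the staging environment may request from the cluster
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: staging-quota
      namespace: staging
    spec:
      hard:
        requests.cpu: "8"
        requests.memory: 16Gi

Namespaces isolate names and quotas, not the control plane itself,
which is part of why separate clusters for prod keep coming up.
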
| haolez wrote:
|     ...
|     requiredDuringSchedulingIgnoredDuringExecution:
|     ...
|
| This instantly reminded me of this:
| https://thedailywtf.com/articles/the-longest-method
|
| Kubernetes sometimes shows its Java roots.
  | vsareto wrote:
  | That Java link in the article goes to a completely not-Java
  | Chinese site, btw.
  | tomnipotent wrote:
  | To be fair, I don't see how this could be shorter in any other
  | language without losing readability.
  | ithkuil wrote:
  | > its Java roots
  |
  | Citation needed.
  |
  | AFAIR, Borg is implemented in C++ and k8s has been implemented
  | in Go from day 0.
  |
  | Am I missing some crucial steps in k8s's history?
    | detaro wrote:
    | Apparently the first version was based on a Java prototype,
    | but it's unclear to me how much of that is visible:
    |
    | https://kubernetes.io/blog/2018/06/06/4-years-of-k8s/
    |
    | > _Concretely, Kubernetes started as some prototypes from
    | Brendan Burns combined with ongoing work from me and Craig
    | McLuckie to better align the internal Google experience with
    | the Google Cloud experience. Brendan, Craig, and I really
    | wanted people to use this, so we made the case to build out
    | this prototype as an open source project that would bring the
    | best ideas from Borg out into the open._
    |
    | > _After we got the nod, it was time to actually build the
    | system. We took Brendan's prototype (in Java), rewrote it in
    | Go, and built just enough to get the core ideas across._
    | enitihas wrote:
    | I remember a comment from someone on the Go team (bradfitz,
    | maybe) saying that the initial plan was for Kubernetes to be
    | implemented in Java, but Go was chosen because it already had
    | a lot of momentum in the container world.
___________________________________________________________________
(page generated 2020-05-17 23:00 UTC)