[HN Gopher] Treat Kubernetes clusters as cattle, not pets
       ___________________________________________________________________
        
       Treat Kubernetes clusters as cattle, not pets
        
       Author : mffap
       Score  : 110 points
       Date   : 2021-06-30 13:24 UTC (9 hours ago)
        
 (HTM) web link (zitadel.ch)
 (TXT) w3m dump (zitadel.ch)
        
       | mrweasel wrote:
       | While I don't disagree, this is also a reminder that debugging
       | Kubernetes can be terribly complicated.
        
       | ulzeraj wrote:
        | I want to go back in time to when naming your servers after
        | X-Men characters or Dune houses was a thing. I'm not a big fan
        | of this
       | brave new DevOps world.
        
         | exdsq wrote:
         | My first job, tech support, had fantasy server names. That was
         | fun :)
        
       | psanford wrote:
       | > It means that we'd rather just replace a "sick" instance by a
       | new healthy one than taking it to a doctor.
       | 
       | This analogy really bothers me. Cattle are expensive. They are an
       | investment. You don't put down an investment just because it got
       | sick.
       | 
        | If you have a sick cow, you will in fact call your local large-
        | animal vet to come and treat it.
        
         | ska wrote:
          | Yes, that statement is wrong. I guess the real difference in
          | the pet/cow calculus is that for cattle you probably won't pay
          | more for treatment than the cow is worth; with pets, people do
          | this all the time.
        
           | psanford wrote:
           | Yup. If the argument is "don't fall in love with your
           | servers", I'm in agreement.
           | 
            | However, the idea that whenever anything weird happens you
            | should just kill your server/cluster and move on without doing
            | any sort of investigation seems like a recipe for disaster.
            | That's a great way to mask bugs that may in fact be systemic
            | in nature and that are, or will eventually be, causing service
            | degradation for your customers.
           | 
           | I would hate to work in an environment where bugs are ignored
           | and worked around instead of understood and fixed.
        
             | lamontcg wrote:
              | I like the idea of killing it and moving on without paging
              | someone at 2 AM.
             | 
              | Ideally, though, that goes into an async queue of issues,
              | someone finds the root cause, and the fix goes into a
              | backlog that actually gets burned down.
             | 
              | I suspect what happens more often is that the whole stack
              | has so many layers that the SREs responsible for it don't
              | have the visibility they need to debug it all, so they use
              | their SLOs as a club to ignore issues as long as they're
              | meeting their metrics, until it becomes a firefighting
              | drill.
             | 
              | A pile of cargo-culted best practices and SLOs replacing
              | hands-on debugging.
        
         | pyuser583 wrote:
         | Don't let facts get in the way of a good metaphor.
         | 
         | There's nothing remotely disgusting about making sausages, but
         | people say "Laws are like sausages - it's better not to see
         | them being made."
         | 
         | I stopped correcting them long ago.
        
       | coding123 wrote:
        | AWS, in my mind, can quickly lose the Kubernetes war amongst
        | cloud providers. This is every cloud provider's chance: EKS on
        | AWS is so damn tied into a bunch of other AWS products that it's
        | literally impossible to just delete a cluster now. I tried. It's
        | tied into VPCs and subnets and EC2 and load balancers and a bunch
        | of other products in ways that no longer make sense now that K8s
        | won.
       | 
       | In my opinion it needs to be re-engineered completely into a
       | super slim product that is not tied to all these crazy things.
        
         | ffo wrote:
         | You mean EKS needs re-engineering?
        
         | kinghajj wrote:
          | Also, the fact that a new EKS cluster takes at least 20 minutes
          | to come up and be ready makes AWS's offering the weakest among
          | the Big Three cloud vendors.
        
           | jen20 wrote:
           | This isn't true. I've provisioned 53 EKS clusters this week,
           | and every one of them has been up in under 11 minutes with
           | all of the accoutrements. I understand it has been
           | substantially slower in the past.
        
             | nickjj wrote:
             | 11 minutes is still a long time IMO.
             | 
             | You can spin up a local multi-node cluster using kind[0] in
             | 1 minute on 6+ year old hardware. I know it's not the same
             | but I really have to imagine there's ways to speed this up
             | on the cloud. I haven't spun up a cluster on DO or Linode
             | in a while, does anyone know how long it takes there for a
             | comparison?
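              | 
              | For reference, the config you pass to `kind create cluster
              | --config` for a multi-node cluster is tiny (a rough sketch;
              | the node roles and counts here are arbitrary):
              | 
              |     kind: Cluster
              |     apiVersion: kind.x-k8s.io/v1alpha4
              |     nodes:
              |     - role: control-plane
              |     - role: worker
              |     - role: worker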
             | 
             | [0]: https://kind.sigs.k8s.io/
        
               | mlnj wrote:
                | I think it is unfair to compare KinD with an actual
                | Kubernetes cluster, which comes with load balancers,
                | external IP addresses, etc.
               | 
               | My Terraform scripts get a HA K3s cluster in Google Cloud
               | VMs in less than 9 minutes, which in my opinion is
               | fantastic.
        
             | misiti3780 wrote:
             | Are you using terraform ?
        
           | hodgesrm wrote:
           | It really depends on the use case. We use Kubernetes for
           | hosting managed data warehouses. EKS cluster spin-up time is
           | a one-time cost to start a new environment for data
           | warehouses. It's insignificant compared to the time required
           | to load data. Other issues like auto-scaling performance are
           | more important, at least in our case.
        
           | hughrr wrote:
           | Also the base cluster cost of $0.10 an hour really hurts for
           | test and experimentation.
        
             | jrockway wrote:
             | GCP charges the same now (though your first one is free). I
             | don't think it's an unreasonable cost, but it's certainly
             | not competitive with places like Digital Ocean which are
             | still free.
        
         | atmosx wrote:
          | Not sure. I'm using DigitalOcean's managed version and have
          | experience with EKS. There are pros and cons on both sides.
          | What I like about EKS is the support for API and k8s logs
          | (accounting/security); coupled with the CloudTrail and IAM
          | integration for RBAC/user management, it's bliss.
        
         | misiti3780 wrote:
         | I am about to move our infrastructure from ECS to EKS, what
         | other AWS products is EKS tied to?
        
         | k__ wrote:
         | I don't have the impression AWS even sees it as a war worth
         | fighting.
         | 
         | And I can't blame them.
        
         | unfunco wrote:
          | It's not literally impossible to delete a cluster; I do this
          | many times daily, fully automated, with no issues. An EKS
          | cluster is not tied to a VPC or subnets: you can spin them up
          | independently, and you can delete your clusters without
          | affecting the VPC or subnets in any way.
          | 
          | An Ingress or a Service of type LoadBalancer will create a load
          | balancer in AWS that's tied to your cluster, but that's the
          | whole point of Kubernetes: it'll spin up the equivalent
          | resources in Azure or GCP or DigitalOcean.
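          | 
          | Concretely, a plain manifest like this (a rough sketch; the
          | names are made up) is all it takes to get a cloud load balancer
          | provisioned for you:
          | 
          |     apiVersion: v1
          |     kind: Service
          |     metadata:
          |       name: web            # hypothetical service name
          |     spec:
          |       type: LoadBalancer   # cloud controller creates the LB
          |       selector:
          |         app: web
          |       ports:
          |       - port: 80
          |         targetPort: 8080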
        
       | aliswe wrote:
       | > ... the GitOps pattern is gaining adoption for Kubernetes
       | workload deployment.
       | 
        | Is it really, though? I for one am glad I didn't jump on the
        | bandwagon early. A lot of the articles popping up nowadays
        | mentioning the downsides of GitOps make a lot of sense.
        
       | tw600040 wrote:
        | This idea that cattle are meant to be slaughtered and can't be
        | pets is extremely offensive.
        
         | sweetheart wrote:
          | Preach. It's a metaphor that reinforces harmful beliefs, like
          | master/slave terminology.
         | 
         | Our language defines our world, so we should aim to shy away
         | from language that amplifies a harmful message, even in
         | radically different contexts.
        
           | zachrose wrote:
           | I'd like to see a vegan alternative to the pets/cattle meme.
        
             | Zababa wrote:
              | Fungible vs. non-fungible seems to be the concept hiding
              | behind pets/cattle. You can substitute one dollar for
              | another; you can't substitute your favorite rock for
              | another.
        
             | sweetheart wrote:
             | Screen printing, not painting.
        
       | mfer wrote:
        | If I've read the Google papers on Borg right (Kubernetes is
        | conceptually Borg v3, with Omega being v2), this is different
        | from how Google runs things.
       | 
        | They do warehouse-scale computing with Borg operating large
        | clusters. Borg is at the bottom.
       | 
       | The workloads spanning dev, test, and prod then run on these
       | clusters. By having large clusters with lots of things running on
       | them they get high utilization of the hardware and need less
       | hardware.
       | 
        | It's amusing to see k8s used in such a different way, one that
        | often uses a lot more hardware and drives up costs, built on
        | concepts Google used to lower them.
       | 
       | Or, maybe I read the papers and book wrong.
       | 
        | I like the idea of higher utilization and better efficiency
        | because it uses fewer resources, which is greener.
        
         | closeparen wrote:
         | Exactly. We practice this with Mesos. The point is that a
         | central infrastructure team maintains the (regional) clusters
          | and provides an interface for application teams to submit
          | their services. Each application team maintaining its own
          | cluster or
         | dealing directly with the full power of the cluster scheduler
         | ecosystem would defeat the purpose.
        
         | atmosx wrote:
         | > It's amusing to see k8s used in such a different way and one
         | that often uses a lot more hardware while driving up costs.
         | 
          | To be fair, Google runs its own datacenters, with teams doing
          | research and optimisation at every level of the stack, hardware
          | and software, and, most importantly, an amount of engineering
          | resources converging to infinity.
          | 
          | The rest of us are stuck with VMs that share network interfaces
          | and have to monitor CPU steal, understand complex pricing
          | models, etc. Engineering resources are scarce, so most
          | companies will over-provision just to be safe; and because
          | profiling the application and fixing that API call that takes
          | too long is expensive, they'll just spin up another 50 pods.
        
         | bambambazooka wrote:
         | Could you please name the papers and the book?
        
           | q3k wrote:
           | Not GP, but:
           | 
           | https://research.google/pubs/pub43438/
           | 
           | https://sre.google/sre-book/production-environment/
           | 
           | are the best public documentation entrypoints that I know of.
        
         | q3k wrote:
         | > Or, maybe I read the papers and book wrong.
         | 
         | No, that's exactly how it works. You have clusters spanning a
          | datacenter failure domain (~= an AZ), and everything from prod
          | to dev workloads runs there, with low-priority batch jobs
          | bringing resource utilization up to a sensible level.
         | 
         | You can do the same thing with k8s, you just have to trust its
         | multitenancy support. You have RBAC, priority, quotas,
         | preemption, pod security policies, network policies... Use
         | them. You can even force some workloads to use gVisor or
         | separate prod and dev workloads on different worker machines.
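          | 
          | For instance, a per-team namespace quota is a single object (a
          | rough sketch; the names and numbers are made up):
          | 
          |     apiVersion: v1
          |     kind: ResourceQuota
          |     metadata:
          |       name: team-a-quota     # hypothetical
          |       namespace: team-a      # hypothetical team namespace
          |     spec:
          |       hard:
          |         requests.cpu: "100"
          |         requests.memory: 200Gi
          |         limits.cpu: "200"
          |         limits.memory: 400Gi
          |         pods: "500"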
        
         | conradev wrote:
          | I've also seen Kubernetes being used this way, with one cluster
         | per data center for company-wide utilization with segmentation
         | being at the _namespace_ layer. The priority class system is
         | heavily utilized to make sure production workloads are always
         | running, and other workloads are pre-empted as needed.
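          | 
          | Roughly, that means a high-value PriorityClass that production
          | pods reference via spec.priorityClassName (a sketch; the name
          | and value are made up):
          | 
          |     apiVersion: scheduling.k8s.io/v1
          |     kind: PriorityClass
          |     metadata:
          |       name: prod-critical   # hypothetical name
          |     value: 1000000          # higher value wins preemption
          |     globalDefault: false
          |     description: "Production workloads; may preempt batch/dev."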
        
         | pyuser583 wrote:
         | Sounds like a Mesos-style arrangement.
        
         | fulafel wrote:
          | How do they do version upgrades? Isn't that the traditional
          | Achilles' heel of K8s that leads people to want to frequently
          | recreate clusters from scratch and/or do blue/green?
        
           | mfer wrote:
           | First, if I understand it right... Google does some smart
           | things in upgrades. They do things like tests and then
            | upgrade their equivalent of AZs in a data center. I'm sure
           | there have been upgrades gone bad that they've had to fix.
           | 
           | Kubernetes can be upgraded. I've watched nicely done upgrades
           | happening 4 or 5 years ago. I've watched simple upgrades
           | happen more recently. It's not unheard of. Even in public
            | clouds I've upgraded Kubernetes through many minor versions
           | without issue.
           | 
           | I would argue it's more work to create more clusters. You
           | need to migrate workloads and anything pointing to them. It
           | also would cost more as you have to run more hardware.
        
             | [deleted]
        
             | dharmab wrote:
             | We have active clusters that have been continually updated
             | since 1.13 or so.
             | 
             | Of course it is the cluster of Theseus since every single
             | bit of compute has been entirely replaced :)
        
           | moondev wrote:
           | Highly available kubeadm clusters are designed to be upgraded
           | in place. The Kubernetes api-server is also designed to
           | function with a minor version skew (for example v1.19.x and
           | v1.20.x) that would happen during an in-place upgrade.
           | 
           | Cluster API takes the above and can in-place upgrade clusters
           | for you. It's pretty awesome to see first hand. Bumping the
           | Kubernetes version string and machine image reference can
           | upgrade and even swap distros of a running cluster with no
            | downtime.
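            | 
            | With Cluster API the upgrade is, roughly, just a field change
            | on the control plane object (a trimmed sketch; the apiVersion
            | and names depend on your Cluster API release and provider,
            | and required fields are omitted):
            | 
            |     apiVersion: controlplane.cluster.x-k8s.io/v1alpha4
            |     kind: KubeadmControlPlane
            |     metadata:
            |       name: my-cluster-control-plane   # hypothetical
            |     spec:
            |       replicas: 3
            |       version: v1.20.7   # bump this (plus the machine image
            |                          # reference) to roll the control
            |                          # plane in place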
        
             | dharmab wrote:
             | One caveat is that downgrading the apiserver is not
             | guaranteed to be possible, since the schema of some types
             | in the API may have been migrated to newer versions in the
             | etcd database that the previous version may not be able to
             | read. There are tools such as Velero (https://velero.io)
             | which can restore a previous snapshot, but you will likely
             | incur downtime and lose any changes since the snapshot.
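              | 
              | A pre-upgrade backup is just a Velero Backup object, or the
              | equivalent `velero backup create` command (a rough sketch;
              | the names are made up):
              | 
              |     apiVersion: velero.io/v1
              |     kind: Backup
              |     metadata:
              |       name: pre-upgrade-backup   # hypothetical
              |       namespace: velero
              |     spec:
              |       includedNamespaces:
              |       - "*"
              |       ttl: 720h0m0s              # keep for 30 days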
        
           | lokar wrote:
            | Nothing important is single-homed to one cluster. Much
            | emphasis is placed on preventing correlated failures of
            | clusters.
        
           | jeffbee wrote:
           | The borg concepts and interfaces have been the same for ages.
           | The borglet and borgmaster get released and pushed out very
           | frequently (daily or weekly) and it doesn't break anything
            | among the workloads. There are no maintenance windows for
           | these changes because containers can run while the borglet
           | restarts, and borgmasters are protocol/format-compatible so
           | the new release binary simply joins the quorum. Machines also
           | get rotated out of the borg cell regularly for kernel
           | upgrades, way more often than I've seen outside of Google.
           | 
            | I think an important thing to know about K8s is that Omega
            | failed to replace Borg, and then the Omega people created
            | K8s. So
           | K8s does not necessarily descend from Borg, and not all of
           | Borg's desirable attributes made it into K8s.
        
             | dekhn wrote:
             | The omega folks were on the borgmaster and borglet teams
             | when they were building omega (I was on the borg team at
             | the same time, but working on a different project). It's
             | fair to say that k8s is an intellectual inheritor of the
             | parts of borg that are required for it to be useful on the
             | outside.
             | 
              | Borglet/borgmaster releases definitely break workloads. I
              | recall one where something was rolled out to all the test
              | clusters, passed all the tests (except one of ours), and was
              | about to be promoted to increasing percentages of the prod
              | clusters. While debugging why our test (which was not part
              | of the feature rollout) broke, we realized that if this had
              | rolled out to prod, it would have broken all TensorFlow
              | jobs, and it would have been a major OMG.
             | 
             | But yeah, most of the time, the release process for borglet
             | and borgmaster is fairly fast and fairly reliable.
        
               | jeffbee wrote:
               | > required for it to be useful on the outside.
               | 
               | A big part of this is that outside Google, the number of
               | people who have to _operate_ k8s as a fraction of all k8s
               | users is way higher than the fraction of borg users who
                | have to operate borg, so there's a lot of stuff in k8s
                | that is 'end-user experience' comforts and affordances
                | for operators.
        
           | q3k wrote:
           | If you don't depend on clusters being 100% available all the
           | time and design your applications to handle cluster-wide
           | outages (which you need to do at Google scale anyway), then
           | simply doing progressive rollouts across clusters is good
           | enough. If a cluster rollout fails and knocks some workloads
           | offline, so be it. Just revert the rollout and figure out
           | what went wrong.
           | 
           | You can also throw in some smaller clusters that replicate
           | the 'main' clusters' software stack and have some workloads
           | whose downtime does not impact production jobs (not just dev
           | workloads, but replicas of prod workloads!). These can be the
           | earliest in your rollout pipeline and serve as an early stage
           | canary.
        
         | lifty wrote:
          | You're right, vanilla Kubernetes has a level of complexity
          | that only starts paying off at a certain scale. But the wide
         | adoption of K8s also shows that people love the standardization
         | and API it offers for orchestrating workloads even if they
         | don't take advantage of its scaling capabilities.
         | 
         | My hope is that projects like k3s will manage to cover that
         | small scale spectrum of the market.
        
           | void_mint wrote:
           | I've intentionally ignored Kubernetes for a few years. Is it
           | worth looking into K3s instead?
        
             | antonvs wrote:
             | It depends what you're looking for.
             | 
             | One attraction of k3s is you can get a "real" production-
             | grade K8s running on a single machine, or a small cluster,
             | very quickly and easily. That can be great for learning,
             | development and testing, certainly.
             | 
             | K3s is actually targeted at "edge" scenarios where you
             | aren't running in a cloud, and don't have the ability
             | and/or desire to dynamically adjust the cluster size.
             | 
             | You can scale a k3s cluster easily enough manually, adding
             | or deleting nodes, but to get cluster autoscaling you'd
             | need to do environment-dependent work to make it automatic.
             | 
             | If you do want that, then things will likely be quite a bit
             | easier with a managed cluster like Google's GKE, or even
              | possibly a self-installed cluster using a tool like kops,
             | kubespray etc. (It's a while since I've used any of those,
             | so not sure which is best.) AWS EKS is another choice,
             | although its setup is a bit more complex and I wouldn't
             | really recommend it to someone getting started.
             | 
             | And for production scenarios, integration with cloud load
             | balancers, network environment etc. is similarly going to
             | be easier with a provider-managed cluster. It's all
             | possible to do with k3s, but it's more work and a steeper
             | learning curve.
        
             | koeng wrote:
             | I've really enjoyed developing on K3s, would recommend.
        
         | andrewstuart2 wrote:
          | And furthermore, the primary problem with scaling out the
          | number of clusters is that it hamstrings one of the primary
          | value propositions of Kubernetes: increased utilization. If you
          | spread your workloads outside the scheduler's sphere of
          | influence, it can't do its job without yet another scheduler on
          | top.
        
       | deeblering4 wrote:
       | All this does is make me want to go vegan and avoid maintaining
       | the entire k8s farm.
       | 
       | Truly, if your software team headcount is under 500 why are you
       | running k8s?
        
         | exdsq wrote:
         | I'm in a team of 1 but using it because my product is based on
         | spinning up dedicated services for users on demand, so it works
         | well
        
         | mschuster91 wrote:
         | It gives you automatic failover and decent-ish (at least when
         | coupled with Rancher, naked k8s is nuts) management compared to
         | a couple of manually (or Puppet) managed servers.
         | 
         | A well implemented mini cluster can and will save you so much
         | time in later maintenance and deployments.
        
       | LAC-Tech wrote:
       | Raising cattle is a lot of work. You have to weigh them
       | regularly, apply treatments for intestinal worms, for lice, move
       | them from pasture to pasture so they don't overgraze. It's a
       | fulltime job.
       | 
        | Also, if a cow dies, people don't just buy a new one. It
        | represents quite a loss of profit. It also represents a big
        | potential problem on the farm that people will want to resolve -
        | they're your money makers, and if they're dying it's an issue.
        
       | [deleted]
        
       | [deleted]
        
       | rllin wrote:
        | Really, you should treat your entire cloud deployment (sans
        | state, where that's impossible) as cattle.
        
       | jeffbee wrote:
       | Running a separate cluster for every service assures high
       | overhead and poor utilization. Fine if you can afford it, but be
       | aware that you are paying it.
        
         | q3k wrote:
         | Yeah, especially in production bare metal clusters. If you want
         | N+2 redundancy that's at least 5 physical machines for just the
         | control plane (etcd & apiservers), more if you don't want to
         | colocate worker nodes with that...
         | 
         | Even if you have full bare metal automation and thousands of
         | machines that seems like unnecessary waste.
        
           | dharmab wrote:
           | Add the cost of administrative services: DNS resolvers,
           | ingress controllers, log forwarders, monitoring (e.g.
           | Prometheus, some exporters), autoscalers, tracing
           | infrastructure, admission controllers, backups/disaster
           | recovery tools (e.g. Velero)...
           | 
           | It can add up to millions per year if you aren't auditing
           | your costs.
        
         | dijit wrote:
         | Yeah, that's insane as a concept. One of the larger selling
         | points of Kubernetes was bin packing. Removing that selling
         | point leaves you with...
         | 
         | * Orchestration of jobs (restart, start); this can be achieved
         | easily without the complexity of k8s
         | 
          | * Sidecar loading; literally the easiest thing to do with
         | normal VMs.
         | 
         | and...?
        
           | dolni wrote:
           | Packing everything together is a "selling point" until you
           | find that a service can fill up ephemeral storage and take
           | down other services, or consume bandwidth without limit.
           | 
           | Let's not forget the potential security implications of not
           | keeping things properly isolated.
           | 
            | People who were around when provisioning on bare metal was
            | still a thing already learned all these lessons. Somehow
            | those lessons seem to have been forgotten by all the people
            | driving hype around Kubernetes.
        
             | q3k wrote:
             | > Packing everything together is a "selling point" until
             | you find that a service can fill up ephemeral storage and
             | take down other services, or consume bandwidth without
             | limit.
             | 
             | Ephemeral storage has resource requests/limits in pods.
             | 
             | Traffic shaping/limiting can be accomplished using
             | kubernetes.io/{ingress,egress}-bandwidth annotations. It's
              | not as nice as resources (because there are no quotas or
              | capacity planning, and it's generally very simplistic), but
              | you can still easily build on this.
             | 
             | Pods can also have priorities and higher priority workloads
             | can and will preempt lower priority workloads.
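              | 
              | Putting those together, a pod spec along these lines works
              | (a rough sketch; the names, image, and numbers are made up,
              | and the bandwidth annotations need the CNI bandwidth
              | plugin):
              | 
              |     apiVersion: v1
              |     kind: Pod
              |     metadata:
              |       name: batch-worker              # hypothetical
              |       annotations:
              |         kubernetes.io/ingress-bandwidth: "10M"
              |         kubernetes.io/egress-bandwidth: "10M"
              |     spec:
              |       priorityClassName: batch-low    # hypothetical class
              |       containers:
              |       - name: worker
              |         image: registry.example.com/worker:1.0
              |         resources:
              |           requests:
              |             ephemeral-storage: 2Gi
              |           limits:
              |             ephemeral-storage: 4Gi    # evicted past this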
             | 
             | > Let's not forget the potential security implications of
             | not keeping things properly isolated.
             | 
              | For hardware-level isolation, you can use gVisor or even
              | Kata Containers.
             | 
             | > People who were around when provisioning on bare-metal
             | was still a thing already learned all these lessons.
             | Somehow it seems they have been forgotten by all the people
             | driving hype around Kubernetes.
             | 
             | Kubernetes explicitly aims to solve resource isolation. It
             | was built by people who have decades of experience solving
             | this exact problem in production, on bare metal, at scale.
             | Effectively, Kubernetes resource isolation is one of the
             | best solutions out there to easily, predictably and
              | strongly isolate resources between workloads _and_ maximize
             | utilization at the same time.
        
             | packetlost wrote:
             | Does ephemeral storage not have configurable limits? That
             | seems like quite the oversight if not.
        
             | dijit wrote:
              | Kubernetes has the concept of limits, especially on
              | ephemeral storage; additionally, if your node becomes
              | unhealthy, the workloads will be rescheduled on another
              | node.
             | 
             | I'm super not hypey about kubernetes, mostly because the
             | complexity surrounding networking is opaque and built on a
             | foundation of sand... But let's not argue things that
             | aren't true.
        
               | dolni wrote:
               | > Kubernetes has the concept of limits, especially on
               | ephemeral storage
               | 
               | So... why are these issues open?
               | 
               | https://github.com/kubernetes/enhancements/issues/1029
               | https://github.com/kubernetes/enhancements/issues/361
               | https://github.com/kubernetes/kubernetes/issues/54384
               | 
               | > additionally: if your node becomes unhealthy then the
               | workloads would be rescheduled on another node.
               | 
               | Well of course, but you're going to run into that issue
               | (likely) on all of the nodes where the offending service
               | lives.
               | 
               | > But let's not argue things that aren't true.
               | 
               | If what I've said is untrue, looking at open GitHub
               | issues and the Kubernetes documentation is certainly no
               | indication. That's a massive problem all by itself.
        
               | q3k wrote:
                | The first issue you've linked concerns quota support for
                | ephemeral storage requests/limits - which is not about
                | the limits themselves, but the ability to set limit
                | quotas per tenant/namespace, e.g. team A cannot use more
                | than 100G of ephemeral storage in total across the
                | cluster. EDIT: No, sorry, it's about using underlying
                | filesystem quotas to enforce ephemeral storage limits,
                | vs. the current implementation; see the third point
                | below. Also
               | see KEP: https://github.com/kubernetes/enhancements/tree/
               | master/keps/...
               | 
               | The second is a tracking issue for a KEP that has been
               | implemented but is still in alpha/beta. This will be
               | closed when all the related features are stable. There's
               | also some discussion about related functionality that
               | might be added as part of this KEP/design.
               | 
                | The third issue is about integrating Docker storage
                | quotas with Kubernetes ephemeral quotas - i.e.,
                | translating ephemeral storage limits into disk quotas
                | (which would result in -ENOSPC to workloads), vs. the
                | standard kubelet implementation, which just kills/evicts
                | workloads that run past their limit.
               | 
               | I agree these are difficult to understand if you're not
               | familiar with the k8s development/design process. I also
               | had to spend a few minutes on each one of them to
               | understand what the actual state of the issues is.
               | However, they're in a development issue tracker, and the
               | end-user k8s documentation clearly states that Ephemeral
               | Storage requests/limits works, how it works, and what its
               | limitations are:
               | https://kubernetes.io/docs/concepts/configuration/manage-
               | res...
        
             | fshbbdssbbgdd wrote:
             | Wait till you hear that AWS is renting you instances that
             | are running on the same metal as other customers.
        
               | dolni wrote:
               | You seem to be asserting that the ability of Linux
               | containers to isolate workloads is on par with virtual
               | machines.
               | 
               | That's just not the case.
        
           | Spooky23 wrote:
           | Remember in big companies the internal politics rule the day.
           | It's cheaper to buy more computers than to become the
           | overlord of computing.
        
         | ffo wrote:
          | You don't exactly need to run a cluster per service ;-)
          | Instead, you can choose to colocate services that belong
          | together and form a "domain". But don't go down the route of
          | building the one almighty Kubernetes cluster where all your
          | domains run in a single place.
        
         | eliodorro wrote:
          | Decreases utilization, but also decreases coordination between
          | teams (no man-bear-pigs). Also weigh the long-term costs of
          | poorly maintained platforms and infrastructure in disaster
          | cases, security issues, or when migrating to other providers.
          | 
          | High overhead can be automated away; google ORBOS.
        
       | yongjik wrote:
        | At this point, why not just take it to its logical conclusion?
        | Treat your business model as cattle, not pets. Customers leaving?
        | Fire up another business until the capital runs out, and if it
        | does, no worries, just hop to another job!
       | 
       | Sorry, but I feel like I landed in crazy-land. Kubernetes is
       | already an exercise in how many layers you can insert with nobody
       | understanding the whole picture. Ostensibly, it's so that you can
       | _isolate_ those fucking jobs so that different teams can run
       | different tasks in the same cluster without interfering with each
       | other. Hence namespaces, services, resource requirements, port
       | translation, autoscalers, and all those yaml files.
       | 
       | It boggles my mind that people look at something like Kubernetes
       | and decide "You know what? We need more layers. On top of this."
        
         | tyingq wrote:
         | _" You know what? We need more layers. On top of this."_
         | 
         | Heh. Service Mesh!
        
           | rantwasp wrote:
           | K8s on K8s! K64s!
        
             | voidfunc wrote:
             | But can it run on an N64?
        
             | yongjik wrote:
             | Kuberneteuberneteuberneteuberneteuberneteuberneteuberneteub
             | ernetes is a perfect name for the beautiful system you are
             | about to architect: it's an ancient Greek word for
             | helmsman-man-man-man-man-man-man-man, who is in charge of
             | helmsman-man-man-man-man-man-man, who is in charge of ...
             | 
             | (...sorry =_=)
        
         | spondyl wrote:
          | I mean, depending on the business, employees aren't trusted to
          | understand the whole picture regardless. Many employees at
          | traditional businesses don't even have administrator access on
          | their laptops, depending on their position, so it's not
          | logically inconsistent with how things seem to operate. With
          | that lack of big-picture overview, it's hard to scrutinise
          | things, since you're only seeing one sliver of an
          | implementation (i.e. "It saves money" without seeing, let
          | alone understanding, the technical requirements, and vice
          | versa).
        
           | aynyc wrote:
            | The limited-admin practice isn't just to save money. It's a
            | security practice, and it's a good one. In a large business,
            | no one understands the whole business, except maybe the legal
            | department and the CEO.
        
             | MuffinFlavored wrote:
              | > In a large business, no one understands the whole
              | business, except maybe the legal department and the CEO.
             | 
             | lol... what?
             | 
             | A CEO of a 20,000+ person company literally sees projects
             | as revenue sources and a deadline, nothing more.
             | 
              | To get it delivered, he/she walks down the chain of
              | managers until they get answers. It might as well be a
              | black hole.
        
       | rq1 wrote:
       | > It means that we'd rather just replace a "sick" instance by a
       | new healthy one than taking it to a doctor.
       | 
       | Oh god! Please treat your cattle better!
        
       | tnisonoff wrote:
       | When I worked at Asana, we created a small framework that allowed
       | for blue-green deployments of Kubernetes Clusters (and the apps
       | that lived on top of them) called KubeApps[0].
       | 
       | It worked out great for us -- upgrading Kubernetes was easy and
       | testable, never worried about code drift, etc.
       | 
       | [0] https://blog.asana.com/2021/02/kubernetes-at-asana/ (Not
       | written by me).
        
       | AtNightWeCode wrote:
       | Resource utilization is the main reason I would run a cluster in
       | the first place. Immutable infrastructure is also expensive to
       | build and maintain.
        
         | tnisonoff wrote:
         | For larger companies, I think a huge benefit of Kubernetes is
         | the shared language for defining and operating services, as
         | well as the well-thought-out abstractions for how these
         | services interact.
         | 
         | Costs are generally less of a concern, but having one way of
         | running, operating, and writing services allowed our dev team
         | to move faster, share knowledge, etc.
        
           | ffo wrote:
            | True, cost is not the biggest issue. Separation of teams
            | with different velocities and needs, on the other hand, is.
           | 
           | One API as abstraction with shared processes eases the pain
           | for the people relying on a platform.
        
       | jzelinskie wrote:
       | I like the idea of "Building on Quicksand" as the analogy for
        | distributed systems, but also for maintaining your software
       | dependencies. This article basically recommends trying to
       | minimize your dependencies to keep reproducibility/portability
       | high. I generally agree with this, but also carry an "all things
       | within reason" mentality. But just as the article describes
       | coworkers growing into their cluster, the complexity of what they
       | run in their cluster will also grow over time and eventually
       | they'll realize they've just built up their own "distribution". A
        | few years ago, I wrote a post asking people to think
       | critically when they hear someone mention "Vanilla"
       | Kubernetes[0].
       | 
       | The real problem they suffered is actually that Kubernetes isn't
       | fundamentally designed for multi-tenancy. Instead, you're forced
       | to make separate clusters to isolate different domains. Google
       | themselves run multiple Borg clusters to isolate different
       | domains, so it's natural that Kubernetes end up with a similar
       | design.
       | 
       | [0]: https://jzelinskie.com/posts/youre-not-running-vanilla-
       | kuber...
       | 
       | Disclosure: I worked as an engineer and product manager on CoreOS
       | Tectonic, the (now defunct) Kubernetes used in the post.
        
         | GauntletWizard wrote:
         | You're just wrong, unfortunately - Google runs dev and test and
         | prod all on the same clusters. Kubernetes multi-tenancy works
         | just fine, but the conventional definition of multi-tenancy
         | includes things like "network isolation" that are misguided.
         | Multi-tenancy should be set up (and is within Google) by
         | understanding what is and isn't shared with the environment,
         | and through cryptographic assertion of who you're speaking to.
         | If you want to see the latter part nicely integrated, come to a
          | SIGAUTH meeting and help me argue for it.
        
           | jzelinskie wrote:
           | All I said was that they use separate clusters to "isolate
           | domains" which is a pretty vague description on purpose -- I
           | did not intend to claim they do it for different deployment
           | environments as you've described.
           | 
            | It's fairly subjective which types of isolation define
            | "multi-tenancy," which is why there hasn't been progress made
            | despite SIG and WG efforts in the past. While you do not
            | believe network isolation should be included, there are
            | plenty of developers working on OpenShift Online who may
            | disagree.
           | OSO lets anyone on the internet sign up for free and
           | instantly be given their own namespace on a shared cluster
           | full of untrusted actors.
        
           | q3k wrote:
           | > If you want to see the latter part nicely integrated, come
           | to a SIGAUTH meeting and help me argue for it.
           | 
           | Invitation accepted! :) I've been dying to see an ALTS-like
           | [1] thing that works with Kubernetes. I really should be able
           | to talk encrypted and authenticated gRPC-to-gRPC without ever
           | having to set up secrets or manually provision certificates,
           | dammit.
           | 
           | [1] - https://cloud.google.com/security/encryption-in-
           | transit/appl...
        
         | ffo wrote:
          | Hey, we used Tectonic ;-) It was a great tool at the time.
         | Tectonic did influence some of the concepts around ORBOS. Just
         | think of Tectonic combined with GitOps, minus the iPXE part.
         | 
         | Disclaimer: I am working with ORBOS
        
       ___________________________________________________________________
       (page generated 2021-06-30 23:00 UTC)