[HN Gopher] De-cloud and de-k8s - bringing our apps back home
       ___________________________________________________________________
        
       De-cloud and de-k8s - bringing our apps back home
        
       Author : mike1o1
       Score  : 113 points
       Date   : 2023-03-22 16:12 UTC (6 hours ago)
        
 (HTM) web link (dev.37signals.com)
 (TXT) w3m dump (dev.37signals.com)
        
       | rad_gruchalski wrote:
        | Good for them. Now they have a one-off to manage themselves. It's
        | pretty easy to de-cloud using something like k3s, and there's so
        | much value in Kubernetes to leverage. But they have Chef and
        | they're a Ruby shop, so I guess they'll be good.
       | 
       | TBH, Kubernetes has some really rough edges. Helm charts aren't
       | that great and Kustomize gets real messy real fast.
        
       | acedTrex wrote:
       | This seems like an application/stack that didn't have a valid
        | need for k8s in the first place. Don't just use K8s because it's
        | what people say you should do. Evaluate the pros and the VERY
        | real cons and make an informed decision.
        
         | birdyrooster wrote:
         | "Need" Eh, I do it because it's awesome for a single box or
         | thousands. Single sign on, mTLS everywhere, cert-manager, BGP
         | or L2 VIPs to any pod, etc and I can expand horizontally as
         | needed. It's the best for an at home lab. I pity the people who
         | only use Proxmox.
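          | 
          | (For the "L2 VIPs to any pod" bit, one common route on bare
          | metal is MetalLB; a minimal sketch, with the address range as a
          | placeholder for your own LAN:
          | 
          |     apiVersion: metallb.io/v1beta1
          |     kind: IPAddressPool
          |     metadata:
          |       name: lab-pool
          |       namespace: metallb-system
          |     spec:
          |       addresses:
          |         - 192.168.1.240-192.168.1.250
          |     ---
          |     apiVersion: metallb.io/v1beta1
          |     kind: L2Advertisement
          |     metadata:
          |       name: lab-l2
          |       namespace: metallb-system
          |     spec:
          |       ipAddressPools:
          |         - lab-pool
          | 
          | plus a Service of type LoadBalancer per app.)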
        
         | bdcravens wrote:
         | That's why we've had good results with ECS. Feels like 80% of
         | the result for 20% of the effort, and I haven't found our use
         | cases needing that missing 20%.
        
           | sgarland wrote:
           | With EC2 launch types, probably. Setting up ECS for Fargate
           | with proper IaC templates/modules isn't much easier than EKS,
           | IMO.
        
             | brigadier132 wrote:
              | On the Google Cloud side, using Cloud Build with Cloud Run
              | and automatic CI/CD is very straightforward. I set up
              | automated builds and deploys for staging in 2 hours.
             | For production I set it up to track tagged branches
             | matching a regex.
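              | 
              | As a rough sketch, the cloudbuild.yaml for that kind of
              | setup stays tiny (service and region below are made up; the
              | tag regex itself is configured on the build trigger, not in
              | this file):
              | 
              |     steps:
              |       - name: gcr.io/cloud-builders/docker
              |         args:
              |           - build
              |           - -t
              |           - gcr.io/$PROJECT_ID/myapp:$TAG_NAME
              |           - .
              |       - name: gcr.io/cloud-builders/docker
              |         args:
              |           - push
              |           - gcr.io/$PROJECT_ID/myapp:$TAG_NAME
              |       - name: gcr.io/google.com/cloudsdktool/cloud-sdk
              |         entrypoint: gcloud
              |         args:
              |           - run
              |           - deploy
              |           - myapp
              |           - --image=gcr.io/$PROJECT_ID/myapp:$TAG_NAME
              |           - --region=us-central1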
        
             | Aeolun wrote:
             | Mostly because CF and CDK have spawned from the deepest
             | pits of hell. It's ok when using terraform, and downright
             | pleasant when using Pulumi.
        
             | bdcravens wrote:
             | We use Fargate, and what we launch is tightly coupled to
             | our application (background jobs spin down and spin up
             | tasks via the SDK) so for now, we aren't doing anything
             | with IaC, other than CI deployment.
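              | 
              | The "spin up tasks via the SDK" part is roughly the
              | equivalent of this CLI call (cluster, task definition and
              | subnet are placeholders; in our case it's the same call made
              | from application code):
              | 
              |     aws ecs run-task \
              |       --cluster app-cluster \
              |       --launch-type FARGATE \
              |       --task-definition background-worker \
              |       --count 1 \
              |       --network-configuration \
              |         'awsvpcConfiguration={subnets=[subnet-0abc123]}'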
        
       | dustedcodes wrote:
       | If I was Netflix I would de-cloud, but if I was a small team like
       | 37signals then de-clouding is just insanity. I think DHH is
       | either very stupid or extremely naive in his cost calculations or
       | probably a mix of both. Hey and Basecamp customers will see many
       | issues in the next few years and hackers will feast off their on-
       | premise infrastructure.
        
       | mike1o1 wrote:
       | To me the really surprising thing is that they still use
       | Capistrano for deploying basecamp!
        
         | camsjams wrote:
         | Maybe because Capistrano is written in Ruby and the language
         | matches their internal products? That was my only guess.
        
           | mbreese wrote:
           | I was guessing that they kept using Capistrano because it
           | still worked. No need to change something that's working...
           | 
           | (Somewhat of an ironic comment when talking about an article
           | about ditching K8s...)
        
       | Melingo wrote:
        | I'm so lost on so many of the choices this company made.
       | 
        | You de-cloud and now use some mini tool like mrsk?
       | 
       | I'm running k8s on azure (small), gke (very big), rke2 on a setup
       | with 5 nodes and k3s.
       | 
        | I'm totally lost why they would de-k8s after already investing so
        | much time. They should be able to work with k8s really well at
        | this point.
       | 
        | Sorry to say, but to me it feels like the company has a much
        | bigger issue than cloud vs. non-cloud: no one with proper
        | experience and strategy.
        
         | Aeolun wrote:
         | I feel like this is the reason so many horrible kubernetes
         | stacks exist.
        
         | drewda wrote:
         | If one of the co-founders/owners is writing the devops tooling
         | from scratch... well, that's a decision.
         | 
         | Not saying it's necessarily a bad decision. But it's
         | potentially driven more by personal interests than a
         | dispassionate and strategic plan.
        
         | chologrande wrote:
         | It does seem like they just moved all of their infra
         | components, and got rid of autoscaling.
         | 
         | Load balancing, logging, and other associated components are
         | all still there. Almost nothing changed in the actual
         | architecture, just how it was hosted.
         | 
         | I have a hard time seeing why this was beneficial.
        
           | slackfan wrote:
           | Cost, mostly.
        
             | jrockway wrote:
             | Those k8s license fees will get ya.
        
             | rad_gruchalski wrote:
              | You reckon they cannot afford running some VMs for the k8s
              | control plane?
        
       | camsjams wrote:
       | Is this cheaper though?
       | 
       | For a medium-to-large app, K8s should offset a lot of the
       | operational difficulties. Also you don't have to use K8s.
       | 
        | Cloud is turn-on/turn-off, whereas on-premises you pay the
        | investment up front.
       | 
       | Here are all of the hidden costs of on-prem that folks forget
       | about when thinking about cloud being "expensive":
       | 
       | - hardware
       | 
       | - maintenance
       | 
       | - electricity
       | 
       | - air conditioning
       | 
       | - security
       | 
       | - on-call and incident response
       | 
       | Here are all of the hidden time-consumers of on-prem that folks
       | forget about when thinking about cloud being "difficult":
       | 
       | - os patching and maintenance
       | 
       | - network maintenance
       | 
       | - driver patching
       | 
       | - library updating and maintenance
       | 
       | - BACKUPS
       | 
       | - redundancy
       | 
       | - disaster recovery
       | 
       | - availability
        
         | andrewstuart wrote:
         | This is the fiction that CTOs believe - "it's simply not
         | practical to run your own computers, you need cloud".
        
         | sn0wf1re wrote:
         | Every one of your examples in the second list is relevant to
          | both on-prem and cloud. Also, cloud still has on-call, just not
          | for the hardware issues (you'll still likely get paged for
          | reduced availability of your software).
        
           | vlunkr wrote:
           | "Just not for the hardware issues" is a huge deal though.
           | That's an entire skillset you can eliminate from your
           | requirements if you're only in the cloud. Depending on the
           | scale of your team this might be a massive amount of savings.
        
             | ilyt wrote:
              | Right. The skillset to pull the right drive from the server
              | and put the replacement in.
              | 
              | That says you know nothing at all about actually running
              | hardware, because the bigger problem by far is "the DC might
              | be a 1-5 hour drive away" or "we have no spare parts at
              | hand", not "fiddling with the server is super hard".
        
             | josho wrote:
             | The flip side is there is an entirely new skillset required
             | to successfully leverage the cloud.
             | 
              | I suspect those cloud skills are also in higher demand and
              | therefore more expensive than hiring people to handle
              | hardware issues.
             | 
             | Personally, I appreciate the contrarian view because I
             | think many businesses have been naive in their decision to
             | move some of their workloads into the cloud. I'd like to
             | see a broader industry study that shows what benefits are
             | actually realized in the cloud.
        
             | jrockway wrote:
             | At my last job, I would have happily gone into the office
             | at 3am to swap a hard drive if it meant I didn't have to
             | pay my AWS bill anymore. Computers are cheap. Backups are
             | annoying, but you have to do them in the cloud too.
             | (Deleting your Cloud SQL instance accidentally deletes all
             | the automatic backups; so you have to roll your own if you
             | care at all. Things like that; cloud providers remove some
             | annoyance, and then add their own. If you operate software
             | in production, you have to tolerate annoyance!)
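              | 
              | ("Roll your own" there mostly means scheduling an export to
              | a bucket you control, e.g. from cron; instance, database and
              | bucket names below are made up:
              | 
              |     gcloud sql export sql prod-instance \
              |       gs://my-backup-bucket/$(date +%F).sql.gz \
              |       --database=myapp_production
              | 
              | Not hard, just one more thing you have to remember exists.)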
             | 
             | Self-managed Kubernetes is no picnic, but nothing
             | operational is ever a picnic. If it's not debugging a weird
             | networking issue with tcpdump while sitting on the
             | datacenter floor, it's begging your account rep for an
             | update on your ticket twice a day for 3 weeks. Pick your
             | poison.
        
         | ilyt wrote:
          | We have 7 racks, 3 people, and the actual hardware work is a
          | minuscule part of that. A few hundred VMs, anything from "just
          | software running on a server" to k8s stacks (the biggest one is
          | 30 nodes), 2 Ceph clusters (ours and clients'), and a bunch of
          | other shit.
          | 
          | The stuff you mentioned is, amortized, around 20% (automation
          | ftw). The rest of it is stuff that we would do in cloud anyway,
          | and cloud is in general harder to debug too (we have a few
          | smaller projects managed in cloud for customers).
          | 
          | We did the calculation to move to cloud a few times now; it was
          | never even close to profitable, and we wouldn't save on manpower
          | anyway as 24/7 on-call is still required.
          | 
          | So I call bullshit on that.
         | 
          | If you are a startup, by all means go cloud.
          | 
          | If you are small, go ahead too; on-prem is not worth it.
         | 
         | If you have spiky load, cloud or hybrid will most likely be
         | cheaper.
         | 
          | But if you have a constant load (by that I mean the difference
          | between peak and lowest traffic is "only" like 50-60%) and need a
         | bunch of servers to run it (say 3+ racks), it might actually be
         | cheaper on-site.
         | 
          | Or a bunch of dedicated servers. Then you don't need to bother
          | managing hardware, and in case of a boom you can even scale
          | relatively quickly.
        
       | Melingo wrote:
       | And don't get me wrong: whatever works for the company but k8s
       | experience alone is already super helpful.
       | 
        | A lightweight k8s stack out of the box + argocd + cert-manager is
        | like infra on steroids.
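        | 
        | (For the unfamiliar: with that combo, "deploying" is mostly
        | committing a manifest like this Argo CD Application and letting it
        | sync; repo and app names are just placeholders:
        | 
        |     apiVersion: argoproj.io/v1alpha1
        |     kind: Application
        |     metadata:
        |       name: myapp
        |       namespace: argocd
        |     spec:
        |       project: default
        |       source:
        |         repoURL: https://github.com/example/myapp-deploy
        |         targetRevision: main
        |         path: k8s
        |       destination:
        |         server: https://kubernetes.default.svc
        |         namespace: myapp
        |       syncPolicy:
        |         automated:
        |           prune: true
        | 
        | cert-manager then takes care of the TLS certificates on top.)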
        
         | rektide wrote:
         | The whole kubernetes section of this writeup is two sentences.
         | They went with a vendor provided kube & it was expensive &
         | didn't go great.
         | 
          | It just sounds like it was poorly executed, mostly? There are
          | plenty of blogs & YouTube videos of folk setting up HA k8s on a
          | couple of RPis, & even the 2GB model works fine if you accept
          | not-quite-half the RAM as overhead on apiserver/etcd nodes.
         | 
          | It's not like 37signals has hundreds of teams & thousands of
          | services to juggle, so it's not like they need a beefy control
          | plane. I don't know what went wrong & there's no real info to
          | guess by, but 37s seems like a semi-ideal easy lock for k8s on
          | prem.
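          | 
          | (For reference, the RPi-class HA setups in those videos are
          | often k3s with embedded etcd, which is roughly this much work;
          | hostnames and the token are placeholders:
          | 
          |     # first server node
          |     curl -sfL https://get.k3s.io | sh -s - server --cluster-init
          | 
          |     # additional server nodes, joined with the first node's token
          |     curl -sfL https://get.k3s.io | K3S_TOKEN=<node-token> \
          |       sh -s - server --server https://rpi-1:6443
          | 
          | Not nothing, but not "very expensive and operationally
          | complicated" either.)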
        
       | ocdtrekkie wrote:
       | De-clouding is going to be a huge trend as companies are
       | pressured to save costs, and they realize on-prem is still a
       | fraction of the cost of comparable cloud services.
       | 
       | This whole cloud shift has been one of the most mind-blowing
       | shared delusions in the industry, and I'm glad I've mostly
       | avoided working with it outright.
        
         | dopylitty wrote:
         | The thing that gets me about it is the very real physical cost
         | of all this cloud waste.
         | 
          | The big cloud providers have clear-cut thousands of acres in
          | Ohio, Northern VA, and elsewhere to build their huge windowless
         | concrete bunkers in support of this delusion of unlimited
         | scale.
         | 
         | Hopefully as the monetary costs become clear their growth will
         | be reversed and these bunkers can be torn down
        
           | ocdtrekkie wrote:
           | For what it's worth, large providers will always need
           | datacenters. But perhaps datacenters run by public cloud
           | providers today will be sold off to larger businesses running
           | their own infrastructure someday at a discount. Most of the
           | infrastructure itself all will age out in five or ten years,
           | and would've been replaced either way.
           | 
           | Heck, datacenters in Virginia are likely to end up being sold
           | directly to the federal government.
        
           | ilyt wrote:
            | They ain't going to be unused lmao. If a migration happens
            | they will just stop building new ones or have to compete
            | harder on pricing.
        
           | adamsb6 wrote:
           | The big cloud providers are likely packing machines more
           | densely and powering them more efficiently than alternatives
           | like colos.
        
         | icedchai wrote:
         | On-prem has its own issues. Many small applications need little
         | more than a VPS and a sane backup/recovery strategy.
        
         | rr808 wrote:
         | Our firm started the big cloud initiative last year. We have
         | our own datacenters already, but all the cool startups used
         | cloud. Our managers figure it'll make us cool too.
        
       | erulabs wrote:
       | Only one sentence about why they chose to abandon K8s:
       | 
       | > It all sounded like a win-win situation, but [on-prem
       | kubernetes] turned out to be a very expensive and operationally
       | complicated idea, so we had to get back to the drawing board
       | pretty soon.
       | 
       | It was very expensive and operationally complicated to self-host
       | k8s, so they decided to _build their own orchestration tooling_?
        | Sort of undercuts their main argument that this bit isn't even
        | remotely fleshed out.
        
         | benatkin wrote:
         | Well to be fair Kubernetes doesn't always pluralize the names
         | of collections, since you can run "kubectl get
         | deployment/myapp". You don't want to do the equivalent of
         | "select * from user" do you? That doesn't make any sense!!! And
         | don't translate that to "get all the records from the user
         | table"! That's "get all the records from the _users_ table ".
         | (Rails defaults to plural, Django to singular for table names.
         | Not sure about the equivalent for Kubernetes but in the CLI
         | surprisingly you can use either)
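          | 
          | i.e. the CLI accepts plural, singular, and short forms alike
          | (myapp here being a hypothetical deployment):
          | 
          |     kubectl get deployments
          |     kubectl get deployment myapp
          |     kubectl get deploy/myapp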
        
         | imiric wrote:
         | Sometimes there's value in building bespoke solutions. If you
         | don't need many of the features of the off-the-shelf solution,
         | and find the complexity overwhelming and the knowledge and
         | operational costs too high, then building a purpose-built
         | solution to fit your use case exactly can be very beneficial.
         | 
         | You do need lots of expertise and relatively simple
         | applications to replace something like k8s, but 37signals seems
         | up for the task, and judging by the article, they picked their
         | least critical apps to start with. It sounds like a success
         | story so far. Kudos to them for releasing mrsk, it definitely
         | looks interesting.
         | 
          | As a side note, I've become disgruntled at k8s becoming the de
          | facto standard for deploying services at scale. We need
          | different approaches to container orchestration that do things
          | differently (perhaps even rethinking containers!) and focus on
          | simplicity and usability instead of just hyper-scalability,
          | which many projects don't need.
         | 
         | I was a fan of Docker Swarm for a long time, and still use it
         | at home, but I wouldn't dare recommend it professionally
         | anymore. Especially with the current way Docker Inc. is
         | managed.
        
         | mike1o1 wrote:
          | To be fair, the article says that they built the bulk of the
          | tool and did the first migration in a 6-week cycle. mrsk looks
          | fairly straightforward and feels like Capistrano but for
          | containers. The first commit of mrsk was only on January 7th of
          | this year.
         | 
         |  _In less than a six-week cycle, we built those operational
         | foundations, shaped mrsk to its functional form and had
         | Tadalist running in production on our own hardware._
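          | 
          | Going by the README, the whole deploy definition is a small
          | config/deploy.yml along these lines (names and IPs here are
          | invented, not from the post):
          | 
          |     service: myapp
          |     image: myorg/myapp
          |     servers:
          |       - 192.168.0.10
          |       - 192.168.0.11
          |     registry:
          |       username: myorg
          |       password:
          |         - MRSK_REGISTRY_PASSWORD
          | 
          | and "mrsk deploy" builds the image, pushes it, and swaps the
          | running containers behind Traefik on each host.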
        
           | bastawhiz wrote:
           | They spent a month and a half building tooling _capable of
           | handling their smallest application_, representing an
           | extremely tiny fraction of their cloud usage.
        
       | bastawhiz wrote:
       | I'm not going to put this down, because it sounds like they're
       | quite happy with the results. But they haven't written about a
       | few things that I find to be important details:
       | 
       | First, one of the promises of a standardized platform (be it k8s
       | or something else) is that you don't reinvent the wheel for each
       | application. You have one way of doing logging, one way of doing
       | builds/deployments, etc. Now, they have two ways of doing
       | everything (one for their k8s stuff that remains in the cloud,
       | one for what they have migrated). And the stuff in the cloud is
       | the mature, been-using-it-for-years stuff, and the new stuff
       | seemingly hasn't been battle-tested beyond a couple small
       | services.
       | 
       | Now that's fine, and migrating a small service and hanging the
       | Mission Accomplished banner is a win. But it's not a win that
       | says "we're ready to move our big, money-making services off of
       | k8s". My suspicion is that handling the most intensive services
       | means replacing all of the moving parts of k8s with lots of
       | k8s-shaped things, and things which are probably less-easily
       | glued together than k8s things are.
       | 
       | Another thing that strikes me is that if you look at their cloud
       | spend [0], three of their four top services are _managed_
       | services. You simply will not take RDS and swap it out 1:1 for
       | Percona MySQL, it is not the same for clusters of substance. You
       | will not simply throw Elasticsearch at some linux boxes and get
       | the same result as managed OpenSearch. You will not simply
       | install redis/memcached on some servers and get elasticache. The
       | managed services have substantial margin, but unless you have
       | Elasticsearch experts, memcached/redis experts, and DBAs on-hand
       | to make the thing do the stuff, you're also going to likely end
       | up spending more than you expect to run those things on hardware
       | you control. I don't think about SSDs or NVMe or how I'll
       | provision new servers for a sudden traffic spike when I set up an
       | Aurora cluster, but you can't not think about it when you're
       | running it yourself.
       | 
       | Said another way, I'm curious as to how they will reduce costs
       | AND still have equally performant/maintainable/reliable services
       | while replacing some unit of infrastructure N with N+M (where M
       | is the currently-managed bits). And also while not being able to
       | just magically make more computers (or computers of a different
       | shape) appear in their datacenter at the click of a button.
       | 
       | I'm also curious how they'll handle scaling. Is scaling your k8s
       | clusters up and down in the cloud really more expensive than
       | keeping enough machines to handle unexpected load on standby? I
       | guess their load must be pretty consistent.
       | 
       | [0] https://dev.37signals.com/our-cloud-spend-in-2022/
        
       | ethicalsmacker wrote:
       | If only Ruby had real concurrency and the memory didn't bloat
       | like crazy.... you wouldn't need 90% of the hardware.
        
       | sys_64738 wrote:
       | Larry Ellison was right.
        
       | not_enoch_wise wrote:
       | It's almost as if...
       | 
       | ...kubernetes isn't the solution to every compute need...
        
         | birdyrooster wrote:
         | Except it was created to model virtually every solution to
         | every compute need. It's not about the compute itself, it's
         | about the taxonomy, composability, and verifiability of
          | specifications, which makes Kubernetes an excellent substrate
          | for nearly any computing model, from the most static to the most
          | dynamic. You find Kubernetes everywhere because of how flexibly
          | it meets different domains. It's the next major revolution in
          | systems computing since Unix.
        
           | ranger207 wrote:
           | I (roughly) believe this as well[0], but more flexibility
           | generally means more complexity. Right now, if you don't need
           | the flexibility that k8s offers, it's probably better to use
           | a solution with less flexibility and therefore less
           | complexity. Maybe in a decade if k8s has eaten the world
           | there'll be simple k8s-based solutions to most problems, but
           | right now that's not always the case
           | 
           | [0] I think that in the same way that operating systems
           | abstract physical hardware, memory management, process
           | management, etc, k8s abstracts storage, network, compute
           | resources, etc
        
           | aliasxneo wrote:
           | Always two extremes to any debate. I've personally enjoyed my
           | journey with it. I've even been in an anti-k8s company
           | running bare metal on the Hashi stack (won't be running back
           | to that anytime soon). I think the two categories I've seen
            | work best are either something like ECS or serverless, and
            | Kubernetes.
        
         | klardotsh wrote:
         | Tell that to the myriad of folks making their money off of
         | peddling it. You'd swear it were the only tool available based
          | on the hype circles (and how many hiring managers strictly look
          | for experience with it).
        
           | ilyt wrote:
            | I gotta say, from the dev perspective it is a very convenient
            | solution. But I wouldn't recommend it to anyone that runs
            | anything less complex than "a few services and a database".
            | The _tens of minutes_ you save on writing deploy scripts will
            | be replaced by hours of figuring out how to do it the k8s way.
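            | 
            | (By "deploy scripts" I mean the boring kind, something like
            | this, with names made up:
            | 
            |     #!/usr/bin/env bash
            |     set -euo pipefail
            |     docker pull registry.example.com/myapp:latest
            |     docker rm -f myapp || true
            |     docker run -d --name myapp --restart always \
            |       -p 8080:8080 registry.example.com/myapp:latest
            | 
            | That's the baseline k8s has to beat for a small shop.)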
           | 
            | From the ops perspective, let's just say I've run it from
            | scratch (as in "writing systemd units to run the k8s daemons
            | and setting up a CA to feed them", because back then there was
            | not much reliable automation around deploying it), and the
            | complexity tax is insane. Yeah, you can install some automation
            | to do that, but if it ever breaks (and I've seen some break)
            | good fucking luck; a non-veteran will have a better chance
            | reinstalling it from scratch.
        
           | illiarian wrote:
           | Cloud Native Landscape....
           | https://landscape.cncf.io/images/landscape.pdf
           | 
           | It's more than just peddlers at this point. There are
           | peddlers peddling to other peddlers, several layers deep.
        
       | lloydatkinson wrote:
        | I've been following their move to on-premise with interest and
        | this was a great read. I'm curious how they are wiring up GitHub
        | Actions with their on-premise deployments. How are they doing
        | this?
       | 
        | The best I can think of for my own project is to run one of the
        | self-hosted GitHub Actions runners on the same machine, which
        | could then run an action to trigger running the latest docker
        | image.
       | 
       | Without something like that you miss the nice instant push model
       | cloud gives you and you have to use the pull model of polling
       | some service regularly for newer versions.
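        | 
        | Roughly what I have in mind, as a sketch (image name and port are
        | placeholders):
        | 
        |     name: deploy
        |     on:
        |       push:
        |         branches: [main]
        |     jobs:
        |       deploy:
        |         runs-on: self-hosted
        |         steps:
        |           - run: |
        |               docker pull ghcr.io/me/myapp:latest
        |               docker rm -f myapp || true
        |               docker run -d --name myapp -p 3000:3000 \
        |                 ghcr.io/me/myapp:latest
        | 
        | but I'd love to know if they do something less hacky.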
        
       | asjkaehauisa wrote:
        | They could just use Nomad and call it a day.
        
       | electric_mayhem wrote:
       | We used to call this a "sudden outbreak of common sense"
        
       | mfer wrote:
       | I wonder how they are going to handle fault tolerance when
       | machines go offline?
       | 
       | So as to avoid being paged in the middle of the night, I grew to
       | really like automation that keeps things online.
        
         | turtlebits wrote:
         | Their app is running on at least 2 machines, so the load
         | balancer takes care of it.
        
       | guilhas wrote:
        | How much easier is mrsk vs k3s?
        
         | turtlebits wrote:
          | Looks like an apples vs oranges comparison. They seem to have a
          | low number of distinct services, so there isn't a real need for
          | k3s/k8s (i.e. orchestration); on the other hand, they do need
          | config management.
        
         | bdcravens wrote:
         | I'm not sure if anyone other than 37Signals is using it at
         | scale yet, so you may get a better idea by looking at the docs
         | yourself.
         | 
         | https://github.com/mrsked/mrsk
        
           | ilyt wrote:
            | Oh, a YAML-based DSL to deploy stuff, how original!
           | 
            | Now we only need a template-based generator for those YAMLs
            | and we'll have all the worst practices of orchestration right
            | here, just like k8s + helm.
        
       ___________________________________________________________________
       (page generated 2023-03-22 23:00 UTC)