[HN Gopher] De-cloud and de-k8s - bringing our apps back home
___________________________________________________________________

De-cloud and de-k8s - bringing our apps back home

Author : mike1o1
Score  : 113 points
Date   : 2023-03-22 16:12 UTC (6 hours ago)

(HTM) web link (dev.37signals.com)
(TXT) w3m dump (dev.37signals.com)

| rad_gruchalski wrote:
| Good for them. Now they have a one-off to manage themselves.
| It's pretty easy to de-cloud using something like k3s. So much
| value added in Kubernetes to leverage. But they have Chef and
| they're a Ruby shop, so I guess they'll be good.
|
| TBH, Kubernetes has some really rough edges. Helm charts aren't
| that great and Kustomize gets real messy real fast.

| acedTrex wrote:
| This seems like an application/stack that didn't have a valid
| need for k8s in the first place. Don't just use k8s because it's
| what people say you should do. Evaluate the pros and the VERY
| real cons and make an informed decision.

| birdyrooster wrote:
| "Need"? Eh, I do it because it's awesome for a single box or for
| thousands. Single sign-on, mTLS everywhere, cert-manager, BGP or
| L2 VIPs to any pod, etc., and I can expand horizontally as
| needed. It's the best for an at-home lab. I pity the people who
| only use Proxmox.

| bdcravens wrote:
| That's why we've had good results with ECS. It feels like 80% of
| the result for 20% of the effort, and I haven't found our use
| cases needing that missing 20%.

| sgarland wrote:
| With EC2 launch types, probably. Setting up ECS for Fargate with
| proper IaC templates/modules isn't much easier than EKS, IMO.

| brigadier132 wrote:
| On the Google Cloud side, using Google Cloud Build with Cloud
| Run and automatic CI/CD is very straightforward. I set up
| automated builds and deploys for staging in 2 hours. For
| production I set it up to track tagged branches matching a
| regex. [See the sketch after this subthread.]

| Aeolun wrote:
| Mostly because CF and CDK have spawned from the deepest pits of
| hell. It's OK when using Terraform, and downright pleasant when
| using Pulumi.

| bdcravens wrote:
| We use Fargate, and what we launch is tightly coupled to our
| application (background jobs spin down and spin up tasks via the
| SDK), so for now we aren't doing anything with IaC other than CI
| deployment.
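A rough sketch of the Cloud Build flow brigadier132 describes: a
cloudbuild.yaml that builds and pushes an image, then deploys it to
Cloud Run. The service name, region, and registry path here are
hypothetical placeholders, not details from the comment, and the
trigger itself (e.g. one matching tag names against a regex) is
configured separately in Cloud Build.

    # cloudbuild.yaml (run by a trigger, e.g. one matching tags ^v.*)
    steps:
      # Build the container image, tagged with the commit SHA
      - name: gcr.io/cloud-builders/docker
        args: ["build", "-t", "gcr.io/$PROJECT_ID/myapp:$COMMIT_SHA", "."]
      # Push it to the registry
      - name: gcr.io/cloud-builders/docker
        args: ["push", "gcr.io/$PROJECT_ID/myapp:$COMMIT_SHA"]
      # Point the Cloud Run service at the new image
      - name: gcr.io/google.com/cloudsdktool/cloud-sdk
        entrypoint: gcloud
        args: ["run", "deploy", "myapp",
               "--image=gcr.io/$PROJECT_ID/myapp:$COMMIT_SHA",
               "--region=us-central1"]
    images:
      - gcr.io/$PROJECT_ID/myapp:$COMMIT_SHA

$PROJECT_ID and $COMMIT_SHA are built-in Cloud Build substitutions,
which is what makes the "2 hours to a working pipeline" claim
plausible: there is very little to write.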
| dustedcodes wrote:
| If I was Netflix I would de-cloud, but if I was a small team like
| 37signals then de-clouding is just insanity. I think DHH is
| either very stupid or extremely naive in his cost calculations,
| or probably a mix of both. Hey and Basecamp customers will see
| many issues in the next few years, and hackers will feast off
| their on-premise infrastructure.

| mike1o1 wrote:
| To me the really surprising thing is that they still use
| Capistrano for deploying Basecamp!

| camsjams wrote:
| Maybe because Capistrano is written in Ruby and the language
| matches their internal products? That was my only guess.

| mbreese wrote:
| I was guessing that they kept using Capistrano because it still
| worked. No need to change something that's working...
|
| (Somewhat ironic to say when discussing an article about
| ditching k8s...)

| Melingo wrote:
| I'm so lost on so many of the choices this company made.
|
| You de-cloud and then use some mini tool like mrsk?
|
| I'm running k8s on Azure (small), GKE (very big), RKE2 on a
| setup with 5 nodes, and k3s.
|
| I'm totally lost as to why they would de-k8s after already
| investing so much time. They should be able to work with k8s
| really well at this point.
|
| Sorry to say, but to me it feels like the company has a much
| bigger issue than cloud vs. non-cloud: no one with proper
| experience and strategy.

| Aeolun wrote:
| I feel like this is the reason so many horrible Kubernetes
| stacks exist.

| drewda wrote:
| If one of the co-founders/owners is writing the devops tooling
| from scratch... well, that's a decision.
|
| Not saying it's necessarily a bad decision. But it's potentially
| driven more by personal interests than by a dispassionate and
| strategic plan.

| chologrande wrote:
| It does seem like they just moved all of their infra components
| and got rid of autoscaling.
|
| Load balancing, logging, and other associated components are all
| still there. Almost nothing changed in the actual architecture,
| just how it was hosted.
|
| I have a hard time seeing why this was beneficial.

| slackfan wrote:
| Cost, mostly.

| jrockway wrote:
| Those k8s license fees will get ya.

| rad_gruchalski wrote:
| You reckon they cannot afford running some VMs for the k8s
| control plane?

| camsjams wrote:
| Is this cheaper, though?
|
| For a medium-to-large app, k8s should offset a lot of the
| operational difficulties. Also, you don't have to use k8s.
|
| Cloud is turn-on/turn-off, whereas on-premises you pay an
| up-front investment.
|
| Here are all of the hidden costs of on-prem that folks forget
| about when thinking of cloud as "expensive":
|
| - hardware
| - maintenance
| - electricity
| - air conditioning
| - security
| - on-call and incident response
|
| And here are all of the hidden time-consumers of on-prem that
| folks forget about when thinking of cloud as "difficult":
|
| - OS patching and maintenance
| - network maintenance
| - driver patching
| - library updating and maintenance
| - BACKUPS
| - redundancy
| - disaster recovery
| - availability

| andrewstuart wrote:
| This is the fiction that CTOs believe: "it's simply not
| practical to run your own computers, you need cloud".

| sn0wf1re wrote:
| Every one of your examples in the second list is relevant to
| both on-prem and cloud. Cloud also has on-call, just not for the
| hardware issues (you still likely get a page for reduced
| availability of your software).

| vlunkr wrote:
| "Just not for the hardware issues" is a huge deal though. That's
| an entire skillset you can eliminate from your requirements if
| you're only in the cloud. Depending on the scale of your team,
| this might be a massive amount of savings.

| ilyt wrote:
| Right. The skillset to pull the right drive from the server and
| put the replacement one in.
|
| That says you know nothing at all about actually running
| hardware, as the bigger problem is by far "the DC might be a 1-5
| hour drive away" or "we have no spare parts at hand", not
| "fiddling with the server is super hard".

| josho wrote:
| The flip side is that there is an entirely new skillset required
| to successfully leverage the cloud.
|
| I suspect those cloud skills are also in higher demand and
| therefore more expensive than hiring people to handle hardware
| issues.
|
| Personally, I appreciate the contrarian view, because I think
| many businesses have been naive in their decision to move some
| of their workloads into the cloud. I'd like to see a broader
| industry study that shows what benefits are actually realized in
| the cloud.
| jrockway wrote:
| At my last job, I would have happily gone into the office at 3am
| to swap a hard drive if it meant I didn't have to pay my AWS
| bill anymore. Computers are cheap. Backups are annoying, but you
| have to do them in the cloud too. (Deleting your Cloud SQL
| instance accidentally deletes all the automatic backups, so you
| have to roll your own if you care at all. Things like that:
| cloud providers remove some annoyances, and then add their own.
| If you operate software in production, you have to tolerate
| annoyance!)
|
| Self-managed Kubernetes is no picnic, but nothing operational is
| ever a picnic. If it's not debugging a weird networking issue
| with tcpdump while sitting on the datacenter floor, it's begging
| your account rep for an update on your ticket twice a day for 3
| weeks. Pick your poison.

| ilyt wrote:
| We have 7 racks, 3 people, and the actual hardware work is a
| minuscule part of that. A few hundred VMs, anything from "just
| software running on a server" to k8s stacks (the biggest one is
| 30 nodes), 2 Ceph clusters (ours and a client's), and a bunch of
| other shit.
|
| The stuff you mentioned is, amortized, around 20% (automation
| ftw). The rest of it is stuff that we would do in the cloud
| anyway, and cloud is in general harder to debug too (we have a
| few smaller projects managed in the cloud for customers).
|
| We did the calculation to move to the cloud a few times now; it
| was never even close to profitable, and we wouldn't save on
| manpower anyway, as 24/7 on-call is still required.
|
| So I call bullshit on that.
|
| If you are a startup, by all means go cloud.
|
| If you are small, go ahead too; it's not worth it.
|
| If you have spiky load, cloud or hybrid will most likely be
| cheaper.
|
| But if you have a constant load (by that I mean the difference
| between peak and lowest traffic is "only" 50-60%) and need a
| bunch of servers to run it (say 3+ racks), it might actually be
| cheaper on-site.
|
| Or a bunch of dedicated servers. Then you don't need to bother
| managing hardware, and in case of a boom you can even scale
| relatively quickly.

| Melingo wrote:
| And don't get me wrong: whatever works for the company, but k8s
| experience alone is already super helpful.
|
| A lightweight k8s stack out of the box + Argo CD + cert-manager
| is like infra on steroids.
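For concreteness on that "lightweight stack": with Argo CD, the
per-app footprint reduces to a single Application manifest that
keeps the cluster synced to a Git repo. A minimal sketch; the repo
URL, path, and names are hypothetical.

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: myapp
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://github.com/example/myapp-deploy  # placeholder
        targetRevision: main
        path: k8s                # directory of manifests in the repo
      destination:
        server: https://kubernetes.default.svc
        namespace: myapp
      syncPolicy:
        automated:
          prune: true     # delete resources that were removed from Git
          selfHeal: true  # revert manual drift back to the Git state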
| rektide wrote:
| The whole Kubernetes section of this writeup is two sentences.
| They went with a vendor-provided kube, it was expensive, and it
| didn't go great.
|
| It just sounds like it was poorly executed, mostly? There are
| enough blogs & YouTube videos of folks setting up HA k8s on a
| couple of RPis, & even the 2GB model works fine with
| not-quite-half the RAM going to overhead on the apiserver/etcd
| nodes.
|
| It's not like 37signals has hundreds of teams & thousands of
| services to juggle, so it's not like they need a beefy control
| plane. I don't know what went wrong & there's no real info to
| guess by, but 37s seems like a semi-ideal easy lock for k8s on
| prem.

| ocdtrekkie wrote:
| De-clouding is going to be a huge trend as companies are
| pressured to save costs and realize on-prem is still a fraction
| of the cost of comparable cloud services.
|
| This whole cloud shift has been one of the most mind-blowing
| shared delusions in the industry, and I'm glad I've mostly
| avoided working with it outright.

| dopylitty wrote:
| The thing that gets me about it is the very real physical cost
| of all this cloud waste.
|
| The big cloud providers have clear-cut thousands of acres in
| Ohio, Northern VA, and elsewhere to build their huge windowless
| concrete bunkers in support of this delusion of unlimited scale.
|
| Hopefully, as the monetary costs become clear, their growth will
| be reversed and these bunkers can be torn down.

| ocdtrekkie wrote:
| For what it's worth, large providers will always need
| datacenters. But perhaps the datacenters run by public cloud
| providers today will someday be sold off at a discount to larger
| businesses running their own infrastructure. Most of the
| infrastructure itself will age out in five or ten years and
| would have been replaced either way.
|
| Heck, datacenters in Virginia are likely to end up being sold
| directly to the federal government.

| ilyt wrote:
| They ain't going to be unused lmao. If the migration happens,
| they will just stop building new ones or have to compete harder
| on pricing.

| adamsb6 wrote:
| The big cloud providers are likely packing machines more densely
| and powering them more efficiently than alternatives like colos.

| icedchai wrote:
| On-prem has its own issues. Many small applications need little
| more than a VPS and a sane backup/recovery strategy.

| rr808 wrote:
| Our firm started the big cloud initiative last year. We have our
| own datacenters already, but all the cool startups used cloud.
| Our managers figure it'll make us cool too.

| erulabs wrote:
| There is only one sentence about why they chose to abandon k8s:
|
| > It all sounded like a win-win situation, but [on-prem
| kubernetes] turned out to be a very expensive and operationally
| complicated idea, so we had to get back to the drawing board
| pretty soon.
|
| It was very expensive and operationally complicated to self-host
| k8s, so they decided to _build their own orchestration tooling_?
| That this bit isn't even remotely fleshed out sort of undercuts
| their main argument.

| benatkin wrote:
| Well, to be fair, Kubernetes doesn't always pluralize the names
| of collections, since you can run "kubectl get
| deployment/myapp". You don't want to do the equivalent of
| "select * from user", do you? That doesn't make any sense!!! And
| don't translate that to "get all the records from the user
| table"! That's "get all the records from the _users_ table".
| (Rails defaults to plural, Django to singular, for table names.
| Not sure about the equivalent for Kubernetes, but in the CLI,
| surprisingly, you can use either.)

| imiric wrote:
| Sometimes there's value in building bespoke solutions. If you
| don't need many of the features of the off-the-shelf solution,
| and find the complexity overwhelming and the knowledge and
| operational costs too high, then building a purpose-built
| solution to fit your use case exactly can be very beneficial.
|
| You do need lots of expertise and relatively simple applications
| to replace something like k8s, but 37signals seems up for the
| task, and judging by the article, they picked their least
| critical apps to start with. It sounds like a success story so
| far. Kudos to them for releasing mrsk; it definitely looks
| interesting.
|
| As a side note, I've become disgruntled at k8s becoming the de
| facto standard for deploying services at scale. We need
| different approaches to container orchestration that do things
| differently (perhaps even rethinking containers!) and focus on
| simplicity and usability instead of just hyper-scalability,
| which many projects don't need.
|
| I was a fan of Docker Swarm for a long time, and still use it at
| home, but I wouldn't dare recommend it professionally anymore,
| especially with the current way Docker Inc. is managed.
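As a reference point for the Docker Swarm approach mentioned above:
a Swarm deployment is just a Compose file with a deploy section,
applied with "docker stack deploy -c stack.yml myapp". A minimal
sketch; the image, port, and service name are placeholders.

    version: "3.8"
    services:
      web:
        image: registry.example.com/myapp:latest  # placeholder image
        ports:
          - "80:3000"
        deploy:
          replicas: 2           # run two copies across the swarm
          update_config:
            order: start-first  # start new task before stopping old
          restart_policy:
            condition: on-failure

The simplicity argument is that this one file covers scheduling,
rolling updates, and restarts, with nothing like a Helm chart or a
separate control-plane deployment to maintain.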
| mike1o1 wrote:
| To be fair, the article says they built the bulk of the tool and
| did the first migration in a single 6-week cycle. mrsk looks
| fairly straightforward and feels like Capistrano, but for
| containers. The first commit of mrsk was only on January 7th of
| this year.
|
| _In less than a six-week cycle, we built those operational
| foundations, shaped mrsk to its functional form and had Tadalist
| running in production on our own hardware._

| bastawhiz wrote:
| They spent a month and a half building tooling _capable of
| handling their smallest application_, representing an extremely
| tiny fraction of their cloud usage.

| bastawhiz wrote:
| I'm not going to put this down, because it sounds like they're
| quite happy with the results. But they haven't written about a
| few things that I find to be important details:
|
| First, one of the promises of a standardized platform (be it k8s
| or something else) is that you don't reinvent the wheel for each
| application. You have one way of doing logging, one way of doing
| builds/deployments, etc. Now they have two ways of doing
| everything (one for their k8s stuff that remains in the cloud,
| one for what they have migrated). And the stuff in the cloud is
| the mature, been-using-it-for-years stuff, while the new stuff
| seemingly hasn't been battle-tested beyond a couple of small
| services.
|
| Now that's fine, and migrating a small service and hanging the
| Mission Accomplished banner is a win. But it's not a win that
| says "we're ready to move our big, money-making services off of
| k8s". My suspicion is that handling the most intensive services
| means replacing all of the moving parts of k8s with lots of
| k8s-shaped things, things which are probably less easily glued
| together than k8s things are.
|
| Another thing that strikes me is that if you look at their cloud
| spend [0], three of their four top services are _managed_
| services. You simply will not take RDS and swap it out 1:1 for
| Percona MySQL; it is not the same for clusters of substance. You
| will not simply throw Elasticsearch at some Linux boxes and get
| the same result as managed OpenSearch. You will not simply
| install redis/memcached on some servers and get ElastiCache. The
| managed services have substantial margins, but unless you have
| Elasticsearch experts, memcached/redis experts, and DBAs on hand
| to make the thing do the stuff, you're also likely to end up
| spending more than you expect to run those things on hardware
| you control. I don't think about SSDs or NVMe or how I'll
| provision new servers for a sudden traffic spike when I set up
| an Aurora cluster, but you can't not think about it when you're
| running it yourself.
|
| Said another way, I'm curious how they will reduce costs AND
| still have equally performant/maintainable/reliable services
| while replacing some unit of infrastructure N with N+M (where M
| is the currently-managed bits). And also while not being able to
| just magically make more computers (or computers of a different
| shape) appear in their datacenter at the click of a button.
|
| I'm also curious how they'll handle scaling. Is scaling your k8s
| clusters up and down in the cloud really more expensive than
| keeping enough machines on standby to handle unexpected load? I
| guess their load must be pretty consistent.
|
| [0] https://dev.37signals.com/our-cloud-spend-in-2022/
| ethicalsmacker wrote:
| If only Ruby had real concurrency and the memory didn't bloat
| like crazy... you wouldn't need 90% of the hardware.

| sys_64738 wrote:
| Larry Ellison was right.

| not_enoch_wise wrote:
| It's almost as if...
|
| ...Kubernetes isn't the solution to every compute need...

| birdyrooster wrote:
| Except it was created to model virtually every solution to every
| compute need. It's not about the compute itself; it's about the
| taxonomy, composability, and verifiability of specifications,
| which makes Kubernetes an excellent substrate for nearly any
| computing model, from the most static to the most dynamic. You
| find Kubernetes everywhere because of how flexible it is to meet
| different domains. It's the next major revolution in systems
| computing since Unix.

| ranger207 wrote:
| I (roughly) believe this as well [0], but more flexibility
| generally means more complexity. Right now, if you don't need
| the flexibility that k8s offers, it's probably better to use a
| solution with less flexibility and therefore less complexity.
| Maybe in a decade, if k8s has eaten the world, there'll be
| simple k8s-based solutions to most problems, but right now
| that's not always the case.
|
| [0] I think that in the same way that operating systems abstract
| physical hardware, memory management, process management, etc.,
| k8s abstracts storage, network, compute resources, etc.

| aliasxneo wrote:
| There are always two extremes to any debate. I've personally
| enjoyed my journey with it. I've even been in an anti-k8s
| company running bare metal on the Hashi stack (I won't be
| running back to that anytime soon). The two categories I've seen
| work best are either something like ECS or serverless, or
| Kubernetes.

| klardotsh wrote:
| Tell that to the myriad of folks making their money off of
| peddling it. You'd swear it were the only tool available based
| on the hype circles (and how many hiring managers strictly look
| for experience with it).

| ilyt wrote:
| I gotta say, from a dev perspective it is a very convenient
| solution. But I wouldn't recommend it to anyone that runs
| anything less complex than "a few services and a database". The
| _tens of minutes_ you save writing deploy scripts will be
| replaced by hours of figuring out how to do it the k8s way.
|
| From an ops perspective, let's say I ran it from scratch (as in
| "writing systemd units to run the k8s daemons and setting up a
| CA to feed them", because back then there was not much reliable
| automation around deploying it), and the complexity tax is
| insane. Yeah, you can install some automation that does it, but
| if it ever breaks (and I've seen some break), good fucking luck;
| a non-veteran will have a better chance reinstalling it from
| scratch.

| illiarian wrote:
| Cloud Native Landscape...
| https://landscape.cncf.io/images/landscape.pdf
|
| It's more than just peddlers at this point. There are peddlers
| peddling to other peddlers, several layers deep.

| lloydatkinson wrote:
| I've been following their move to on-premise with interest, and
| this was a great read. I'm curious how they are wiring up GitHub
| Actions with their on-premise deployment. How are they doing
| this?
|
| The best I can think of for my own project is to run one of the
| self-hosted GitHub Actions runners on the same machine, which
| could then run an action to trigger running the latest Docker
| image.
|
| Without something like that, you miss the nice instant push
| model cloud gives you, and you have to use the pull model of
| polling some service regularly for newer versions.
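The self-hosted-runner approach suggested above could look roughly
like the workflow below. This is a sketch of the commenter's idea,
not 37signals' actual setup; the image name, port, and deploy
commands are placeholders.

    # .github/workflows/deploy.yml
    name: deploy
    on:
      push:
        branches: [main]
    jobs:
      deploy:
        runs-on: self-hosted    # a runner registered on the app server
        steps:
          - uses: actions/checkout@v3
          - name: Build image
            run: docker build -t myapp:${{ github.sha }} .
          - name: Swap the running container
            run: |
              docker rm -f myapp || true
              docker run -d --name myapp -p 80:3000 myapp:${{ github.sha }}

Since the runner makes an outbound connection to GitHub to receive
jobs, no inbound access to the box is needed, and jobs start almost
immediately, which largely preserves the push-like feel.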
| asjkaehauisa wrote:
| They could just use Nomad and call it a day.

| electric_mayhem wrote:
| We used to call this a "sudden outbreak of common sense".

| mfer wrote:
| I wonder how they are going to handle fault tolerance when
| machines go offline?
|
| So as to avoid being paged in the middle of the night, I grew to
| really like automation that keeps things online.

| turtlebits wrote:
| Their app is running on at least 2 machines, so the load
| balancer takes care of it.

| guilhas wrote:
| How much easier is mrsk vs k3s?

| turtlebits wrote:
| Looks like an apples vs. oranges comparison. They seem to have a
| low number of distinct services, so there isn't a real need for
| k3s/k8s (i.e. orchestration); on the other hand, they do need
| config management.

| bdcravens wrote:
| I'm not sure if anyone other than 37signals is using it at scale
| yet, so you may get a better idea by looking at the docs
| yourself.
|
| https://github.com/mrsked/mrsk

| ilyt wrote:
| Oh, a YAML-based DSL to deploy stuff, how original!
|
| Now we only need a template-based generator for those YAMLs and
| we will have all the worst practices for orchestration right
| here, just like k8s + helm.
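To give guilhas's question some shape: mrsk is closer to config
management plus a deploy command than to an orchestrator. A minimal
config, adapted from the project's README (exact keys may differ by
version; the service name, image, server IPs, and registry
credential are placeholders):

    # config/deploy.yml
    service: hey
    image: 37s/hey
    servers:
      - 192.168.0.1
      - 192.168.0.2
    registry:
      username: registry-user-name
      password:
        - MRSK_REGISTRY_PASSWORD  # read from the environment

With that in place, "mrsk setup" bootstraps Docker on the hosts and
"mrsk deploy" builds the image, pushes it, and swaps the running
containers behind a Traefik proxy.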