[HN Gopher] Moving from AWS to Bare-Metal saved us $230k per year ___________________________________________________________________ Moving from AWS to Bare-Metal saved us $230k per year Author : devneelpatel Score : 171 points Date : 2023-11-16 19:54 UTC (3 hours ago) (HTM) web link (blog.oneuptime.com) (TXT) w3m dump (blog.oneuptime.com) | not_your_vase wrote: | How are such savings not obvious after putting the amounts in an Excel sheet and spending an hour over it (and most importantly doing this _before_ spending half a million/year on AWS)? | Spivak wrote: | I would be surprised if people didn't know that coloing was cheaper. I certainly evangelize it for workloads that are particularly expensive on AWS. | It's not entirely without downsides though, and I think many shops are willing to pay more for a different set of them. It is incredibly rewarding work though. You get to do magic. | * You do need more experienced people, there's no way around it, and the skills are hard to come by sometimes. We spent probably 3 years looking to hire a senior DBA before we found one. Networking people are also unicorns. | * Having to deal with the full full stack is a lot more work, and needing to manage IRL hardware is a PITA. I hated driving 50 miles to swap some hard drives. Rather than using those nice cloud APIs you are also on the other side implementing them. And all the VM management software sucks in its own unique way. | * Storage will make you lose sleep. Ceph is a wonder of the technological world but it will also follow you into a dark alleyway and ruin your sleep. | * Building true redundancy is harder than you think it should be. "What if your ceph cluster dies?" "What if your ESXi shits the bed?" "What if Consul?" Setting things up so that you don't accidentally have single points of failure is tedious work. | * You have to constantly be looking at your horizons. We made a stupid little doomsday clock web app where we put all the "in the next x days/weeks/months we have to do x or we'll have an outage" items. Because it will take more time than you think it should to buy equipment. | theLiminator wrote: | It's great when you don't need instant elasticity and traffic is very predictable. | I think it's very useful for batch processing - especially owning a GPU cluster could be great for ML startups. | Hybrid cloud + bare metal is probably the way to go (though that does incur the complexity of dealing with both, which is also hard). | mschuster91 wrote: | > and most importantly doing this before spending half a million/year on AWS | AWS is... incentivizing scope creep, to put it mildly. In ye olde days, you had your ESXi blades, and if you were lucky some decent storage attached to them, and you had to make do with what you had - if you needed more resources, you'd have to go through the entire usual corporate bullshit. Get quotes from at least three comparable vendors, line up contract details, POs, get approval from multiple levels... | Now? Who cares if you spin up entire servers' worth of instances for feature branch environments, and look, isn't that new AI chatbot something we could use... you get the idea. The reason why cloud (not just AWS) is so popular in corporate hellscapes is because it eliminates a lot of the busybody impeders. Shadow IT as a Service.
| tqi wrote: | Those busybodies are also there to keep rogue engineers from | burning money on useless features (like AI chat bots) that | only serve to bolster their promo packet... | mschuster91 wrote: | That depends on how the incentive structures for your | corporate purchase department are set up - and there's | really a _ton_ of variance there, with results ranging from | everyone being happy in the best case to frustrated | employees quitting in droves or the company getting burned | at employer rating portals. | tqi wrote: | > That depends on how the incentive structures for your | corporate purchase department are set up | | Sure, but that seems orthogonal to the pros and cons of | having more layers of oversight (busybodies, to use your | term) on infra spend. Badly run companies are badly run, | and I don't think having the increased flexibility that | comes from cloud providers changes that. | withinboredom wrote: | I literally started laughing at this. I worked at a bare- | metal shop fairly recently and a guy on my team used a | corporate credit card to set up an AWS account and create | an AI chatbot. | | The dude nearly got fired, but your comment hit the spot. | You made my night, thank you. | threeseed wrote: | > keep rogue engineers from burning money on useless | features (like AI chat bots) | | As someone who has worked on an AI chat bot I can assure | you it does not come from engineers. | | It's coming from the CFO who is salivating at the thought | of downsizing their customer support team. | ldargin wrote: | Bare-metal solutions save money, but are costly in terms of | development time and lost agility. Basically, they have much | more friction. | withinboredom wrote: | huh, say wut? | | I guess before Amazon invented "the cloud" there wasn't any | software companies... | threeseed wrote: | AWS isn't just IaaS they are PaaS. | | So it's a fact that for most use cases it will be | significantly easier to manage than bare metal. | | Because much of it is being managed for you e.g. object | store, databases etc. | withinboredom wrote: | Setting up k3s: 2 hours | | Setting up Garage for obj store: 1 hour. | | Setting up Longhorn for storage: .25 hour. | | Setting up db: 30 minutes. | | Setting up Cilium with a pool of ips to use as a lb: 45 | mins. | | All in: ~5 hours and I'm ready to deploy and spending 300 | bucks a month, just renting bare metal servers. | | AWS, for far less compute and same capabilities: | approximately 800-1000 bucks a month, and takes about 3 | hours -- we aren't even counting egress costs yet. | | So, for two extra hours on your initial setup, you can | save a ridiculous amount of money. Maintenance is | actually less work than AWS too. | | (source: I'm working on a youtube video) | threeseed wrote: | You should stick to making Youtube videos then. | | Because there is a world of difference between installing | some software and making it robust enough to support a | multi-million dollar business. I would be surprised if | you can setup and test a proper highly-available database | with automated backup in < 30 mins. | RhodesianHunter wrote: | Because the amount of time your engineers will spend | maintaining your little herd of pet servers, and the | opportunity cost of not being able to spin up _manged service | X_ to try an experiment, are not measurable. | yjftsjthsd-h wrote: | > maintaining your little herd of pet servers | | You know bare metal can be an automated fleet of cattle too, | right? 
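(A back-of-envelope sketch of the comparison withinboredom lays out above, and of the spreadsheet exercise not_your_vase asks about. The dollar and hour figures are the ones quoted in the thread; the loaded hourly engineering rate is an assumption for illustration.)

    # Rough break-even for the numbers quoted in the thread.
    BARE_METAL_MONTHLY = 300    # rented bare-metal servers, per the comment
    AWS_MONTHLY = 900           # midpoint of the quoted $800-1000/month
    SETUP_HOURS_BARE_METAL = 5  # k3s + Garage + Longhorn + db + Cilium, per the comment
    SETUP_HOURS_AWS = 3         # quoted AWS setup time
    HOURLY_RATE = 150           # assumed fully loaded engineering rate

    extra_setup_cost = (SETUP_HOURS_BARE_METAL - SETUP_HOURS_AWS) * HOURLY_RATE
    monthly_savings = AWS_MONTHLY - BARE_METAL_MONTHLY

    print(f"extra one-time setup cost: ${extra_setup_cost}")
    print(f"monthly savings:           ${monthly_savings}")
    print(f"payback period:            {extra_setup_cost / monthly_savings:.1f} months")
    print(f"first-year net savings:    ${monthly_savings * 12 - extra_setup_cost}")

By these figures the extra setup time pays for itself within the first month; the real argument in the rest of the thread is about ongoing maintenance, which this sketch does not capture.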
| withinboredom wrote: | Have you ever heard of pxe boot? You should check out | Harvester made by Rancher (IIRC). Basically, manage bare | metal machines using standard k8s tooling. | dilyevsky wrote: | Cloud is putting spend decisions into individual EMs' or even | devs' hands. With baremetal one team ("infra" or whatever) will | own all compute and thus spend decisions need to be justified | by EMs which they usually dont like ;) | hipadev23 wrote: | They were paying on-demand ec2 prices and reserved instances | alone would save them ~35%, savings plan even more which would | apply to egress and storage costs too. Anyway, they're still | saving a lot more (~55%), but it's not nearly as egregious of a | difference. | RhodesianHunter wrote: | Right. Now what's the developer man-hour cost of the move? | | Unless their product is pretty static and not seeing much | development, they're probably in the negative. | manvillej wrote: | when we don't optimize for cloud and look at it from this | angle and squint, it looks like we're saving money! | zer00eyz wrote: | It's not a very good question: they still have aws | compatibility as their fail over/backup (should be live but | that's another matter...) | | What's capex vs opex now? Thats 150k of depreciable assets, | probably ones that will be available for use long after all | the current staff depart. | | Everyone forgets what WhatsApp did with few engineers and | less hardware, there's probably more than enough room for | them to grow, and they have space to increase capacity. | | The cloud has a place, but candidly so does a Datacenter and | ownership. | cypress66 wrote: | Going from AWS to something like hetzner would be most of the | way there probably. | hipadev23 wrote: | Hetzner in particular is a disaster waiting to happen, but | yes I agree with the sentiment. OVH doesn't arbitrarily shut | off your servers or close your account without warning. | rgrieselhuber wrote: | We've been on both Hetzner and OVH for years and have never | had this happen. | | The move does cost money, once. Then the savings over years | add up to a lot. We made this change more than 10 years ago | and it was one of the best decisions we ever made. | nonsens3 wrote: | Hetzner randomly shuts down one of my servers every 2-3 | months. | codenesium wrote: | Nice of them to test your failover for you. | rgrieselhuber wrote: | Yeah you do have to have redundancy built in, but we | don't get random shutdowns. | razemio wrote: | I am sorry what? I have been with Hetzner over 10 years | hosting multiple servers without issue. There has to my | knowledge never been a shutdown without notice on bare | metal servers and it does not happen often. Like once | every 2 years. | ptico wrote: | Hetzner suspended the account of a non-profit org I | voluntarily supported, without explaining the reason or | giving us possibility to take our data out. The issue was | resolved only after bringing it to the public space. Even | there they tried to pretend we are not actually their | customers first | riku_iki wrote: | I had the same issue, sent them ticket, they swapped | server, and worked fine since then. | dvfjsdhgfv wrote: | I've been using Hetzner for years and what happens every | 3-4 years is that a disk dies. So I inform them, they | usually replace it within an hour and I rebuild the | array, that's all. | | Recently I've been moving most projects to Hetzner Cloud, | it's a pleasure to work with and pleasantly inexpensive. 
| It's a pity they didn't start it 10 years earlier. | forty wrote: | I think OVH might let your server burn and the backup which | was stored next to it (of course) with it ;) | rgrieselhuber wrote: | Definitely don't keep all your data in one region only. | For object storage, I prefer Backblaze unless you need | high throughput. | FpUser wrote: | Using both Hetzner and OVH for years and not a single | problem be it technical or administrative. Does not men it | never happens but this is just my experience | dvfjsdhgfv wrote: | > Hetzner in particular is a disaster waiting to happen | | Why would you spread FUD? They have several datacenters in | different locations, and even if they were as incompetent | as OVH (they are not)[0], the destruction of one datacenter | doesn't mean you will lose data stored in the remaining | ones. | | [0] I bet OVH is also way smarter than they were before the | fire. | Karrot_Kream wrote: | After that 35% savings, they ended up saving about a US mid | level engineer's salary, sans benefits. Hope the time needed | for the migration was worth it. | alberth wrote: | I'm sure the are also getting better performance as well. | | Not sure how to factor that $ into the equation. | flkenosad wrote: | Also, I'd imagine most companies can fill unused compute | with long-running batch jobs so you're getting way more | bang for your buck. It's really egregious what these clouds | are charging. | darkwater wrote: | To get real savings with a complex enough project you will | need one or more FTE salaries just to stay on top of AWS | spending optimizations | baz00 wrote: | Plus... | | 2x FTEs to manage the AWS support tickets | | 3x FTE to understand the differences between the AWS | bundled products and open source stuff which you can't get | close enough to the config for so that you can actually use | it as intended. | | 3x Security folk to work out how to manage the tangle of | multiple accounts, networks, WAF and compliance overheads | | 3x FTEs to write HCL and YAML to support the cloud. | | 2x Solution architects to try and rebuild everything cloud | native and get stuck in some technicality inside step | functions for 9 months and achieve nothing. | | 1x extra manager to sit in meetings with AWS once a week | and bitch about the crap support, the broken OSS bundled | stuff and work out weird network issues. | | 1x cloud janitor to clean up all the dirt left around the | cluster burning cash. | | --- | | Footnote: Was this to free us or enslave us? | hotpotamus wrote: | > Footnote: Was this to free us or enslave us? | | I assume whichever provides more margin to Jeff Bezos. | gymbeaux wrote: | Our experience hasn't been THAT bad but we did waste a | lot of time in weekly meetings with AWS "solutions | architects" who knew next to nothing about AWS aside from | a shallow, salesman-like understanding. They make around | $150k too, by the way. I tried to apply to be one, but | AWS wants someone with more sales experience and they | don't really care about my AWS certs | baz00 wrote: | As an AWS Solution Architect (independent untethered to | Bezos) I resent that comment. I know slightly more than | next to nothing about AWS and I can Google something and | come up with something convincing and sell it to you in a | couple of minutes! | makeitdouble wrote: | Getting a bare metal stack has interesting side effects on | how they can plan future projects. 
| | One that's not immediately obvious is to keep on staff | experienced infra engineers that bring their expertise for | designing future projects. | | Another is the option to tackle project in ways that would be | to costly if they were still on AWS (e.g. ML training, stuff | with long and heavy CPU load). | flkenosad wrote: | Yep and hardware is only getting cheaper. Better to just | buy more drives/chips when you need them. | meowface wrote: | A possible middle-ground option is to use a cheaper cloud | provider like Digital Ocean. You don't need dedicated | infrastructure engineers and you still get a lot of the | same benefits as AWS, including some API compatibility | (Digital Ocean's S3-alike, and many others', support S3's | API). | | Perhaps there are some good reasons to not choose such a | provider once you reach a certain scale, but they now have | their own versions of a lot of different AWS services, and | they're more than sufficient for my own relatively small | scale. | gymbeaux wrote: | That's the niche DigitalOcean is trying to carve out. | I've always loved and preferred their UI/UX to that of | AWS or Azure. No experience with the CLI but I would | guess it's not any worse than AWS CLI. | efitz wrote: | I was thinking the same thing. If the migration took more | than one man-year then they lost money. | | Also what happens at hardware end-of-life? | | Also what happens if they encounter an explosive growth or | burst usage event? | | And did their current staffing include enough headcount to | maintain the physical machines or did they have to hire for | that? | | Etc etc. Cloud is not cheap but if you are honest about TCO | then the savings likely are WAY less than they imply in the | article. | flkenosad wrote: | > If the migration took more than one man-year then they | lost money. | | Your math is incorrect. The savings are per year. The job | gets done once. | | > Also what happens at hardware end-of-life? | | You buy more hardware. A drive should last a few years on | average at least. | | > Also what happens if they encounter an explosive growth | or burst usage event? | | Short term, clouds are always available to handle extra | compute. It's not a bad idea to use a cloud load-balancing | system anyway to handle spam or caching. | | But also, you can buy hardware from amazon and get it the | next day with Prime. | | > And did their current staffing include enough headcount | to maintain the physical machines or did they have to hire | for that? | | I'm sure any team capable of building complex software at | scale is capable of running a few servers on prem. I'm sure | there's more than a few programmers on most teams that have | homelabs they muck around with. | | > Etc etc. | | I'd love to hear more arguments. | awslol wrote: | They also saved the salaries of the team whose job was doing | nothing but chasing misplaced spaces in yaml configuration | files. Cloud infrastructure doesn't just appear out of thin | air. You have to hire people to describe what you want to do. | And with the complexity mess we're in today it's not at all | clear which takes more effort. | quickthrower2 wrote: | 100% this. Cloud is a hard slog too. A different slog | though. We spend a lot of time chasing Azure deprecations. | They are closing down a type of MySQL instance for example | for one which is more "modern" but from the end user point | of view it is still a MySQL server! | gymbeaux wrote: | Exactly. Last job I worked at there was always an issue | with the YAML... 
and as a "mere" software engineer, I had | to wait for offshore DevOps to fix, but that's another | issue. | icedchai wrote: | To manage a large fleet of physical servers, you need | similar ops skills. You're not going to configure all those | systems by hand, are you? | awslol wrote: | Depends on the size of the fleet. | | If you're using less than a dozen servers manual | configuration is simpler. Depending on what you're doing | that could mean serving a hundred million customers. | Which is plenty for most business. | dilyevsky wrote: | I broke the rules and read the article first: | | > In the context of AWS, the expenses associated with | employing AWS administrators often exceed those of Linux on- | premises server administrators. This represents an additional | cost-saving benefit when shifting to bare metal. With today's | servers being both efficient and reliable, the need for | "management" has significantly decreased. | | I also never seen an eng org where substantial part of it | didn't do useless projects that never amount to anything | rewmie wrote: | I get the point that they tried to make, but this | comparison between "AWS administrators" and "Linux on- | premises server administrators" is beyond apple-and-oranges | and is actually completely meaningless. | | A team does not use AWS because it provides compute. AWS, | even when using barebonea EC2 instances, actually means on- | demand provisioning of computational resources with the | help of infrastructure-as-code services. A random developer | logs into his AWS console, clicks a few buttons, and he's | already running a fully instrumented service with logging | and metrics a click away. He can click another button and | delete/shut down everything. He can click on a button again | and deploy the same application in multiple continents with | static files provided through a global CDN, deployed with a | dedicated pipeline. He clicks on another button again and | everything is shut down again. | | How do you pull that off with "Linux on-premises server | administrators"? You don't. | | At most, you can get your Linux server administrators to | manage their hardware with something like OpenStack, buy | they would be playing the role of the AWS engineers that | your "AWS administrators" don't even know exist. However, | anyone who works with AWS only works on the abstraction | layers above that which a "Linux on premises administrator" | works on. | dilyevsky wrote: | > A random developer logs into his AWS console, clicks a | few buttons, and he's already running a fully | instrumented service with logging and metrics a click | away... | | This only works that way for very small spend orgs that | haven't implemented soc 2 or the like. If that's what | you're doing then probably should stay away from | datacenter, sure | spamizbad wrote: | Going to be honest: If your AWS spend is well over 6 | figures and you're still click-ops-ing most things | you're: | | 1) not as reliable as you think you are 2) probably | wasting gobs of money somewhere | awslol wrote: | You just log into the server... | | Not everything is warehouse scale. You can serve tens of | millions of customers from a single machine. | baz00 wrote: | This is the voice of someone who has never actually ended | up with a big AWS estate. | | You don't click to start and stop. You start with someone | negotiating credits and reserved instance costs with AWS. | Then you have to keep up with spending commitments. 
| Sometimes clicking stop will cost you more than leaving shit running. | It gets to the point where $50k a month is indistinguishable from the noise floor of spending. | ygjb wrote: | Yeah, that's part of it. The other part is that you can move stuff that is working, and working well, into on-prem (or colo) if it is designed well and portable. If everything is running in containers, and orchestration is already configured, and you aren't using AWS or cloud provider specific features, portability is not super painful (modulo the complexity of your app, and the volume of data you need to migrate). Clearly this team did the assessment, and the savings they achieved by moving to on-prem was worthwhile. | That doesn't preclude continuing to use AWS and other cloud services as a click-ops driven platform for experimentation, and requiring that anything targeting production be refactored to run in the bare-metal environment. At least two shops I worked at previously have used that as a recurring model (one focusing on AWS, the other on GCP) for stuff that was in prototyping or development. | oxfordmale wrote: | That is having your cake and eating it too. AWS administrators don't do the same job as on prem administrators. | dheera wrote: | They probably need to now hire 24/7 security to watch the bare metal if they're serious about it, so not sure about that engineer | dilyevsky wrote: | onsite security is offered by the colo provider. You can also pay for locked cabinets with cameras and anti-tampering, or even completely caged-off space, depending on your security requirements | baz00 wrote: | If we saved 35% we could hire 20 FTEs. | Not that we'd need them as we wouldn't have to write as much HCL. | throw555chip wrote: | Bandwidth would need to be compared and considered between EC2 and what they were able to negotiate for bare metal co-location. | bauruine wrote: | Bandwidth is about 2 orders of magnitude cheaper on non-cloud even without any negotiation or commitment. How much do you have to commit for e.g. Cloudfront to pay 2 orders of magnitude less than their list price of $0.02 per GB? | threeseed wrote: | Also, being an uptime site, I'm surprised they didn't use the m7g instance type. | Would've saved another ~30% for minimal difference in performance. | For me this doesn't look like a sensible move especially since with AWS EKS you have a managed, highly-available, multi-AZ control plane. | hipadev23 wrote: | I'd be so excited to run my company's observability platform on a single self-managed rack. | cj wrote: | > savings plan even more which would apply to egress and storage | Wait, is this accurate? | If so I need to sign our company up for a savings plan... now. We use RI's but I thought savings plans only applied to instance cost and not bandwidth (and definitely not S3) | tommek4077 wrote: | You got it right. They do not include traffic or S3. | ActorNightly wrote: | Also, EKS (i.e. a managed service) is also more expensive than renting EC2s and doing everything yourself, which is not that hard. | pojzon wrote: | Is $150 really that much when you are paying hundreds of thousands for nodes? | threeseed wrote: | EKS's control plane consists of decent EC2 instances, ELB, ENIs distributed across multiple availability zones. | You're not saving anything doing it yourself. | And you've just given yourself the massive inconvenience of running a HA Kubernetes control plane.
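(A rough illustration of bauruine's "two orders of magnitude" egress point above. The $0.02/GB figure is the CloudFront list price quoted there; the flat price for an unmetered 1 Gbps port is an assumption for illustration.)

    # How much data a fully used 1 Gbps port moves in a month, and what the
    # same volume would cost at the quoted $0.02/GB list price.
    SECONDS_PER_MONTH = 30 * 24 * 3600
    PORT_GBPS = 1.0
    PORT_FLAT_PRICE = 100.0     # assumed $/month for an unmetered 1 Gbps uplink
    CLOUDFRONT_PER_GB = 0.02    # list price quoted above

    max_gb_per_month = PORT_GBPS * SECONDS_PER_MONTH / 8  # gigabits/s * s / 8 = GB

    print(f"max transfer on the port:  {max_gb_per_month:,.0f} GB/month")
    print(f"same volume at $0.02/GB:   ${max_gb_per_month * CLOUDFRONT_PER_GB:,.0f}/month")
    print(f"effective per-GB on port:  ${PORT_FLAT_PRICE / max_gb_per_month:.5f}")

At full utilisation that works out to roughly $0.0003/GB against $0.02/GB; the actual ratio depends on how full the pipe runs.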
| aranelsurion wrote: | If that's the case, probably just going for Spot machines would | save them more than that move. | 4death4 wrote: | Cool, so you can hire one additional engineer. Are you sure your | bare metal setup will occupy less than a single engineer's time? | VBprogrammer wrote: | I'm not so sure it's a zero sum kind of thing. Yes, it seems | likely that they are paying at least one full time employee to | maintain their production environment. At the same time, AWS | isn't without it's complications, there are people who are | employed specifically to babysit it. | r2_pilot wrote: | Ha, while I'd work for that salary, even half of it would | almost double what I make as a sysadmin. I guess I work here | for the mission. Plus I don't have to deal with cloud services | except for when the external services go down. Our stuff keeps | running though. | 4death4 wrote: | I'm including overhead in that number. FWIW I know many ICs | earning over double that number, not even including overhead. | r2_pilot wrote: | >FWIW I know many ICs earning over double that number, not | even including overhead. | | Keep rubbing salt lol I live in a low cost area though. | It's even pleasant some times of the year. | dzikimarian wrote: | AWS was operated by holy ghost I assume? | byyll wrote: | They can fire the "AWS engineer". | jscheel wrote: | So, they essentially saved the cost of one good engineer. | Question is, are they spending 1 man-year of effort to maintain | this setup themselves? If not, they made the right choice. | Otherwise, it's not as clear cut. | jacquesm wrote: | That depends on where they are located. Good engineers aren't | 230K$ / year everywhere. | corobo wrote: | Americans get paid so much, got dayum. | | Half that and half it again and I'd still be looking at a | decent raise lmao | uoaei wrote: | That's about senior-level compensation even among most | companies in the Bay Area. Only the extreme outliers with | good performance on the stock market can be said to be | significantly higher in TC. | | Edit: even then, TC is tied to how the stock market is | doing, and not paid out by the company directly, so it only | makes sense to compare with base wage plus benefits. | dmoy wrote: | TC for senior at big tech companies is over $300k. Over | $400k for Facebook I hear. | | It doesn't take an extreme outlier to get significantly | above $250k. | | > so it only makes sense to compare with base wage plus | benefits. | | Not really, when stock approaches 40%+ of compensation, | and is in RSU with fairly fast vesting schedule. | jacquesm wrote: | Indeed. And it does of course needs to be offset to the | cost of living. | IntelMiner wrote: | It include an asterisk. Those salaries come with the | reality of living in locales like the bay area or Seattle | and the like generally, with all the exorbitant costs of | living in those areas | | A lot of companies (like Amazon) will gleefully slash your | salary if you try to move somewhere cheaper, because why | should we pay you more if you don't just need that money to | fork over to a landlord every month? | | There's also all the things Americans go without, like | socialized healthcare. Even with their lauded insurance | plans they still pay significantly more for worse health | outcomes than any other country | IshKebab wrote: | Nah even with bay area costs Americans get paid much much | more than elsewhere. I could easily double my salary by | moving from the UK to San Francisco. 
House prices are maybe double too, but since they are only a part of your outgoings, overall you come out waaaay ahead. | Of course then I would have to send my kids to schools with metal detectors and school shooting drills... It's not all about the money. | the_gipsy wrote: | The employer is basically subsidizing the mortgage, incentivizing workers to move to the most expensive locations, which makes these locations even more expensive. | nxm wrote: | 90%+ of Americans have some form of health insurance, especially tech workers. And there are issues with socialized health care as well | acdanger wrote: | Likely including benefits in this figure. | cj wrote: | I was curious about this too, but this company lists a range of $200-250k for remote. | https://github.com/OneUptime/interview/blob/master/software-... | Side note: I'm in slight disbelief at how high that salary range is compared to how minimal the job requirements are. | totallywrong wrote: | Right, like AWS is set and forget. | wholinator2 wrote: | Exactly. A more accurate figure would be the work hours spent maintaining bare metal _minus_ the work hours spent maintaining AWS. Impossible to know without internals, but at least a point in favor of bare metal | threeseed wrote: | Depending on what parts of AWS you use it is. | Fargate, S3, Aurora etc. These are managed services and are incredibly reliable. | A lot of people here seem to think these cloud providers are just a bunch of managed servers. It's far more than that. | SteveNuts wrote: | Even the "easy" services like that have at least _some_ barrier to entry. IAM alone is a pretty big beast and I doubt someone who's never used AWS would grasp it their very first time logging into the web interface - and every service uses it extensively. | And then there's the question of whether you're going to use Terraform, Ansible, CloudFormation, etc or click through the GUI to manage things. | My point is, nothing in AWS is 100% turnkey like a lot of folks pretend it is. Most of the time, it's leadership that thinks since AWS is "Cloud" that it's as simple as put in your credit card and you're done. | threeseed wrote: | IAM and IaC are only needed once you get to a certain size. | For smaller projects you can absolutely get away with just the UI. | isbvhodnvemrwvn wrote: | IAM is absolutely NOT something you can just ignore unless you have a huge pile of cash to burn when your shit gets compromised. | scns wrote: | There are companies earning money by showing other companies how to reduce their AWS bill. | avgDev wrote: | Set and forget until you wake up to an astronomical bill one morning. | politelemon wrote: | The beauty of decisions like these is that it looks good on a bean counter's spreadsheet. The hours of human time they end up spending on its maintenance simply don't appear in that spreadsheet, but are gladly pushed onto everyone else's plates. | andrewstuart wrote: | This fiction remains that AWS requires no specialist expertise. | And your own computers require expertise so expensive and frightening that no sane company would host their own computers. | How Amazon created this alternate reality should be studied in business schools for the next 50 years.
Amazon made the IT industry doubt its own technical capabilities so much that the entire industry essentially gave up on the idea that it can run computer systems, and instead bought into the fabulously complex and expensive _and technically challenging_ cloud systems, whilst still believing they were doing the simplest and cheapest thing. | kennydude wrote: | AWS does require some expertise to master considering the sheer number of products and options. Tick the wrong box and cost increases by 50% etc. | Different solutions work best for different companies. | 1980phipsi wrote: | That's what he's saying. | FpUser wrote: | >"This fiction remains that AWS requires no specialist expertise. And your own computers require expertise so expensive and frightening that no sane company would host their own computers." | Each of these statements is utter BS | PS. Oopsy I just read their third paragraph ;) | DirkH wrote: | Read their third paragraph. They completely agree with you | FpUser wrote: | LOL Sorry. I was shooting from the hip. Thanks. | madrox wrote: | Amazon didn't create it. I was there for the mass cloud migrations of the last 15 years. It isn't that AWS requires no specialist expertise, it's that it's a certain kind of expertise that's easier to plan for and manage. Managing physical premises, hardware upgrade costs, etc are all skills your typical devops jockey doesn't need anymore. Unless you're fine with hosting your company's servers under your desk, it's the hidden costs of metal that make businesses move to cloud. | Nextgrid wrote: | Fortunately there are companies like Deft, OVH, Hetzner, Equinix, etc that handle all of that for you for a flat fee while achieving economies of scale. | Colocation is rarely worth it unless you have non-standard requirements. If you just need a general-purpose machine, any of the aforementioned providers will sort you out just fine. | viraptor wrote: | This is a strawman that keeps getting brought up, but nobody's claiming that. The difference remains though, and the scale depends on what exactly you consider as an alternative. Renting a couple of servers will cost you in availability/resilience and extra hardware management time. Renting a managed rack will cost you in the above plus a premium on management. Doing things yourself will cost you in extra contracts / power / network planning, remote hands and time to source your components. | Almost everything that the AWS specialist needs to know comes in after that and has some equivalent in the bare metal world, so those costs don't disappear either. | In practice there are extra costs which may or may not make sense in each case. And there are companies that don't reassess their spending as well as they should. But there's no alternate reality really. (As in, the usually discussed complications of bare metal are not extremely overplayed) | killingtime74 wrote: | With opportunity cost it's multiples more. We don't hire people to break even on them right, we hire to make a profit. | roamerz wrote: | Even if they do spend 1 person-year of effort in maintenance they still may have made the correct choice. Having a good engineer on staff may have additional side benefits as well, especially if they could manage to hire locally and that person's wages then contribute to the local economy. As you said though it's definitely not clear cut, especially from a spectator's point of view.
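(The salary sub-thread above keeps circling one question: does the saving survive the engineer time needed to run the racks? A sketch with the article's headline figure; the loaded engineer cost and the maintenance fraction are assumptions for illustration.)

    ANNUAL_SAVINGS = 230_000        # headline figure from the article
    LOADED_ENGINEER_COST = 260_000  # assumed salary + overhead, per the salary sub-thread
    MAINTENANCE_FTE = 0.5           # assumed fraction of one engineer spent on the hardware

    net_savings = ANNUAL_SAVINGS - MAINTENANCE_FTE * LOADED_ENGINEER_COST
    breakeven_fte = ANNUAL_SAVINGS / LOADED_ENGINEER_COST

    print(f"net annual savings:     ${net_savings:,.0f}")
    print(f"break-even maintenance: {breakeven_fte:.2f} FTE")

On these assumptions, bare metal stays ahead as long as it eats less than roughly 0.9 of a loaded engineer per year, and goes negative beyond that - which is exactly the disagreement running through the thread.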
| matsemann wrote: | 1 man-year effort is probably less than the effort of AWS, | though. So a double win! | | A bit in jest, but places I've worked where we've moved to the | cloud ended up with more people managing k8s and building a | platform and tooling, than when we had a simple inhouse scp | upload to some servers. | timeon wrote: | Is this not addressed in section 'Server Admins' of the | article? | keremkacel wrote: | And now their blog won't load | mike_d wrote: | A lot of comments here seem to be along the lines of "you can | hire one more engineer," but given the current economic situation | remember that might be "keep one more engineer." Would you lay | off someone on your team to keep AWS? | | Keeping a few racks of servers happily humming along isn't the | massive undertaking that most people here seem to think it is. I | think lots of "cloud native" engineers are just intimidated by | having to learn lower levels to keep things running. | unglaublich wrote: | > I think lots of "cloud native" engineers are just intimidated | by having to learn lower levels to keep things running. | | Rightly so, because they're cloud native engineers, not system | administrators. They're intimidated by the things they don't | know. It'll be a very individual calculation whether it's worth | it for your enterprise to organize and maintain hardware | yourself, or isn't. | nwmcsween wrote: | eh, to a degree, having to deal with failed hardware and worse | buggy hardware is just a pain and really time consuming. | bee_rider wrote: | Especially given the low unemployment rate, laying somebody off | seems quite risky, if it doesn't work out you'll have trouble | hiring some replacement I guess. | cj wrote: | The current hiring market in tech is the easiest (for | employers) than it has been in a really long time. It used to | take 3-4 months to fill a role. In the current market it's | more like 2-4 weeks. | Jnr wrote: | And in other places around the world those would be closer to 3 | or 4 good engineers for the same money. And while each engineer | costs some money, they probably bring in close to double of | what they are being paid. | deepspace wrote: | > Keeping a few racks of servers happily humming along isn't | the massive undertaking that most people here seem to think it | is | | Keeping them humming along redundantly, with adequate power and | cooling, and protection against cooling- and power failures is | more of an undertaking, though. Now you are maintaining | generators, UPSs and multiple HVAC systems in addition to your | 'few racks of servers'. | | You also need to maintain full network redundancy (including | ingress/egress) and all the cost that entails. | | All the above hardware needs maintenance and replacement when | it becomes obsolete. | | Now you are good in one DC, but not protected against | tornadoes, fire and flood like you would be if you used AWS | with multiple availability zones. | | So, you have to build another DC far enough away, staff it, and | buy tons of replication software, plus several FTEs to manage | cross-site backups and deal with sync issues. | lol768 wrote: | Most of those requirements cease to exist if you decide to | colo. It's not cloud or "run your own DC". | 10000truths wrote: | You don't need to build your own datacenter unless your | workload requires a datacenter's worth of hardware. | Colocation is a feasible and popular option for handling all | of the hands-on stuff you mention. 
Ship the racks to a colo | center, they'll install them for you. Ship them replacement | peripherals, and the operators will hot-swap them for you. If | you need redundancy, that's just a matter of sending your | hardware to multiple places instead of one. Slightly more | involved, but it's hardly rocket science. | dankwizard wrote: | And these savings will be passed down to your customers too? | Or....? | willsmith72 wrote: | why should they be? | candiddevmike wrote: | Would anyone be interested in an immutable OS with built in | configuration management that works the same (as in the same | image) in the cloud and on premise (bare metal, PXE, or virtual)? | Basically using this image you could almost guarantee everything | runs the same way. | quillo_ wrote: | Yes - I would be interested :) my issue is that there is a | mixed workload of centralised cloud compute and physical | hardware in strange locations. I want something like Headscale | as a global mesh control plane and some mechanism for deploying | immutable flatcar images that hooks into IAM (for cloud) and | TPM (for BM) as a system auth mechanism. | candiddevmike wrote: | My email is in my profile if you want to discuss this | more...! | dilyevsky wrote: | This already exists - https://docs.fedoraproject.org/en- | US/fedora-coreos/bare-meta... | andrewstuart wrote: | The key point is _per year_ - ongoing saving every year. | alex_lav wrote: | Curious how much was spent on the migration? I skimmed but didn't | see that number. | | > Server Admins: When planning a transition to bare metal, many | believe that hiring server administrators is a necessity. While | their role is undeniably important, it's worth noting that a | substantial part of hardware maintenance is actually managed by | the colocation facility. In the context of AWS, the expenses | associated with employing AWS administrators often exceed those | of Linux on-premises server administrators. This represents an | additional cost-saving benefit when shifting to bare metal. With | today's servers being both efficient and reliable, the need for | "management" has significantly decreased. | | This feels like a "famous last words" moment. Next year there'll | be 400k in "emergency Server Admin hire" budget allocated. | nwmcsween wrote: | I would go with managed bare-metal, it's a step up from unmanaged | bare metal cost wise but saves you on headaches from memory, | storage, network, etc issues. | lgkk wrote: | I'm sure if the stack is simple enough it's non trivial for most | senior plus engineers to figure out the infrastructure. | | I've definitely seen a lot of over engineered solutions in the | chase of some ideals or promotions. | yieldcrv wrote: | I had an out of touch cofounder a few years back, he had asked me | why the coworking space's hours were the way they were, before | interjecting that companies were probably managing their servers | up there at the those later hours | | like, talk about decades removed! no, nobody has their servers in | the coworking space anymore sir. | | nice to see people attempting a holistic solution to hosting | though. with containerization redeploying anywhere on anything | shouldn't be hard. | maximusdrex wrote: | It feels like every comment on this article didn't read past the | first paragraph. 
Every comment I see is talking about how they likely barely made any money on the transition once all costs are factored in, but they explicitly stated a critical business rationale behind the move that remains true regardless of how much money it cost them to transition. Since they needed to function even when AWS is down, it made sense for them to transition even if it cost them more. This may increase the cost of running their service (though probably not) but it could make it more reliable, and therefore a better solution, making them more money down the line. | threeseed wrote: | > Since they needed to function even when AWS is down | AWS as a whole has never been down. | It's Cloud 101 to architect your platform to operate across multiple availability zones (data centres). Not only to insulate against data centre specific issues e.g. fire, power. But also AWS backplane software update issues or cascading faults. | If you read what they did it's actually _worse_ than AWS because their Kubernetes control plane isn't highly-available. | wbsun wrote: | People often learn these lessons the hard way: they will keep saving 230k/yr until one day their non-HA bare-metal is down and major customers retreat. | christophilus wrote: | > We have a ready to go backup cluster on AWS that can spin up in under 10 minutes if something were to happen to our co-location facility. | Sounds like they already have their bases covered. | threeseed wrote: | Still need to synchronise data, update DNS records, wait for TTLs to expire. | HA architectures exist for a reason because that last step is a massive headache. | quickthrower2 wrote: | They need to do fire drills and practice this maybe daily or at least weekly? Failover being a normal case. Can't you do failovers in DNS? | slig wrote: | >It's Cloud 101 to architect your platform to operate across multiple availability zones (data centres) | A huge multi billion dollar company with "cloud" in its name recently had a big downtime because they did not follow "cloud 101". | annexrichmond wrote: | Some AWS outages have affected all AZs in a given region, so they aren't always all that isolated. For this reason many orgs are investing in multi-cloud architectures (in addition to multi region) | oxfordmale wrote: | You can use multiple availability zones and if needed even multi cloud. If you own the hardware, you do regularly need to test the UPS power supply to ensure there is a graceful failover in case of a power outage. Unless of course, you buy the hardware already hosted in a data centre. | icedchai wrote: | I'm not convinced of the critical business rationale. Your single data center is much more likely to go down than a multi-AZ AWS deployment. The correct business rationale would be to go multi-cloud. | didip wrote: | The post is super light on details, it's hard to visualize if it's worth it or not. | For example: | - How much data are they working with? What's the traffic shape? Using NFS makes me think that they don't have a lot of data. | - What happens when their customers accidentally send too many events? Will they simply drop the payload? In bare-metal they lose the ability to auto-scale quickly. | - Are they using S3 or not, and if they are, did they move that as well to their own Ceph cluster? | - What's the RDBMS setup? Are they running their own DB proxy that can handle live switch-over and seamless upgrade?
| - What are the details on the bare metal setup? Is everything redundant? How quickly can they add several racks in one go? What's included as a service from their co-lo provider? | oxfordmale wrote: | It is not unlikely an AWS to GCP migration would have saved them significant money too, in the sense that they likely reviewed and right-sized different systems. | I also would love to see a comparison done by a financial planning analyst to ensure no cost centres are missed. On prem is cheaper but only by 30 to 50%. That is the premium you pay for flexibility, which you can partly mitigate by purchasing reserved instances for multiple years. | threeseed wrote: | > On prem is cheaper but only by 30 to 50% | Depending on use case. | If you have traffic which isn't consistent 24/7 then AWS Spot instances with Graviton CPUs will be cheaper than on-premise. | Because you have the ability to scale your infrastructure up/down in real time. | Thristle wrote: | It's 2 separate issues: | fluctuation in traffic is handled by auto scaling | Saving money on stateless (or short start time) services is done with spots | mensetmanusman wrote: | nines of uptime? | renecito wrote: | now a set of linux machines is considered bare-metal? | I was under the impression that bare-metal means "no OS". | jdoss wrote: | The term bare-metal in this context means installing and managing Linux (or your OS of choice) directly on hardware without a hypervisor. It's never meant no operating system. | _joel wrote: | In this context it's running without the use of virtualisation | 0xbadcafebee wrote: | Moving from buying Ferraris to Toyota Camrys would save a lot of money too. These stories are always bs blog spam by companies trying to pretend they pulled off some amazing new hack. In reality they were burning cash because they hadn't the faintest idea how to control their spend. "When we were utilizing AWS, our setup consisted of a 28-node managed Kubernetes cluster. Each of these nodes was an m7a EC2 instance. With block storage and network fees included, our monthly bills amounted to $38,000+" | The hell were you doing with 28 nodes to run an uptime tracking app? Did you try just running it on like, 3 nodes, without K8s? "When compared to our previous AWS costs, we're saving over $230,000 roughly per year if you amortize the cap-ex costs of the server over 5 years." | Compared to a 5-year AWS savings plan? Probably not. | On top of this, they somehow advertise using K8s as a simplification? Let's rein in our spend, not only by abandoning the convenience of VMs and having to do more maintenance, but let's require customers use a minimum of 3 nodes and a dozen services to run a dinky _uptime tracking app_. | This meme must be repeating itself due to ignorance. The CIOs/CTOs have no clue how to control spend in the cloud, so they rack up huge bills and ignore it "because we're trying to grow quickly!" Then maybe they hire someone who knows the Cloud, but they tell them to ignore the cost too. Finally they run out of cash because they weren't watching the billing, so they do the only thing they are technically competent enough to do: set up some computers and install Linux, and write off the cost as cap-ex. Finally they write a blog post in order to try to gain political cover for why they burned through several headcounts' worth of funding on nothing.
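(The numbers 0xbadcafebee quotes above - a $38,000+/month AWS bill and cap-ex amortized over 5 years - plus the ~$150k of hardware mentioned earlier in the thread, allow a rough sanity check. The colocation fee is an assumption, and the ~35% reserved-instance discount is the figure hipadev23 cites.)

    AWS_MONTHLY = 38_000        # quoted from the article
    HARDWARE_CAPEX = 150_000    # hardware figure mentioned in the thread
    AMORTIZATION_YEARS = 5
    COLO_MONTHLY = 3_000        # assumed rack + power + bandwidth
    RI_DISCOUNT = 0.35          # reserved-instance discount cited in the thread

    aws_on_demand = AWS_MONTHLY * 12
    aws_reserved = aws_on_demand * (1 - RI_DISCOUNT)
    bare_metal = HARDWARE_CAPEX / AMORTIZATION_YEARS + COLO_MONTHLY * 12

    print(f"AWS on-demand:        ${aws_on_demand:,.0f}/yr")
    print(f"AWS with ~35% RI cut: ${aws_reserved:,.0f}/yr")
    print(f"bare metal:           ${bare_metal:,.0f}/yr")
    print(f"saving vs on-demand:  ${aws_on_demand - bare_metal:,.0f}/yr")
    print(f"saving vs reserved:   ${aws_reserved - bare_metal:,.0f}/yr")

The gap between these rough figures and the article's $230k headline is exactly the unquoted territory the thread argues about: labour, networking, the standby AWS cluster, and whatever discount they actually had.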
| Nextgrid wrote: | > The hell were you doing with 28 nodes to run an uptime tracking app? | To be fair, considering the pocket-calculator-grade performance you get from AWS (along with terrible IO performance compared to direct-attach NVME) I can totally understand they'd need 28 nodes to run something that would run on a handful of real, uncontended bare-metal hosts. | w4f7z wrote: | ServeTheHome has also written[0] a bit about the relative costs of AWS vs. colocation. They compare various reserved instance scenarios and include the labor of colocation. TL;DR: it's still far cheaper to colo. | [0] https://www.servethehome.com/falling-from-the-sky-2020-self-... | boiler_up800 wrote: | I'd say $500k per year on AWS is kind of within a dead man's zone where if you're not expecting that spend to grow significantly and your infra is relatively simple, migrating off may actually make sense. | On the other hand, maintaining $100K a year of spend on AWS is unlikely to be worth the effort of optimizing, and maintaining $1M+ on AWS probably means the usage patterns are such that the cloud is cheaper and easier to maintain. | dvfjsdhgfv wrote: | In my experience amounts are meaningless, what counts is what kinds of services you need most. In my current org we use all 3 major public clouds + on-prem services, carefully planning what should go where and why. | m3kw9 wrote: | It actually saves less if you spread the transition development and op costs over 5 years. Hidden costs | monlockandkey wrote: | I've said this before, unless you are using specific AWS services, I think it is a fool's errand to use it. | Compute, storage, database, networking. You would be better off using Digital Ocean, Linode, Vultr, etc. - so much cheaper than AWS, lots of bandwidth included rather than the extortionate $0.08/GB egress. | Compute is the same story. A 2 vCPU, 4GB VPS is ~$24. The equivalent instance (after navigating the obscure pricing and naming scheme), the c6g.large, is double the price at ~$50. | This is the happy middle ground between bare metal and AWS. | greyface- wrote: | Weren't you searching for a colo provider just yesterday? That was a quick $230k! https://news.ycombinator.com/item?id=38275614 | dilyevsky wrote: | Can anyone comment on a server lifetime of 5 years? I would think it's on the order of 8-10 years these days? | Havoc wrote: | Cheaper price, lower redundancy: | >single rack configuration at our co-location partner | I've got symmetrical gigabit static ipv4 at home...so can murder commercial offerings out there on bang/buck for many things. Right up until you factor in reliability and redundancy. | FuriouslyAdrift wrote: | The math I have always seen is cloud is around 2.5x more expensive than on-prem UNLESS you can completely re-architect your infra to be cloud native. | Lift and shift is brutal and doesn't make a lot of sense. | dvfjsdhgfv wrote: | > The math I have always seen is cloud is around 2.5x more expensive than on-prem UNLESS you can completely re-architect your infra to be cloud native. | And at this point you are completely locked in. | dstainer wrote: | In my opinion the story here is that AWS allowed them to quickly build and prove out a business idea. This in turn afforded them the luxury to make this kind of switch. Cloud did its job. | allenrb wrote: | Cue the inevitable litany of reasons why it is wrong to move out of "the cloud" in 3... 2... 1...
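(A per-instance version of monlockandkey's comparison above. The instance and egress prices are the ones quoted in that comment; the monthly egress volume and the VPS's bundled transfer allowance are assumptions.)

    VPS_MONTHLY = 24          # 2 vCPU / 4 GB VPS, per the comment
    EC2_MONTHLY = 50          # c6g.large on-demand, per the comment
    AWS_EGRESS_PER_GB = 0.08  # quoted EC2 egress price
    EGRESS_GB = 1_000         # assumed monthly egress
    VPS_INCLUDED_GB = 4_000   # assumed transfer bundled with the VPS

    vps_total = VPS_MONTHLY   # assumed egress fits inside the bundled allowance
    ec2_total = EC2_MONTHLY + EGRESS_GB * AWS_EGRESS_PER_GB

    print(f"VPS:  ${vps_total:.2f}/month")
    print(f"EC2:  ${ec2_total:.2f}/month  ({ec2_total / vps_total:.1f}x)")

With egress included, the gap per instance is noticeably wider than the raw compute prices alone suggest.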
| tonymet wrote: | what is the market distortion that allows AWS margins to remain | so high? there are two major competitors (Azure, GCP) and dozens | of minor ones. | | It seems crazy to me that the two options are AWS vs bare metal | to save that much money. Why not a moderate solution? ___________________________________________________________________ (page generated 2023-11-16 23:00 UTC)