[HN Gopher] Burnt $72k testing Firebase and Cloud Run and almost... ___________________________________________________________________ Burnt $72k testing Firebase and Cloud Run and almost went bankrupt Author : bharatsb Score : 174 points Date : 2020-12-10 10:56 UTC (12 hours ago) (HTM) web link (blog.tomilkieway.com) (TXT) w3m dump (blog.tomilkieway.com) | steren wrote: | (Cloud Run PM here) I am sorry for the experience described in | the blog post, we could definitely be better at bill management. | I am glad that it worked out in the end and the customer was not | required to pay for the bill. | | Based on this experience, we decided to lower the default value | of "max instances" to 100 for future deployments. We believe 100 | is a better trade-off between allowing customers to scale out and | preventing big billing surprises. Of course, customers can always | decrease it or increase it up to 1,000, or even above with a | simple quota increase request. | bharatsb wrote: | Part 2: https://blog.tomilkieway.com/72k-2/ | MaxBarraclough wrote: | > Google let go of our bill as a one time gesture! | | We've seen this happen with similar stories on AWS. Neither | platform supports prepayment with a hard limit on costs, and | this seems unlikely to change. | teekert wrote: | Yeah, a friend of mine wanted a real cert, not Let's Encrypt (I | don't understand how that is more real but ok), and as a bit of a | noob he clicked around on the AWS website and some days later | had a bill of 1500 eur. They also nulled it. Still, this | scares the hell out of me. | WrtCdEvrydy wrote: | AWS Certificate Manager gives you free non-extended SSL to | your machine, it's pretty nifty. | coddle-hark wrote: | I can sympathise with some of these stories, like the ones | where an overnight DDOS attack racks up a huge unexpected | bill, but this one in particular is just a story of gross | incompetence and negligence.
The guy hacked together some | code in a few days and deployed it to a service with | unlimited billing without any kind of sanity checks and | without even understanding what he was paying for. He's an | ex-Googler, it's not like he hasn't heard stories like this | before. And the takeaway? "Oops don't deploy buggy code" and | "I shouldn't have used the default settings". OK, sure, let | me know how that works out for you. | jjk166 wrote: | > I jumped out of the bed, logged into Google Cloud Billing, and | saw a bill for ~$5,000. Super stressed, and not sure what | happened, I clicked around, trying to figure out what was | happening. I also started thinking of what may have happened, and | how we could "possibly" pay the $5K bill. | | > The problem was, every minute the bill kept going up. | | > After 5 minutes, the bill read $15,000, in 20 mins, it said | $25,000. I wasn't sure where it would stop. Perhaps it won't | stop? | | > After two hours, it settled at a little short of $72,000. | | > By this time, my team and I were on a call, I was in a state of | complete shock and had absolutely no clue about what we would do | next. We disabled billing, closed all services. | | 1) Why wouldn't you shut off the service as soon as you saw the | $5000 bill? Really doesn't sound like a "hop on a call with the | team for a few hours" kind of decision. | | 2) Why was the person taking a nap the only person who could get | a usage limit alert? One of the great benefits of a team is that | you can have multiple eyes looking out for problems. Someone | could have raised a flag as soon as the first unexpected alert | came in. | | 3) If going over the free tier limit was your chief concern, why | not check the usage after a quick run before letting it go | overnight and unsupervised? | | That the problem could get this bad is a UX failure, but the | problem itself is easily seeable and avoidable. 
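The Python sketch below is a minimal version of the automatic cutoff jjk166 describes. Everything in it is hypothetical. `BudgetGuard`, `hard_cap_usd` and the `shutdown` callback are illustrative names, not part of any Google Cloud or Firebase API. In practice the callback would have to do something like scaling the service to zero or detaching billing. The demo feeds it the readings from the post's timeline.

The guard can only react to the numbers it is given. If billing data lags, as yawnxyz points out in the next comment, the cap gets overshot. Here it trips at $5,000 against a $100 cap.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BudgetGuard:
    """Hypothetical hard-cap guard: fires `shutdown` once spend reaches the cap.

    Not a real cloud API; the names and the callback are illustrative only.
    """
    hard_cap_usd: float
    shutdown: Callable[[], None]
    tripped: bool = False
    readings: List[float] = field(default_factory=list)

    def report(self, cumulative_cost_usd: float) -> bool:
        """Record a cumulative cost reading and return whether the guard has tripped."""
        self.readings.append(cumulative_cost_usd)
        if not self.tripped and cumulative_cost_usd >= self.hard_cap_usd:
            self.tripped = True
            self.shutdown()  # fires only once, e.g. scale to zero or detach billing
        return self.tripped


# Demo with the readings from the post: $5K, then $15K, then $25K.
stops: List[str] = []
guard = BudgetGuard(hard_cap_usd=100.0, shutdown=lambda: stops.append("stop"))
for cost in (5_000.0, 15_000.0, 25_000.0):
    guard.report(cost)
print(guard.tripped, stops)  # True ['stop']
```

The guard is only as good as the freshness of the cost data it receives. That is why a usage-based limit enforced by the platform itself would be safer than anything bolted on from outside.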
| yawnxyz wrote: | I think the article said the Firebase dashboard data was at | least 24 hours behind what they were getting billed. | villgax wrote: | I'd happily use a platform that allowed for an option for | limiting billing on a daily/monthly/another metric. | raphaelj wrote: | Some platforms do that: - Heroku costs are pretty predictable, | and you can easily set a maximum scalability threshold to their | auto-scalable dynos, so that they will never cost you more than | a predefined amount of money; - BunnyCDN requires me to top up | their prepaid account, so that I'll never spend more than what | I have on that account. | atian wrote: | You can ask for a good faith billing adjustment. GCE or AWS is | well aware that things happen, and collecting something is better | than collecting nothing. | ceejayoz wrote: | Part two of the article indicates they did that, and had the | bill nulled out. | pettycashstash2 wrote: | from part 2: | | "After going through our lengthy doc on this incident sharing | our side of the story, various consults, talks, and internal | discussions Google let go of our bill as a one time gesture!" | throwaway7281 wrote: | To put it into perspective: You give me $72K and I'll set you up | a 1PB replicated storage infra with a total of 100+ available CPU | cores and half a TB RAM. | | I've seen people burning through cash in the cloud in a way that makes you | wonder whether money is any concern at all. | iooi wrote: | Because electricity is free. And internet is also free. And the | rooms to put the servers are also free. A/C is free. And backup | generators are free. And diesel is free. | throwaway7281 wrote: | I was stretching the point, but what amazes me is the amount | of stuff people want to do vs. the amount of equipment they | throw at the problem. | ludocode wrote: | You can rent a full 42U rack in a colocation center for | ~$1500/mo easily. They'll handle all of that stuff, including | redundant power and redundant internet.
| | Of course self-hosting on real hardware is not quite as | simple or cheap as GP made it out to be. But everything in | your post can be solved with simple fixed pricing, which is | still the main point: there are no dangers of wildly variable | pricing or accidental massive bills as there are with cloud | hosting providers. | LeonM wrote: | You forget your own cost here. | | A full-time system administrator costs more than 72k a year. | rndgermandude wrote: | Learning/Administering AWS/GCP/Azure costs time and therefore | money too. Maybe less money, maybe more money than doing | things yourself, depending on what you're doing. But you | shouldn't disregard such costs. | | I've seen enough buddies spending enormous amounts of time | doing AWS devops on top of paying the AWS premium when they | could easily have gotten away with less than a handful of | VPSes (+ optionally $100/month worth of Cloudflare as a CDN). | donmcronald wrote: | I wonder what a full-time cloud engineer costs. IMO it's | trading a simple system for a complex system, so now the | maintainers cost even more than sysadmins used to. | walrus01 wrote: | Once an organization reaches a certain size it will need one, | who ideally should be a person that can wear the dual hats of | linux/bsd sysadmin and also network engineer. | | If the person is already on payroll doing a number of other | duties, the time/effort to set up such an environment as | described in the post could be as short as a couple of days' | work. | nlitened wrote: | Yeah, but the absence of a system administrator has just cost | these guys 72k in several hours. | LeonM wrote: | I'm just trying to explain that server cost is more than | just the hardware. | | In most cases cloud computing is actually still a very cost- | effective solution to infrastructure. But with infinite | scalability also comes responsibility.
| | In the case of the OP, had they had their own hardware they | would have noticed that they had written bad code (it would have | crashed or become very slow at least), but the cloud just | scaled up and processed their code. | | I'm not trying to defend Google in this case. Billing 72k | when a 100 USD limit is set sounds like a scam. | gruturo wrote: | And for half the use cases you still need one. Unless you go | full SaaS (which may or may not be an option depending on what | you are doing, and what's on offer in that field), you still | get stuff which needs to be administered, updated, patched, | etc. Maybe not the OS layer (or maybe even that), maybe not | the DB (but then it might cost more), but you're not getting | away from that. | | You only really start saving at some scale (get a small core | of cloud-literate admins, and now you can have them run | thousands of systems for effectively no incremental cost). | jcelerier wrote: | Aha, what? Where I live (Bordeaux, France) a quick glance | through the job offers shows full-time sysadmin salaries between 25 | and 35kEUR / year. | MeinBlutIstBlau wrote: | Jesus, no wonder many IT professionals in Europe want to | come here. | KptMarchewa wrote: | You can get 2x more in Poland. | klohto wrote: | A sysadmin isn't an infrastructure engineer or an SRE. The | salaries for the latter are way above 74k/year. | LeonM wrote: | You confuse cost with salary. | | Someone who makes 35k a year costs much more. Think about | office space, training, insurance, payroll costs etc. | jcelerier wrote: | 35k is the gross, which encompasses a big part of that | except office space. | sofixa wrote: | No it's not. In France companies pay ~1.5-2x gross salary | in total, the gross going to the employee (some of it | getting deducted by the state), the rest going to health | insurance, taxes, etc. | chefkoch wrote: | France seems to pay IT quite badly. In Germany nobody would | work for this in a metro area. | zigzag312 wrote: | Why full-time?
It's possible to outsource IT administration | to a local IT company and pay only for the set-up and maintenance | that is needed. Way less than 72k a year for many use cases. | | Also, companies that employ a bunch of developers can find a | developer that has IT administration expertise and allocate | some of his time to this. Still cheaper than 72k a year or | employing someone full-time, if IT requirements don't call | for a full-time job. | | It's not just cloud vs full-time. | mike_ivanov wrote: | Exactly. Or just buy a managed dedicated server - it's more | expensive, but still it's a fraction of the full-time | sysadmin cost, and much cheaper than AWS. | raverbashing wrote: | Which is moot if you're just going to burn the 72k by | shooting yourself in the foot. | | And the way these cloud services go, the 72k was only | detected because it was a one-off event. Turn that into a | base-level inefficiency that costs that much over a year and what | have you then? | walrus01 wrote: | only a half TB of RAM? somebody recently _gave_ me a free 4U | server with 256GB of RAM in it. for zero dollars. | | If you need a number of xen or kvm VMs with a lot of RAM | assigned to each one for testing something, you can fairly | easily set up an older Dell R910 (quad-socket system) with | 512GB of RAM for under $2000. | svrtknst wrote: | cloud providers are scam artists, don't @ me | rawgabbit wrote: | Meh. I see this all the time with developers who want to abstract | everything away and not worry about the impact their poorly | performing code is having on the infrastructure or on the | $$bottom line. Time out? No problem, we will just spin up more | instances. I have heard that so many times. Maybe your code is | just bad. | tlarkworthy wrote: | And that's why you set the 'autoscaling.knative.dev/maxScale' for | Cloud Run https://github.com/futurice/terraform- | examples/blob/2ccb2fa3...
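For anyone who hasn't set it, here is a minimal sketch of where that annotation lives. It is built as a Python dict and printed as JSON, which YAML parsers also accept. It assumes the Knative-style service manifest that Cloud Run accepts. The cap goes in `spec.template.metadata.annotations`, not in the service's top-level metadata, and annotation values must be strings. The service name `crawler` and the image path are hypothetical placeholders. The same cap can also be set at deploy time with `gcloud run deploy --max-instances`.

```python
import json


def cloud_run_manifest(service: str, image: str, max_instances: int) -> dict:
    """Build a Knative-style service manifest with an instance cap.

    `service` and `image` are placeholders. The maxScale annotation sits on the
    revision template, and its value is a string, as Kubernetes annotations are.
    """
    if max_instances < 1:
        raise ValueError("max_instances must be at least 1")
    return {
        "apiVersion": "serving.knative.dev/v1",
        "kind": "Service",
        "metadata": {"name": service},
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {
                        "autoscaling.knative.dev/maxScale": str(max_instances)
                    }
                },
                "spec": {"containers": [{"image": image}]},
            }
        },
    }


manifest = cloud_run_manifest("crawler", "gcr.io/example-project/crawler", 2)
print(json.dumps(manifest, indent=2))
```

Putting the annotation on the template means every new revision inherits the cap. A cap that lives only in someone's memory of the console disappears on the next redeploy.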
| YawningAngel wrote: | A sane implementation would default this to a low value | steren wrote: | Cloud Run PM here: I'm sorry for the bad experience the | customer shared in this article, we could certainly do better | with bill management. | | We pick 1,000 as a default value for "maxScale", this can be | considered high for some users, but low for users who expect | infinite scaling from the service and start with a load test | to evaluate it. | heavyset_go wrote: | Given that this is a common problem, and one that can | bankrupt individuals or their businesses, when is AWS going | to implement spending caps that are easy to set up for new | developers or business owners? | jontro wrote: | Still the really expensive thing here was the datastore | reads. Cpu time was only 10% of the bill | quesera wrote: | > We pick 1,000 as a default value for "maxScale", this can | be considered high for some users, but low for users who | expect infinite scaling from the service and start with a | load test to evaluate it. | | That seems absurd to me. | | I think it makes much more sense to put the onus on the | _sophisticated customer_ to increase their maxScale to an | unusual value. Users who "expect infinite scaling...and | start with a load test" are sophisticated users. | | E.g. set maxScale low, like 2 or 4. The sophisticated | customer would recognize their oversight quickly. Click- | click, fixed, restart test. | | Effectively 100% of less-sophisticated customers will not | need enormous scale on day 1. Customers with whom you do | not have an existing billing relationship in the 10s of | thousands of dollars per cycle will almost certainly not | want it. | | I'd consider that level of overspecification to be a strong | anti-pattern. 
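Part 2 of the post, quoted further down the thread, puts numbers on quesera's argument. It estimates that with max-instances set to 2 instead of 1,000, the $72,000 bill would have been $144, i.e. 500 times less. That estimate assumes spend grows roughly in proportion to the instance cap. This is plausible here, because most of the bill was datastore reads issued by the running instances rather than CPU time. It is still an approximation, not a guaranteed bound.

A minimal Python sketch of that proportional scaling follows (`scaled_bill` is a hypothetical helper). It applies the scaling to the old default, the new default of 100 that steren mentions, and the small caps quesera suggests.

```python
def scaled_bill(observed_bill: float, observed_cap: int, new_cap: int) -> float:
    """Rescale an observed bill to a different instance cap.

    Assumes spend is proportional to the number of instances allowed to run,
    which is a rough approximation, not a billing rule.
    """
    if observed_cap <= 0 or new_cap <= 0:
        raise ValueError("instance caps must be positive")
    return observed_bill * new_cap / observed_cap


# The incident's bill ran under the old default cap of 1,000 instances.
for cap in (1_000, 100, 4, 2):
    print(f"max instances {cap:>5}: ~${scaled_bill(72_000, 1_000, cap):,.0f}")
```

Even under this crude model, lowering the default to 100 still leaves a worst case in the thousands of dollars for a runaway test. That is the core of the disagreement over what the default should be.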
| steren wrote: | My response at | https://news.ycombinator.com/item?id=25379846 | asciimike wrote: | Every time I see another post like this I always wonder how many | people would be willing to buy "cloud insurance" where a premium | would cover overages due to your mistakes in dev, outages, etc. | | I don't have an exact billing model worked out, and I assume the | insurance provider would mandate certain practices (e.g. setting | up billing alerts that go to them, allowing them to view/manage | infra), but I'm curious whether people here would be willing to pay for | such a thing. | | (My assumption is that most people who fall into this are too | small to be willing to pay a reasonable % of their infra spend or | change their infra practices to prevent this, but I'd be curious | whether this is something CIOs of companies who are thinking of | "moving to the cloud but are leery due to cost concerns" would want.) | MattyMc wrote: | I know it's not cool these days, but I strongly prefer (and | advise) fully managed cloud services like Heroku. I can fix my | database size and scale/resources (dynos) easily. It's simple, | and controlled. | qayxc wrote: | Or just test on a good old VM, which can be had for just a few | cents per hour and doesn't even allow for storage or network | traffic getting out of hand. | | The first mistake is to deploy tests on completely opaque | hyper-scalers. Pretty much any software infrastructure - from | (SQL/No-SQL/In-Memory/etc.-) databases to entire web-frameworks | can be found as ready-to-go VM images and containers these | days. | | Sure, it's a bit more work to find and set up, but in the end | you gain an understanding of what the system is actually doing, how it | might behave and the ability to deploy into any environment - | from local workstations to bare metal to (a fleet of) VMs to | high-level hyper-scaler services. | walrus01 wrote: | writing as somebody who runs a big collection of bare-metal | hypervisors for ISP infrastructure purposes... this post quite | honestly just makes me smirk.
this post quite | honestly just makes me smirk. | | I have truly lost track of the numerous instances, and number of | people who would be better served by buying a $1200 | test/development 1U dual socket server with a few fast SSDs in | it, and putting it in colocation somewhere for a few hundred | dollars a month. The costs would be absolutely fixed and known. | | On a tight budget? You don't even need to go as far as $1200, I | see totally fine test/development environment suitable, Dell 1U | servers on eBay right now for under $500 with 128GB of RAM. | | Or that would be better off purchasing a fixed-configuration | virtual machine (typically running on xen or kvm underneath) that | has a certain specific amount of CPU, RAM and storage resources | allocated to it which cannot balloon. For a fixed bill per month | like $65 or $85. | | You want to deploy your weird app on some cloud platform? sure, | go for it, once you've got the possible scaling-up cost issues | and possible bugs worked out on your own platform. | MattyMc wrote: | > writing as somebody who runs a big collection of bare-metal | hypervisors for ISP infrastructure purposes. | | I run a cloud SaaS company (3 employees). If I had the skillset | that it sounds like you do, I might be inclined to hose on bare | metal. But I don't. I don't know what a 1U dual socket server | is. | | It would take me some time to build these skills, and to match | the agility that the cloud offers. I don't think it's worth my | time, and probably not the author's time, either. | walrus01 wrote: | absolutely an understandable concern. One way of abstracting | away the need to own or maintain physical servers while still | achieving a definitive fixed monthly cost is to do as this | other commenter has done, renting dedicated servers from a | company that specializes in such: | | https://news.ycombinator.com/item?id=25372912 | salmonlogs wrote: | And this comment makes you look incredibly naive and narrow | minded. 
| | Running some code on a CPU != running a startup. Great, you can | buy a Dell server on eBay, or you can build a powerful desktop, | or rent a VM or get a droplet or scrape on lowendbox. These are | not a secret and there is a great reason no one does this other | than hobbyists and neckbeards. | | You do your testing and it works, then what? You have to | deliver scalable, reliable systems in production that require | identity management, security, backups, resiliency, | reliability, various networking services and a million other | supporting services and all the systems that come with it. | Never mind actually scaling the application, monitoring it and | all the tools, systems and processes needed to run reliable | systems in production. | | The eBay servers provide you exactly zero of that and you've | just wasted time setting up an environment that is a snowflake | and doesn't represent reality. Testing on the cloud on exactly | the same platform you would use for production has a lot of | benefits when you look at value as limited developer time | delivering value to customers and the business. | | Whilst the $1200 server on eBay might be cheap today, you are | entirely missing the hidden cost of the time your team of | developers costing $M/year wastes testing in an | environment that doesn't help them find and solve production | issues. You don't need many hours of wasted time or downtime to | lose all of your so-called cost gains. | | Optimising for absolute minimum cost is a fool's errand that | only slows down actually delivering production systems that | deliver value to your customers. | | Please spend some time thinking bigger about the opportunity | cost and value delivery of technology beyond the immediate | dollars and cents - it might surprise you. | dang wrote: | Please don't be a jerk on HN, especially in response to someone | else's misfortune, even if they brought it on themselves.
Maybe | you don't need to treat these people better (though why not?) | but you owe the community better if you're posting here. If you | wouldn't mind reviewing the site guidelines and taking the | intended spirit to heart, we'd be grateful. Note these ones: | "_Be kind_" and "_Please don't sneer_" | | https://news.ycombinator.com/newsguidelines.html | | p.s. I skimmed through your recent commenting history and it | looks great--just the kind of thing we want here. Sharing some | of what you know is exactly what we want users to do. But | please don't be supercilious about it, as in this comment and | https://news.ycombinator.com/item?id=25372847. Ignorance | doesn't deserve humiliation, and that ingredient poisons the | ecosystem (and eventually starts a degrading spiral, e.g. | https://news.ycombinator.com/item?id=25373520). The rest is | good. | Alex3917 wrote: | > Maybe you don't need to treat these people better (though | why not?) | | IMHO the best argument for 'why not' would be that it's | generally unethical to deploy software without first taking | the time to read the manual and understand how your | dependencies work. In this case the system wasn't live and | the costs of this fuckup were solely externalized onto | Google, which is fine because it was in large part their | fault anyway. But when dealing with production deployments, | this same behavior often results in users having all their | private information leaked or deleted. | walrus01 wrote: | I think cautionary tales are important - but it's also | possible, as I likely did above, to come down on people too | harshly. Not everything has consequences as severe as a | Therac-25. | walrus01 wrote: | Thanks for the feedback. I almost certainly shouldn't have | included the part about the smirk, and I can definitely see | how that could appear to be making fun of somebody else's | misfortune. And the rest of it could have been phrased in a | more diplomatic way.
| | For what it's worth, it wasn't intended personally at the | _person_ who almost incurred the $72k bill, but more at the | general concept of test/beta software gone rampant and out | of control in an environment where billing has no limits. I | think we've all tested some sort of software in development | environments that caused havoc - but up until very recently | it's been hard for that to immediately begin causing real-world | financial consequences... | craftinator wrote: | Prick. | dang wrote: | Would you please stop posting unsubstantive comments to HN | and stop breaking the site guidelines? You've been doing it a | lot and we ban that sort of account. I don't want to ban you | because your good comments are good, but the bad comments are | like mercury: they build up in the system and poison things. | | The rules apply regardless of how bad or wrong another | comment is, or you feel it is. | | https://news.ycombinator.com/newsguidelines.html | skrebbel wrote: | > The costs would be absolutely fixed and known. | | Until the server breaks and you have to drive over in the | middle of the night and try to replace it but the only | available server _right now_ is a shitty one and oh shit only | half the backups work cause the onsite backups are fried too | etc etc etc. | | There are many good arguments against high-level BaaS such as | Firebase but I'm not sure that "colo is cheaper" is one. | walrus01 wrote: | fine, buy two identical ones. and set up proper backups. or | even make the backup a hot-spare. | | if it's a test/development system that's meant to possibly | break, you shouldn't be driving anywhere at 0300 in the | morning anyways. | wrkronmiller wrote: | That implies you are going to be running prod in the cloud. | Unless you're developing against purely synthetic data, the | data transfer costs are potentially astronomical.
| daneel_w wrote: | It's anecdotal, but I'm convinced I'm not alone here: we've | had more Amazon-related failures/outages in 3 years with AWS | than we had in 4 years of colo before heading to the cloud | because of the exact fear you described. | | Even a cloud setup needs good management and contingency | planning, and in the absence of such it can fail just as hard as | a colo setup. | walrus01 wrote: | https://www.bbc.com/news/technology-55087054 | | https://www.seattletimes.com/business/amazon/amazon-web- | serv... | z3t4 wrote: | Just because you use "the cloud" doesn't mean you don't need | backups. "The cloud" also has downtime and other failures. | When deploying to the cloud you have to factor in the cost | of moving to another provider if/when it is needed. | senko wrote: | > I'm not sure that "colo is cheaper" is one. | | It absolutely is (cheaper, and a good argument). As an | example: we're in the process of switching from DigitalOcean | to Hetzner for a project, which will increase infrastructure | performance (roughly memory/cpu/storage) by 4x and decrease | costs by 4x. And no driving to the colo center is necessary, | as it's their dedicated server, so their on-site engineers do | the hardware replacement. | | Also, if you are not okay with your site being down for a few | hours, you can always buy two, like you would with a sensible | cloud setup. It'd still come out way cheaper (+ get more perf | if you can do load balancing for your usage). | | Also, I don't look at it from a "colo is cheaper" point of | view. To me, it's "I can have several times more performance | _and_ hire a full-time sysadmin to worry about it, for the | same price". | Axsuul wrote: | Can anyone recommend a similar provider but for North | America? | sofixa wrote: | Oh please, you can't be serious?
| | First, that $1200 server costs fixed money upfront, and then | you pay per month for colo, and for internet, which usually | includes a fixed bandwidth cap or limits, with bursts which you | pay for. So no, it's not fixed. | | Second, a server you have to maintain, hardware- and software-wise, | is much more complex, and takes much more time, than a | managed service. You want a database? Install it yourself, | maintain it yourself, back it up yourself, monitor it yourself. | And same with everything else. | | Third, there's zero redundancy in your "setup". If you want it | with the most basic redundancy, you triple the costs (second | server, extra networking equipment, etc.). | | Fourth, geo redundancy/distributedness? Please. Good luck if | you have someone far away who wants to visit your site. | | Fifth, let's say you need to scale. Like, you get 10 more users | today than you did yesterday, or you get featured on HN or | Reddit or local news or whatever. F. You're looking at months | and a lot of cash, upfront. | | "A big collection of bare-metal hypervisors" makes sense in | some cases, but don't pretend it doesn't come with non-negligible | time spent maintaining it and require significant | upfront capital and man-hours to do the same things you get easily on | a public cloud platform (databases, message brokers, object | storage, etc. etc. etc. etc. etc. etc.). | walrus01 wrote: | yes, I am serious, because as described in the original post | this was somebody's _test/prototype environment_. Which is | the ideal use case for a DIY scenario, until you're ready to | send things into production. | | I have seen people spend thousands of dollars on a cloud | hosting platform to develop and test something when it could | have been done equally well on a 4-year-old desktop PC | sitting on somebody's desk. If they had only thought to | bother installing the same (debian, centos, whatever) | environment + packages + custom configuration on it.
| JimDabell wrote: | > this post quite | honestly just makes me smirk. | | It seems pretty callous to laugh at somebody else's $72K | misfortune, especially as they took reasonable steps to set a | budget on the platform. | walrus01 wrote: | From my point of view after doing this for 20 years, it's | like seeing, over the past 12 years of the "put everything in the | cloud" era, new and different people repeating exactly the | same mistake over and over again. | | It's like if you lived near a public park with particularly | aggressive geese that return every year, and watched new | ignorant groups of people get chased by the geese every | spring. | | It's not callous - it's the perspective of the people who are | responsible for the hypervisors that run underneath the VMs | and services that cause some of these massive billing | outrages. | jedimastert wrote: | > It's like if you lived near a public park with | particularly aggressive geese that return every year, and | watched new ignorant groups of people get chased by the | geese every spring. | | You're not really helping your argument here. Particularly | if people have been attacked for over a decade and no one | has put up an "aggressive geese" sign. | walrus01 wrote: | Quite literally in my specific area, the former is true, | and also it's a fact the city government has put up a | number of signs in the nesting area. Still happens. | scarygliders wrote: | I quite agree with everything you've said in this and your | other post. | | My development environment? My own dual-booting | Windows/Linux PC with 32GB RAM and a few TB of SSD. Not to | mention the Nvidia RTX graphics card for gaming... | | I either spin up a VM to test stuff, or spin up a Python | virtualenv. PostgreSQL is also running on this machine. | Whatever's needed. Need to emulate Stuff Happening From | Different Servers?
Why, just spin up another few VMs - | assign them the minimum resources required to get them | doing what they need to do, set up your VM network etc. Any | decently specced desktop machine can do that, never mind a | noisy rack system - considering they're way better and | vastly more powerful than the PCs we had 2-5 years before | that, which themselves were vastly more powerful than the | ones before them, and so on... | | Result? I can develop at home to my heart's content, then | when it comes to deployment spin up a remote VM on e.g. | DigitalOcean and take it from there. | | At the end of the day, "sErVeRlEsS" (I just don't like that | term, for some reason it rubs me up the wrong way, perhaps | because of...) just means "running stuff on someone else's | kit" - the same as "tHe ClOuD", so if I'm going to be | developing some system & software, I'd rather be doing it | locally, setting up whatever's needed to get it running, | and once satisfied, deploying it. | | Like you, I see either the same people, or new people, | simply Not Learning From The Past. There are many good | reasons why things were done like they were - developing on | a system you own, for example, rather than spinning up all | sorts of Cloudy Things or "serverlessy things" right from | the start. | | Hardware is cheap - you don't need a supercomputer to run | the beginnings of your latest Supah Scalable System[tm], | you just develop and run it on a reasonably up-to-date box, | and, sure, when you get to the stage where you need more | space/bandwidth/whatever, that's the point where you deploy | to some Cloudy Thing or SeRvErLeSs Thing. | walrus01 wrote: | My personal home office development environment at the | moment, done on an ultra-low budget, is a Dell Precision | T5600 mid tower workstation PC (dual Xeon E5-2630) that | I got for $350 with 64GB of RAM in it, upgraded it to | 128GB, and put a $150 Samsung SATA3 SSD in.
It's small | and relatively quiet and sits under my desk tucked in a | back corner with just a power cable and a few ethernet | cables plugged into it. | | Maybe some time in the near future I'll add a 2TB HDD | that I have sitting around into it so that I can create | VMs that have a 'fast' boot/root disk, and also give them | some lvm partitions on a big slow disk. | | It's running debian stable amd64 and is set up as a xen | dom0 hypervisor, with 768MB of RAM assigned to the dom0 | and the rest available for VMs. | | The amount of capacity that's available there to create | random PV or HVM VMs with as much RAM as I could want, is | more than sufficient for my personal needs. If I need | anything bigger I'll make it a more formal process and | put it on a machine at work. | onion2k wrote: | _Dell 1U servers on eBay right now for under $500 with 128GB of | RAM_ | | In part 2 the author says "Had we chosen max-instances to be | "2", our costs would've been 500 times less. $72,000 bill | would've been: $144". In other words, that $500 server is | several times more expensive than it would have been if | Firebase and GCP had saner defaults. | jmull wrote: | That $144 would have been for a single two-day test. | | Anyway, getting caught up in specific remediations that could | have prevented this is beside the point. For development you | want a _safe_ testing environment because mistakes, gaps, | misunderstandings, bugs are a fundamental part of it. The | entire point of tests and testing environments is to discover | the problems you know exist but need to test to find. | walrus01 wrote: | Yes I would agree that having automatic-scaling set to | effectively infinite by default is not the best choice for | the end user who is paying. | | But for the cloud operator, when somebody's runaway | application results in a $15,000 bill that has to be paid, | sure... 
| | As to whether letting people's runaway things scale up | infinitely is an intentional choice, I couldn't say. | reddit_clone wrote: | For me the needle swings towards Malice, away from Mistake | on this one. At the very least, callousness. | | Add to the long list of disappointments at humanity: | | - Late fees were a big part of Blockbuster's business. | | - Police departments factor traffic fines into their | budget. | | - A thousand other dark patterns that are unethical but not | illegal. | donmcronald wrote: | The crazy part to me is using the cloud for _testing_. It's | crazy. I have a 5-year-old dual CPU Xeon with 128GB of RAM and | a couple of NVMe disks that I've spent about $1000 CAD total to | build ($700 USD). Something in that range on Azure is about | $1/hour if you reserve a year. ~$9000 per year. | | All the people running workloads that don't require the | redundancy given, like CI, blow my mind. The costs are | astronomical vs buying a cheap or used server. Sure, use the | cloud for your production builds, but why not augment it with | something that doesn't cost as much? | walrus01 wrote: | as a totally randomly chosen example, that I spent not more | than 20 seconds searching for, here's a system with 128GB of | RAM for way under $500. | | https://www.ebay.com/itm/DELL-R910-16SFF-model-4x-Intel-X755... | | need to add your own storage (good quality SSDs, of course). | | and it assumes you have somewhere to put noisy things... | | I would estimate it's about a 500W electrical load, so figure an | additional $40-50 on the electrical bill, if you're trying to | precisely account for all costs. | | you can totally set up a desktop workstation dual Xeon for a | similar price as well. | tunesmith wrote: | Deeply dissatisfying to read. Ex-Googler uses connections to get | his (understandable!) cloud mistake refunded.
| | Every time I read one of these stories, I get more and more | convinced I will just simply never use scalable cloud tech for my | side projects. I'm not going to risk my family's retirement | savings on the all-too-possible chance that a small | deep-implication error will cause runaway charges. | daneel_w wrote: | It's "fantastic" how Google by the end of the article still comes | out as a good and friendly bunch... | BonoboIO wrote: | Ridiculous ... it's designed to charge you, upgrade you and | make you spend as much money as possible. And 24 hours after it | happened they show it to you in the dashboard. | | Great. | | Just use dedicated servers for the start! It will hurt way less | and you can easily upgrade to the cloud later, IF necessary. | that_guy_iain wrote: | Well, when someone lets you off a 72k bill you generally think | nicely of them. But Google had no way of collecting | that, asking for it would have resulted in loss of other | business (they'll keep hosting there and keep paying them), and | it isn't like Google lost 72k or like 72k even matters to | them, so it's just good PR and good business to get money on the | backend and faster. | | I have to wonder, if Google had tried to get the money, whether | they could legally have fought it. From my experience with | judges in Europe, they would most likely have looked at the | budget being ignored and at someone being upgraded from a free to a | paid plan without consent, and told Google it was their own | fault and the services weren't ordered or authorised. | spacemanmatt wrote: | Author is an ex-Googler so he knew exactly how to speak to the | system. | bjarneh wrote: | I'm sure that was an important factor, but wouldn't this in | any case just be a bill to a startup with no money on hand? | It's hard to make companies (without money) economically | responsible for anything, I guess; it even seems hard to make | companies with money responsible sometimes.
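The hardware-versus-cloud figures quoted above (donmcronald's roughly $1/hour for a comparable reserved Azure machine, walrus01's sub-$500 server drawing about 500 W) can be sanity-checked with a few lines of Python. This is a back-of-the-envelope sketch, not anyone's actual bill: the $0.12/kWh electricity price, the 30-day month, and reading walrus01's $40-50 as a monthly figure are assumptions, and the helper names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def cloud_cost_per_year(hourly_rate: float) -> float:
    """Annual cost of an always-on instance billed by the hour."""
    return hourly_rate * HOURS_PER_YEAR


def electricity_cost_per_month(watts: float, price_per_kwh: float, days: int = 30) -> float:
    """Monthly power cost of a machine drawing `watts` around the clock."""
    kwh = watts / 1000 * 24 * days
    return kwh * price_per_kwh


def breakeven_months(hardware_cost: float, cloud_hourly: float,
                     watts: float, price_per_kwh: float) -> float:
    """Months until buying hardware (plus its power bill) beats renting the instance."""
    cloud_monthly = cloud_hourly * 24 * 30
    power_monthly = electricity_cost_per_month(watts, price_per_kwh)
    if cloud_monthly <= power_monthly:
        return math.inf  # the power bill alone exceeds the rental, so it never pays off
    return hardware_cost / (cloud_monthly - power_monthly)


if __name__ == "__main__":
    print(f"cloud at $1/hour: ${cloud_cost_per_year(1.0):,.0f} per year")
    print(f"500 W at $0.12/kWh: ${electricity_cost_per_month(500, 0.12):.2f} per month")
    print(f"$500 server pays for itself in {breakeven_months(500, 1.0, 500, 0.12):.1f} months")
```

With these inputs the instance comes to $8,760 per year (close to donmcronald's ~$9000) and the power to about $43 per month, inside walrus01's $40-50 estimate. Different electricity prices or a smaller instance change the break-even point, but not the shape of the comparison.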
| donmcronald wrote: | I can't set a limit like that on my CC, so the first charge | for $5k would have cleared, meaning it would have run way | longer and racked up way more usage. I'd bet you my | computer I would have been out _at least_ the $5k that | cleared. | bjarneh wrote: | You're probably correct. It really makes you think twice | about setting up some of those cloud services without a | hard limit cutoff. | Symbiote wrote: | I normally receive only my "part" of my corporate credit | card statement, but earlier this year I was sent more of | it. | | That's when I found out the card has a credit limit of | over EUR50,000. | | Reading this, I wonder if we should contact the bank and | ask for another card with a lower limit, to use with | various cloud services. We are 99% on-premises, but have | about EUR200/month in various GCS/AWS usage. | kerng wrote: | This is really scary. It's so unpredictable what one actually has | to pay; for a small business especially, moving to the cloud is | much more challenging than it should be. | | When creating resources it's really unclear what one might be | charged, and then there are savings plans and pre-commitment options | and so forth. | | Might be a good startup idea: basically just sell cloud resources | via a simple, predictable payment model. | salmonlogs wrote: | As an ex-Googler who worked in a customer-facing role in Cloud, I can say you | did very well to get a $72k bill written off! It's definitely | possible but requires a lot of approvals and pulling in a few | favours. I went through the process to write off a ~$50k bill for | one of my customers and it required action every day for 3 months | of my life. | | Whoever helped you inside Google will have gone to a LOT of | trouble, opened a bunch of tickets and attended many, many | meetings to make this happen.
| cogman10 wrote: | I know there's no reason for Google or AWS to do this, but man | do I wish there was a way to put down a spending limit and | simply disable anything that goes over that limit. | | It's a little bit nuts that there are no guardrails to prevent | you from incurring such huge bills (especially as a solo | developer that might just be trying out their services). | brianwawok wrote: | There are guard rails in quotas. Like you can only spin up X | servers without opening a ticket to ask for more. | | Now, I think some of these quotas can still lead to some pretty | crazy bills... but that is the point of at least some of | them.... | fweespeech wrote: | Tbh, its lack is why I don't use Google or AWS for projects. | marcell wrote: | If this happens you can usually reach out to Google to see if | they will refund the charge. They don't really benefit from | making $72k off a solo developer's buggy code. I've done it once | and their team was very helpful and reversed the charge. | tony wrote: | Google eventually forgave the bill in Part 2: | https://blog.tomilkieway.com/72k-2/ | | > Google let go of our bill as a one time gesture! | | Thank goodness. | | And it looks like it had to do with not understanding the API / | system on the first order, IMO. | | This hit me hard a few months ago with CloudFront invalidations | on AWS. I check billing and the thing's at 30 USD in a single day, | from a norm of <1 USD per month, so it's showing a 13,000% | increase (this is for documentation of open source projects). I'm | writing to their support and at their mercy, so to speak - technically | I ran up the bill. I ended up paying up, but I secretly hoped I'd | get some AWS credits for the projects, heh. | | Aside: Amazon has some nice features for rule-based alarms on | accounts so when you spend more than X dollars, you get an email.
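cogman10's wish for a spending limit that simply disables anything over it, combined with the kind of threshold alarms tony mentions, can be sketched as a small admission check that runs before each unit of billable work. Everything here is hypothetical (the `SpendGuard` class and its parameters are illustrative, not any provider's API), and it only works if the cost of a unit of work is known when the work is admitted; a cap fed by billing data that arrives a day late, as described in the article, would still overshoot.

```python
from typing import Callable, Iterable


class SpendGuard:
    """Hypothetical hard spending cap: alert at thresholds, refuse work past the cap."""

    def __init__(self, cap: float,
                 alert_at: Iterable[float] = (),
                 on_alert: Callable[[float], None] = print,
                 on_cap: Callable[[float], None] = lambda spent: None) -> None:
        self.cap = cap
        self.spent = 0.0
        self.stopped = False
        self.on_alert = on_alert
        self.on_cap = on_cap
        self._pending = sorted(alert_at)  # dollar thresholds not yet reported, lowest first

    def try_charge(self, cost: float) -> bool:
        """Admit one unit of work costing `cost`, or refuse it if the cap would be exceeded."""
        if self.stopped:
            return False
        if self.spent + cost > self.cap:
            self.stopped = True  # hard stop: nothing else runs until someone intervenes
            self.on_cap(self.spent)
            return False
        self.spent += cost
        while self._pending and self.spent >= self._pending[0]:
            self.on_alert(self._pending.pop(0))
        return True


if __name__ == "__main__":
    guard = SpendGuard(cap=100.0, alert_at=(50.0, 90.0),
                       on_alert=lambda t: print(f"alert: spend passed ${t:.2f}"),
                       on_cap=lambda s: print(f"cap reached at ${s:.2f}, stopping"))
    jobs = 0
    while guard.try_charge(7.5):  # a runaway loop that would otherwise never stop
        jobs += 1
    print(f"{jobs} jobs ran")
```

Here the runaway loop stops after 13 jobs ($97.50), with alerts at the $50 and $90 marks. The design choice is to refuse work before it is billed rather than to notice the spend afterwards, which is the difference between a hard limit and an alarm.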
| ljm wrote: | I think I'll treat this as the latest in a long line of | warnings about not going all-in on all these Cloud services until | you seriously know what you're doing. | | So much of it is so unnecessary to begin with. You can do so much | with a cheap VPS or two without thinking about lambdas or cloud | functions or kubernetes or who knows what. But these days you'd | be forgiven for thinking it's dark magic. | | You're not going to run up a 5-digit bill in a day by starting up | on a few $10 VPSs. And you'll probably have an architecture that | fits in your head to boot. | | Also: The article title should really be "Saved 72k and avoided | bankruptcy by being an ex-Googler." | tunesmith wrote: | Just don't go all-in on them at all unless you're spending | someone else's money. | tedunangst wrote: | What about the bill for the sites this request cannon was pointed | at? | awinter-py wrote: | fwiw I've had cloud vendors be relatively willing to forgive | bills when something went wrong with SLAs, bad bugs, or their | internal dashes misrepresented usage. | | they know their systems aren't perfect, and if you velvet hammer | them long enough, they'll do the right thing. | Twirrim wrote: | Disclaimer: I work for another cloud (not AWS), opinions are | entirely my own. I try to avoid posting in a negative fashion | about clouds, but holy crap this blog post... | | AWS has this principle of Customer Obsession that enters into | lots of discussions, design decisions etc. "What is the customer | experience of $foo?". Along with asking the positive, you ask the | negative too, and explore the customer impact of shit going | wrong. What does the worst experience look like, what is the | impact for the customer, how might you mitigate that or make it | so you can at least make it up to customers quickly, if you | really can't avoid it. | | I find it hard to fathom Sudeep's attitude here. So much of this | article is ringing large alarm bells.
These are not the things | I'd want to see from a cloud provider as a customer. | | Is this Stockholm Syndrome? Too much drinking of Kool-Aid as an | ex-Googler? Unfamiliarity with how other cloud providers operate? | | (from part 1) > Automatic Upgrade of Firebase Account to Paid | Account | | This is what I mean when I say look at negative vs positive use | cases. I'm guessing some combination of customers having a lousy | experience running into Free Tier limits, and staff spending too | long having to bump up accounts. So they implemented an automatic | upgrade. (What, then, is the point of a free tier? No room to | experiment, no room to try it and see.) | | This is precisely the sort of thing that customer obsession | principle is supposed to aid in. Automated upgrade certainly | solves the staffing time spent bumping up accounts, and it helps | customers that used to have to request limit increases, | but it massively fails in the negative customer experience side | of the equation here. Someone, somewhere, should have asked the | question "What if the customer has made a mistake?" | | Instead, make it easy and quick for anyone to click a button and | get their account changed from Free to Paid, without staff | engagement. Give customers easy agency to control their | experience. | | > Billing "Limits" don't exist. Budgets are at least a day late. | | That's insane. Clouds are about speed and dynamic scalability. | Mistakes can ramp up the bill a crazy amount in a short period | of time, as Sudeep found out. | | How is a 24-hour delay in billing sync and budget warnings even | remotely acceptable to them / Sudeep / customers? | | Sure, it's probably fine for the 90% case, but that's crippling | for the 10%, and even if you decide you really don't give a crap | about your customers, you don't want the bad press that 10% will | likely give you. | | Picture what financial damage someone might do if they | compromised some of your credentials somehow?
You screwed up, | credentials got leaked, and you won't necessarily know for a | _day_ that something has gone wrong, nor will your restrictions | kick in?! | | Billing is the single highest TPS service in any cloud, with | Identity often a close second (billing gets requests for every | transaction, _and_ internal requests related to ongoing charges). | You need to handle a high rate of requests, with low latency both | in request/response and processing data received. It's a hard | engineering problem, and cloud platforms try to get some of the | smartest engineers working on it. An organisation of Google's | caliber has more than enough smart engineers to be working on | these kinds of hard problems, even by temporary secondment. | | Quotas/limits in a fast-changing cloud environment need to be | dynamic and responsive. | | >I knew how to put the case for Google team when they would come | back to work in 2 days. | | How is 2 days even remotely acceptable? Maybe it's just how it's | written, but it reads like this is just accepted as the way | things are. Why would you even have to carefully work to present | your case? | | Where are the 24x7 response people with the ability to forgive | bills? $72k is chump change for a cloud provider, and especially | for a company of the scale of Google. Give your support agents | the tools and authority they need to make reasonable decisions, | with some appropriate kind of oversight process, and stick in | feedback mechanisms so product managers know what problems | customers are having. | | It's not like that would actually have cost them $72k in direct | running costs either. That _should_ have been a near-instant | no-brainer. Forgive, move on, and reap the benefits of customer | goodwill. That goodwill will earn you way more profit than | forgiving it would have cost. You're investing in their | continuing business. Sometimes those investments will fail, but | most of the time they'll succeed.
| | >In our case, it differed by 86,585,365.85 %, or 86 million | percentage points. Even when the bill was notified to us, | Firebase Console dashboard still said 42,000 read+writes for the | month (below the daily limit). | | So it's just fake observability? What's the point? A 24-hour delay | here is nuts, almost to the point of being useless. It can be | hard to calculate these figures out yourselves. A fast feedback | cycle is critical. As Sudeep here found out, 24 hours is a great | way to have zero clue what's going on until it's too late. Is | there really no other way to get this information more | up-to-date? | | Moving on to part 2: >I had a team of ~7 engineers/interns at | this time, and it would take Google about 10 days to get back to | us on this incident. | | Why is a 10-day response time from Google considered even | remotely acceptable for a cloud provider? Your entire platform is | down, you're working out ways around this situation, stressing | about potential bankruptcy, and it's just cool with you that it | took 10 days for them to make a business/life-changing decision | over what amounts to chump change? | | These kinds of mistakes happen with clouds; AWS is famous for | waiving these shock bills from mistakes, and it never takes 10 days | to get it done. | | Billing should be the easiest and most obvious thing. If your | cloud provider is creating complicated billing structures, that's | a problem the cloud provider should be solving, not expecting | customers to unravel the mysteries. | | Companies being spun up to help people navigate your billing | should be an alarm call, not something to celebrate or for | customers to consider normal. | | > Fail fast, learn fast with Cloud is a bad idea | | It shouldn't be. With near-immediate feedback you'd have known | straight away that shit was bad, and cut the experiment out | before it cost you an arm and a leg. | | > While creating a Cloud Run service, we chose default values in | the service.
The max-instances is preset to 1000, and concurrency | set to 80 ...... Same goes with Cloud Run! With Concurrency == | 60, max_containers == 1000 and each Request taking 400ms, number | of requests Cloud Run can handle 9 million requests per minute! | | Why are the default values that high on a service? That seems | like you're asking customers to shoot themselves in the foot. | Where was the look at the negative customer experience side of | the equation? Make it easy for customers to do the right thing. | | Then the bit that really bugs the crap out of me: > Thank you | Google! | | He's thanking Google for having had an absolutely shitty | experience on their platform: 10 days of stress from needless | delays in forgiving a trivially small bill, dealings with | multiple lawyers, investigating bankruptcy, risk of missing | product launch date, working around the clock to dig themselves | out of hell... | thrower123 wrote: | I'm not sure I want to know how much Azure and AWS revenue comes | from people spinning up test VMs or a kubernetes cluster to work | through a training, and then forgetting to turn it off. | | I've spent thousands extra this year because people stood up 4 MB | SQL databases and let them default to charging by vCores instead | of DTUs. | beoberha wrote: | Much less than the amount from deals with strategic partners. | The long tail of $5 a month from forgotten VMs is likely orders | of magnitude less than the handshake deals you can publicly | read about. | seanwilson wrote: | https://blog.tomilkieway.com/72k-2/ | | > To overcome the timeout limitation, I suggested using POST | requests (with URL as data) to send jobs to an instance, and use | multiple instances in parallel instead of using one instance | serially. Because each instance in Cloud Run would only be | scraping one page, it would never time out, process all pages in | parallel (scale), and also be highly optimized because Cloud Run | usage is accurate to milliseconds.
| | > If you look closely, the flow is missing few important pieces. | | > Exponential Recursion without Break: The instances wouldn't | know when to break, as there was no break statement. | | > The POST requests could be of the same URLs. If there's a back | link to the previous page, the Cloud Run service will be stuck in | infinite recursion, but what's worst is, that this recursion is | multiplying exponentially (our max instances were set to 1000!) | | Did you not consider how to stop this blowing up before | implementing? Having one cloud function trigger another like this, | with no way to control how many functions are running at the same | time, no simple and quickly met termination condition, and | uncapped billing, is playing with fire. It's not going to be | optimal either if most of the time each function is waiting for | the URL data to download. | | You need to be using something like a work queue, or just keep | life simple and keep it on a single server if you can. | greatgib wrote: | At the end of page 2, there is a good ass licking bullshit | sentence: | | << It's also a great company to collaborate with. The tools | provided by Google are very developer friendly, have a great | documentation (for the most part), and are consistently | expanding.>> | | He said that as an ex-Googler and as the beneficiary of a | gesture, but this contradicts the whole story he told us. If the | docs and tools were so great, why did he fall into this situation? | sudcha wrote: | OP here. | | Maybe it comes across like that, but as somebody building a | product with very limited resources, Google's documentation is | one of the best so far. | | The situation was our fault too. I just went with a test-and- | fail-fast attitude, just like with everything we do in a dev | environment.
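seanwilson's suggestion of a work queue can be made concrete. The sketch below is a single-process Python illustration, not the author's code: `fetch_links` is a hypothetical stand-in for "download a page and extract its links", injected so the control flow can run offline. The `seen` set stops back links from re-enqueuing pages, and `max_pages` supplies the missing "break" the article describes, so the crawl ends even if the link graph never does.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl(start_url: str,
          fetch_links: Callable[[str], Iterable[str]],
          max_pages: int = 1000) -> List[str]:
    """Breadth-first crawl driven by one work queue; returns URLs in visit order."""
    queue = deque([start_url])
    seen: Set[str] = {start_url}  # each URL is enqueued at most once, so cycles cannot recurse
    visited: List[str] = []
    while queue and len(visited) < max_pages:  # hard budget on total work
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # A tiny site where pages link back to earlier pages.
    site = {"/a": ["/b", "/c"], "/b": ["/a", "/c"], "/c": ["/a", "/d"], "/d": ["/c"]}
    print(crawl("/a", lambda url: site.get(url, [])))
```

In a scaled-out version, several workers would pull from a shared queue and a shared `seen` store, but the termination argument stays the same: the total amount of work is bounded by the queue and the page budget, not by how many instances the platform is willing to start.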
| delduca wrote: | Something very similar happened to me. I was using Cloud Run to | fetch some subreddit posts and ended up with a "recursive" setup; | because of that, billions of invocations were made... Luckily, I | was in front of the computer and stopped it in time, but the bill | was around $4000! I contacted Google support and explained | everything, and they "forgot" my debt because of the bug. | qayxc wrote: | So basically no one's testing their code anymore and just | throws it into a paid service? | | Great that Google was so lenient and all, but I really don't | get the appeal of using a hyper-scaler when a VM with Docker | support can be set up in literally seconds, with on-demand pricing | of less than 10 cents/hour for most quick-and-dirty tasks. | | Am I missing something here? | pettycashstash2 wrote: | Thank you for sharing. I was actually thinking of using Firebase | for my project. They make it so easy to sign up for the free tier. | Waiting to see what happens in part 2. | bharatsb wrote: | Part 2: https://blog.tomilkieway.com/72k-2/ | spacemanmatt wrote: | I recently put the (soft) kibosh on a project in my stable | trying to switch to Firebase at the last minute. | | It looks attractive but the business aspects are frankly | frightening, and I'm not even talking about the risk of a large | bill. Getting your metrics 24h late sounds like a deal killer | for me. So much for observability! | asciimike wrote: | Minor nit: many non-billing metrics are near-real-time, e.g. | DB concurrents, cloud functions CPU/RAM usage; any metrics | that require aggregation (storage, billing) are going to be | batched less frequently. This is going to be true across all | platforms of non-trivial scale (eventual consistency + batch | jobs). | | Second note: the number of people who actually do this is | very low (a few a year, of hundreds of thousands of | developers). The blog posts are scary, but in my ~five years | at Firebase, I'm pretty sure we refunded every one.
As my | boss (James Tamplin, CEO of Firebase) used to say, "There are | lots of bad systems, but rarely are there bad people." | dilatedmind wrote: | Interesting. How did the spend break down between Cloud Run and | Firebase? | | Did you have any limit to how many req/s you made to an | individual site? It seems this would be difficult to implement | with this architecture. | | How did you deal with following links in circles/avoiding | scraping the same page multiple times? | | I had built something similar at a previous job, recursively | scraping ecommerce sites. The first thing I noticed was some of | the sites we were scraping couldn't handle more than a couple | requests a second (in particular as we scraped uncached pages from | sites running PHP). Other sites were quick to IP-ban. | | I kept things simple: a few dozen micro instances on AWS (I think | they were like $3 a day) running Puppeteer. A single server | acting as a controller, keeping a per-site queue and allowing us | to set per-site request limits if necessary. All the state of | which links were already seen was just kept in memory. Of course | everything was also persisted to a DB, and if the controller | process needed to be restarted, it could restore the queue/seen | state and resume. | akh wrote: | > Had we chosen max-instances to be "2", our costs would've been | 500 times less. $72,000 bill would've been: $144. Had we chosen | concurrency of "1" request, we probably wouldn't have even | noticed the bill. | | > If you count the number of pages in GCP documentation, it's | probably more than pages in few novels. Understanding Pricing, | Usage, is not only time consuming, but requires a deep | understanding of how Cloud services work. No wonder there are | full time jobs for just this purpose! | | Great write-up - thanks for sharing @bharatsb! As you say, cloud | pricing has become too complex for developers to understand | quickly (they want to ship features, not calculate costs). | Infra-as-code is great, but it has made it even harder to understand | which code/config option costs what. `terraform apply` is like a | checkout screen without prices. | | We're trying to solve this problem with infracost.io, initially | looking at Terraform. It would be interesting to get your | feedback on whether such an approach might have helped you? | Probably not as it doesn't look like you were using Terraform? | lewich wrote: | So when will there be responsibility in the IT industry for what we | engineer? | pwinnski wrote: | This is most developers' worst nightmare when it comes to a | completely new environment generally, and Cloud solutions | specifically. | | It's easy and pointless to say they should have done things | differently. Worse than pointless. Obviously they should have, | and kudos to them for being open about the compounded mistakes. | | Still, this strikes at the fears that lie in the heart of any | reasonable, honest developer doing something completely new. | | New developers should be cautious about cloud platforms, but they | were! Not cautious enough, obviously, but they did set limits | they thought would be honored. | | Platforms should have hard monetary limits at the account level, | clearly, as well as an option to turn them off. Shame on all of | them that don't. | FastQ wrote: | I'm just a student, but I've spent about 10 hours trying to figure | out why Azure has been charging me >$5/day for their "basic" | database @5 DTUs, 2 GB max storage. This morning I was so | exasperated I sent a letter threatening to report them for fraud | if nobody could tell me why I was being charged 30x the listed | rate, which so far no one has. This is an extremely cathartic | post to see that I'm not alone. Thanks for sharing. | [deleted] | marktolson wrote: | Go to billing > cost analysis > filter by resource breakdown. | Azure billing analysis is pretty amazing.
| FastQ wrote: | Yeah, but it just shows my database cost, which is higher than | what is listed, as far as I can tell. | luser007 wrote: | Could it be listed "hourly" and you're charged "daily"? Add | in VAT (equal to 25% in some countries) and you match the | 30-times-higher-than-expected charge. | ylere wrote: | https://azure.microsoft.com/en-us/pricing/details/sql-databa... | | Basic tier, 5 DTUs, 2 GB is listed as ~$4.8971/month or | $0.0068/hour on this page. Extra storage would cost more | but is not available for the basic tier. | sudcha wrote: | OP here. Just found out that the post made it to HN. Thanks for | sharing and I'll be replying to some comments. | altdatathrow wrote: | Do not use hosted cloud services where the implementation creates | publicly accessible API keys and each HTTP request results in a | charge to your account. A few specific examples are Firebase, | Algolia, and AWS Lambda. | | All it takes is one programming mistake or one bad actor and you | can find yourself in an equally precarious situation. | LordHeini wrote: | That sort of crap is the reason we host all our stuff on root | servers. | | Even trying to read the Amazon pricing for their instances, hours | and what not, drives me insane. | | Seems this is done on purpose. No wonder they make so much money | with it. | | So I have never seen a reason to move any stuff to the cloud. | | Just grab a dedicated server for a few bucks and put a bunch of | Docker containers on those. | | It's way cheaper, usually not more complicated. Just use a CI with | GitLab runners or whatever and be done with it. | | Most apps don't need scaling anyway, and if you do, just put that | app on bare metal fitting your requirements. | that_guy_iain wrote: | > That sort of crap is the reason we host all our stuff on root | servers.
| | Having just started my own journey into building products for | myself, pretty much the first thing I realised with my tech was | I needed to get dedicated servers instead of cloud just because | it costs 100x less. | | > Just grab a dedicated server for a few bucks and put a bunch | of docker containers on those. | | Exactly. If you really want Kubernetes coolness to act | cloud-like, install Kubernetes; it's free and is super easy to set up. | | And with the cost savings you can literally buy multiple spare | servers and use them all with Kubernetes while keeping the | usage low, allowing you to scale up new nodes if needed. | throwaway201103 wrote: | > kubernetes ... is super easy to set up | | Can you point me to the super easy setup guide? Because I've | tried a few and never gotten it working. | WrtCdEvrydy wrote: | I don't like using Kubernetes raw, but I am a fan of CapRover | (which has Kubernetes support). | scrollaway wrote: | AWS pricing is not obscure, it's just not for you. So in that | sense, you are correct to not see a reason to move to the | cloud, but your advice does not apply to everyone. | | And I don't believe they make "more money" that way at all. AWS | margins are either very low or very high, and the higher | margins and prices tend to be the "simpler" ones: packaged, | managed products such as Redshift that are billed on fewer | tiers and flatter prices. | | When you design your application with AWS, pricing has to enter | your design considerations. For example, if you are designing | something that will interact a lot with S3, you want to minimize | PUTs. You want to minimize RAM usage on Lambda by streaming | rather than buffering. Etc. | | AWS is not a suitable product for playground stuff. The only | reason it gets used as such is because it's easier if you're | already using AWS for other things (or if you're already very | familiar with it). | nojito wrote: | AWS's margin is currently 30+%, which is massive.
| | >AWS pricing is not obscure | | There is a massive secondary consulting market because of | AWS's price obscurities. | dragonwriter wrote: | > There is a massive secondary consulting market because of | AWS's price obscurities. | | There is a massive secondary consulting market because the | enterprise market is addicted to secondary consulting. This | secondary consulting market _includes_ AWS pricing because | it includes pretty much any IT service the target market | might be interested in. | | A rational need for decomplexification _isn't_ necessary to | explain the existence or coverage of enterprise secondary | consulting, IT or otherwise. | klohto wrote: | > There is a massive secondary consulting market because of | AWS's price obscurities. | | While that's true, there is a consulting market for most | things that are complicated. Doesn't mean they are shady. | It's simply not for you. You are welcome either to dive in | or get a consultant. I promise you, though, that AWS pricing | isn't difficult once you understand a few concepts and know | your way around the Cost Explorer. With proper tagging, | it's easy to drill down into which resource is consuming how | much. I don't believe there is a way to have simple | billing for complicated product(s). | Closi wrote: | > While that's true, there is a consulting market for most | things that are complicated. Doesn't mean they are shady. | | It does mean it's not simple though. | scrollaway wrote: | Obscure and complex are different concepts. I'm part of | that "secondary consulting market" FWIW, so I'd like to | think I know a thing or two about it. | | Does AWS have high-margin prices? In aggregate, somewhat, | but this is mostly driven by the big-ticket managed | enterprise items: Aurora, Redshift, QuickSight, probably | Fargate, etc. A lot of their more popular stuff (S3, | Lambda, ...) offers incredible value for very little | money.
EC2 is the exception, I believe, because I | understand it to be high-margin for how popular it is. | But EC2 pricing is one of their simplest ones. | | Could AWS simplify some of their pricing? Yes, probably. | There's always room for optimization. Personally, for | example, I'd like to see their pricing be global rather | than different by region (with understandable exceptions | for GovCloud and China). | | Is AWS making its pricing complicated for nefarious | purposes? No, there is no evidence to support that. | | AWS pricing absolutely is not simple. It's a part of the | AWS stack. You need to study AWS's events/signals system | to be able to write apps that make the best use of AWS's | interconnected stack. You need to study their APIs / SDKs | to really understand what you're able to implement. And | you need to study their billing systems to understand how | to implement apps that run cheaply, and be able to | predict potential runaway costs. | | It has to be a part of the design. That's why you may | want to hire consultants for it: People who understand it | better than you do, and will be able to assist you in | reducing your costs. | | It's just another kind of optimization. Maybe some | software engineers don't like it because it hits them | where it hurts (the wallet) when they don't do it right, | rather than being able to brush it off as they usually do. | | It's much easier to ignore the waste produced by, for | example, the 3000 JavaScript dependencies shipped with | the fat, unoptimized Electron app they ship on their | users' desktops, that do a ton of unnecessary expensive | computing; when all that crap is client-side and it's the | _downstream user_'s electricity bills and CPU time | that's being used. | [deleted] | scrollaway wrote: | The margin is absolutely not the same across all products. | | > _There is a massive secondary consulting market because | of AWS's price obscurities._ | | It's. Not. For. You.
| | AWS pricing is a part of your design. With some exceptions | (that you aren't talking about), they charge you more for | using more resources. You are forced to design systems that | use less resources if you want to optimize your bill. | | That consulting market is an optimization market. It's | economics at its best. | | If you are too small to have to take these things into | account regardless, AWS is not for you. You're welcome to | use it, but don't be surprised if you end up having to deal | with these kinds of things which simply don't exist in the | world of flat-price underprovisioned droplets. | [deleted] | nojito wrote: | >AWS pricing is a part of your design. With some | exceptions (that you aren't talking about), they charge | you more for using more resources. You are forced to | design systems that use less resources if you want to | optimize your bill. | | This is marketing. | | It's like saying you want to build a house and the quote | you got ends up blowing up 100x overnight. | | A great example is the 100k credit for startups. You can | repeat it's not for you all you want, but their business | is predicated on pricing ignorance and vendor lock-in. | scrollaway wrote: | The $100K credit (which I've been granted multiple times) | is there because if Amazon can get you to invest serious | work into their infra, they'll make up for it in the long | run. It's not "lock in", it's sales. The only Amazon | "lock in" really is their bandwidth-out pricing, which is | a sleazy tactic for sure, but I'm not hesitant to call it | out _when it's the case_. | | You can get the $100/$300/$1000 tier if you are in "just | checking it out" solo mode. $5k and up requires either | connections, partnerships, or a serious application. | | Anyway I don't know what your point is, I'm not even sure | if you have one. They're not "marketing" their pricing, | nor the fact that you are "forced to design systems that | use less resources".
| csharptwdec19 wrote: | > Anyway I don't know what your point is, I'm not even | sure if you have one. They're not "marketing" their | pricing, nor the fact that you are "forced to design | systems that use less resources". | | I think they are referring to this statement: | | > > AWS pricing is a part of your design. With some | exceptions (that you aren't talking about), they charge | you more for using more resources. You are forced to | design systems that use less resources if you want to | optimize your bill. | | It is a defense that I've heard in many AWS talks in the | past. | | Where it turns into a 'marketing' blurb to me is my real | world experience in these AWS talks in the places I work. | As a real world example, we had a product that required | -some- architectural work, but otherwise was solid, and | could run on 3 live EC2 instances (2 web LB, 1 live | backend) and 1 spare (spare backend) | | The Consultant that AWS partnered us with? Suggested a | very overdone architectural revamp, moving everything | possible into AWS Specific technologies. | | It's marketing in that in many of our experiences, we | know there is often at least one person on a team who | does -not- have the discipline and/or experience to | -keep- a system using less resources as the field goes | from green to brown. | scrollaway wrote: | Overengineering is easy and happens not just with AWS but | with just about anything in software engineering. | | I'm having trouble seeing how this changes what I'm | saying: That with the way AWS pricing is structured, you | are supposed to take it into account when designing your | product. | | When you reach a certain size / complexity and you have | to design infrastructure, you _should_ be making | schematics, predictions on the usage peaks and troughs, | how various parts of the infra will be affected, how | active/idle they will be.
| | When you are dealing with AWS, pricing becomes extremely | predictable because it can be derived from those plans. | And it is far better to be dealing with that kind of | model than to deal with "unlimited with a million | asterisks" or something. AWS is predictable, reliable, | and most notably has never ever increased their | prices, so whatever you calculated will not go up because | of Amazon's decisions. | WrtCdEvrydy wrote: | > Suggested a very overdone architectural revamp, moving | everything possible into AWS Specific technologies. | | To be honest, depending on the technology, the savings | could be worth it... for example, did you know you get a | discount if your traffic is served over CloudFront? Even | if your distribution is set to not cache any resource, you | can front your APIs using CloudFront and save on networking. | akh wrote: | How do you take pricing into your design considerations? Does | it come with experience from using an AWS service in | production and understanding how it's priced, combined with | the usage numbers the new system might get? I'm trying to | learn more about how engineers currently do this. | thanksforthe42 wrote: | Calling programmers "Engineers" is a misnomer. | | I wish programmers had the prestige they deserve for | combining Science, tradition, authority, and art. | | Engineers are not allowed to use tradition, authority or | art. They are restricted to being modern-day calculators. | | Nothing is wrong with either. | csharptwdec19 wrote: | The shift from 'Developer/Programmer' to engineer has | indeed been part of a push away from creativity towards | cookie-cutter work. | | An interesting analogue would be the automotive industry: | as time progressed, companies focused more and more on | 'engineering' versus art/tradition/etc. But as the | industry evolved, "Flashy" vehicles that took risks | increasingly became either a halo product for a brand, or | were relegated to Luxury/Boutique.
| | And, of course, there was the dark side of this shift. A | good example is from the 70s: the level of 'engineering' | driving the design of the vehicle and its assembly | didn't take into consideration the actual line worker; in | Ohio the workers wound up getting overworked, burned out, | and in some cases actively sabotaged the product, because | they were being treated like automated machines. | thanksforthe42 wrote: | I think that missed the point. | | Engineers are applied scientists. | | Programmers are not applied scientists. | scrollaway wrote: | Why does any of this matter? | scrollaway wrote: | Basically, yes. | | It's not that complicated, it's just not something | engineers are usually used to doing. If you use an AWS | service, you look at its pricing. | | Take S3 for example: whenever you use it, you'll pay for | outgoing bandwidth, PUTs, GETs, and storage. | | So you seek to minimize all of these: | | 1. Bandwidth: use cache layers. This also minimizes GETs. | | 2. PUTs: design your app in a way that doesn't do | unnecessary inserts into S3. Consider alternatives such as | Redis, Postgres or filesystem depending on the need. | | 3. Storage: compress your objects if they compress well. If | they aren't often accessed, use storage classes and auto | lifecycle management. | | Pricing in AWS generally reflects some kind of engineering | limitations you will face at scale in the first place, so | it makes sense to go through this whole exercise either | way. | croh wrote: | > Most apps don't need scaling anyway and if you do, just put | that app on bare metal fitting your requirements. | | Most important! | vmception wrote: | > Most apps don't need scaling anyway and if you do | | Man, exactly right. Many of the guys here would love crypto once | you stop asking why and start asking how.
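scrollaway's three S3 line items (outgoing bandwidth, PUT/GET requests, storage) can be turned into a rough monthly estimate to see which lever matters for a given workload. The sketch below is a hypothetical cost model, not an AWS tool: the rate fields are placeholders to be filled in from the current S3 pricing page for your region, it only counts egress from S3 itself (a cache or CDN in front of S3 bills its own traffic), and it ignores storage classes and lifecycle rules.

```python
from dataclasses import dataclass


@dataclass
class S3Rates:
    # Placeholder unit prices (hypothetical); substitute current values for your region.
    storage_per_gb_month: float
    put_per_1k: float
    get_per_1k: float
    egress_per_gb: float


def monthly_s3_cost(rates: S3Rates, stored_gb: float, puts: int, gets: int,
                    egress_gb: float, cache_hit_ratio: float = 0.0,
                    compression_ratio: float = 1.0) -> float:
    """Estimate one month of S3 charges for the line items named in the thread.

    cache_hit_ratio: fraction of reads answered by a cache layer in front of S3,
        which removes both the GET and the S3 egress for those reads.
    compression_ratio: stored size divided by original size (0.5 = halved).
    """
    effective_gets = gets * (1.0 - cache_hit_ratio)
    effective_egress_gb = egress_gb * (1.0 - cache_hit_ratio)
    effective_storage_gb = stored_gb * compression_ratio
    return (effective_storage_gb * rates.storage_per_gb_month
            + puts / 1000 * rates.put_per_1k
            + effective_gets / 1000 * rates.get_per_1k
            + effective_egress_gb * rates.egress_per_gb)
```

Running the same workload with and without `cache_hit_ratio` and `compression_ratio` shows how much of the bill each of the three optimizations can remove.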
| | The most lucrative projects these days are completely frontend | UI; they don't even have their own backends, they just read state | from the nearest node when the client connects their wallet. | | Some people forgot that the scalability game was to convert | traffic into money. So ditch that, and remember you are in the | money game. | Aperocky wrote: | > Most apps don't need scaling anyway | | This is exactly right. I host stuff in buckets/CloudFront and | use a bit of Lambda/Route 53. I end up paying $4 a month. | | Now, that will be very different if 10 million people suddenly | decide to visit my site, but if that happens money probably | won't be a problem after all. | onion2k wrote: | The fact that cloud providers don't have a simple "This is how | much I can afford, don't ever bill me more than that!" box on | their platforms makes development a lot scarier than it really | needs to be. | sandGorgon wrote: | Google does have this feature | https://cloud.google.com/billing/docs/how-to/budgets-program... | | Here's the specific example | https://cloud.google.com/billing/docs/how-to/notify#cap_disa... | ggthrowaway2020 wrote: | As a former victim of the same issue as OP, I am furious | every time I see a Googler promote that as a solution. | | In our case, we racked up a $10000 bill on BigQuery in ~6 | hours, when a job was failing and auto-retrying. | | We had set up _every_ alert correctly and our reaction time | was about 5 minutes (about $100 of usage, no big deal). So | how did we get a $5000 bill? _Google's_ alert was 6 hours | late (according to them, this was root-caused to us, because | we were submitting jobs continuously). They pointed to their | TOS and said they don't guarantee on-time delivery of the | alert. | | I had to write up a blog post with fancy graphs and prepare | it for social media before they finally agreed to eat the | bill. | FeistySkink wrote: | Is there a public postmortem anywhere?
Your message points | to 'no', but just in case. | mrtksn wrote: | OP claims that the budgets are not real time; they are | eventually accurate, but if you spend fast enough | you may end up spending more than your budget before | anything triggers. | lights0123 wrote: | > There is a delay of up to a few days between incurring | costs and receiving budget notifications. Due to usage | latency from the time that a resource is used to the time | that the activity is billed, you might incur additional costs | for usage that hasn't arrived at the time that all services | are stopped. Following the steps in this capping example is | not a guarantee that you will not spend more than your | budget. | | This looks like it has the same problems as the post, because | it also relies on those budget alerts that can happen a long | while after you've exceeded them. | [deleted] | modeless wrote: | "Following the steps in this capping example is not a | guarantee that you will not spend more than your budget." | | "Resources [...] might be irretrievably deleted." | | Also it's not automatic, you have to manually write code to | do it, and test it, and make sure not to break it. | | A reasonable implementation of this feature would be built | into the console, guarantee a maximum spend, not require | writing your own fallible code, and provide an option to | preserve storage (at normal cost) so that all your data isn't | deleted when your compute/API stuff is shut down. | asciimike wrote: | Extremely technically, the only GCP product that had this | feature was App Engine Standard v1, but it looks like it's | deprecated as of the end of 2019 | (https://cloud.google.com/appengine/docs/managing- | costs#chang...)
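The capping approach sandGorgon links works by reacting to budget notifications delivered through Pub/Sub. The sketch below covers only the decision step and makes no billing API calls. It assumes the notification's base64-encoded `data` payload is JSON with `costAmount` and `budgetAmount` fields; treat those names as an assumption to check against the linked documentation. Because of the notification delay quoted above, it fires at a fraction of the budget instead of at 100%, which narrows the overrun window but cannot close it.

```python
import base64
import json


def should_cap(event: dict, trigger_fraction: float = 0.8) -> bool:
    """Decide whether to cut off spending based on one budget notification.

    Assumes a Pub/Sub-style event whose "data" field is base64-encoded JSON
    containing "costAmount" and "budgetAmount" (assumed field names).
    trigger_fraction < 1.0 fires early, because notifications can lag
    actual usage by hours or days.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    cost = float(payload["costAmount"])
    budget = float(payload["budgetAmount"])
    return cost >= budget * trigger_fraction
```

Whatever runs when this returns `True` (disabling billing, scaling to zero) is left out deliberately; as the thread notes, the documented capping example can delete resources, so that step needs its own care.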
| NegativeLatency wrote: | Probably hurt revenue ;) | asciimike wrote: | As a former App Engine PM who spent a lot of time with | billing/quotas (though, not the one who deprecated this | feature), it's likely due to some combination of: | | - hard limits caused downtime more often than they | prevented these blog posts | | - hard limits were inconsistently enforced, even within | GAE | | - platform-wide quota notifications were implemented | (reached "GA"), leaving the question of "how a developer | wants to handle this" to the developer, not the platform | | - maintenance burden | | The "I bankrupted my startup by running tests in an | infinite loop" blog posts happen ~once a year, while the | number of customers (including internal teams!) who | inadvertently went down because of this quota was | staggering. I feel like I used to see one a week, at | least. Most often someone on the team was like "oh I'm | going to turn this down to zero because we don't want to | spend any money during development", never told anyone, | and then they go live and they forgot to turn the knob | back up (or didn't properly estimate traffic/costs and | set it too low). | | I can tell you it hurts revenue _a lot_ more when a large | customer goes down for 15 minutes due to quota issues and | their usage drops to zero (both in terms of revenue and | customer credibility) vs when a tiny developer accidentally | blows through 10k in a month and we refund it (since, | obviously, the provider's cost is a lot less than that). | gonzo41 wrote: | I wouldn't be too scared. For AWS you get about $0.20 per 1 | million requests on Lambda. You can do quite a lot with a | single Lambda function. And a million of anything is a lot for | a dev. Put an HTTP API Gateway in front of that with a CDN and | you're hitting ~ a few dollars. | | If you don't buy one coffee, or put a 20 dollar note in a book | one month, then you're fine.
And if you have to use EC2, just | use a t2.micro or a Raspberry Pi on your desk. | | But really the first lesson you should learn in any cloud setup | is Billing Alarms :) | | If you're doing ML or CV work then it's probably cheaper to | build on the desktop and port to cloud once you understand what | the workloads are. | onion2k wrote: | _For AWS you get about $0.20 per 1 million requests on | Lambda._ | | If you get it right, great. If you get it wrong then you end | up doing billions of operations by mistake, which could cost | a _huge_ amount. That's what happened to the author of the | article. | | _But really the first lesson you should learn in any cloud | setup is Billing Alarms_ | | Alarms only tell you that something is going wrong. They | don't stop it. If your mistake is costing $1000/minute and | you're an hour away from a computer you have a very expensive | problem. | jjk166 wrote: | So you're taking code that you haven't validated locally to | see what resources it uses, you're putting this up on the | cloud to test it, then you are immediately going to the | middle of nowhere without your laptop/phone/etc, and you | can't arrange for a coworker or friend to pull the plug for | you if something goes wrong? | patrickaljord wrote: | > and you can't arrange for a coworker or friend to pull | the plug for you if something goes wrong? | | This is HN, many of us are solo founders with no | coworkers or employees. Also how could a "friend" pull | the plug? If it was a physical server running in your | house maybe, otherwise you can't really give them access | to your AWS account with all your private client data in | there. | jjk166 wrote: | If you don't have anybody who can monitor your test, and | you're not monitoring your test, why are you doing a | test?
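To put gonzo41's price and onion2k's "billions of operations" side by side: at the quoted $0.20 per million requests, the request charge grows linearly with request count. This sketch covers only that one line item. Lambda also bills execution duration, and anything the function calls (databases, other APIs, data transfer) bills separately; none of that is modeled here.

```python
def lambda_request_charge(requests: int, price_per_million: float = 0.20) -> float:
    """Request-count charge only, at the per-million price quoted in the thread.

    Duration (GB-second) charges and downstream services are not included.
    """
    return requests / 1_000_000 * price_per_million


# A hypothetical runaway loop making 10,000 requests per second for 12 hours.
runaway_requests = 10_000 * 60 * 60 * 12  # 432,000,000 requests
```

At this rate the runaway loop above costs about $86 in request charges, a billion requests about $200, and a $10,000 bill from request charges alone would take 50 billion requests.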
| | As for having a non-employee pull the plug, set up an IAM | user with permission to access the test instance. | WrtCdEvrydy wrote: | > you're not monitoring your test, why are you doing a | test? | | Agile. Bringing you bankruptcy at the speed of cloud. | jschwartzi wrote: | If I'm the only developer on a project and I really need | to get to market I might do just that. I sometimes do day | hikes on weeknights so this is actually a likely scenario | for me. | jjk166 wrote: | Do you go hiking alone without your phone? That seems | dangerous. | | And why would you start a test if you won't be there to | see the results of the test? Seems more sensible to | either leave after you've run the test or wait to do so | until you get back. | gonzo41 wrote: | You can trigger events from alarms. And Lambdas only last | 15 minutes. So still cheaper than 75K :D. | gonzo41 wrote: | Just to expand on this. You can have a hard limit. For AWS, | create a role/user that has essentially ~root-like access. Make | a Lambda function that's triggered by a billing alert at your | threshold to just turn off things from most expensive to | least. So turn off the DB servers, so the apps error out and | the users go away. | bpodgursky wrote: | There are some cloud services where it's not quite this simple. | | S3 -- you can't just delete customer data because they hit a | billing limit | | RDS -- not going to drop databases on the 27th of the month | | Anything with persistent data is going to have to stay alive | and accumulate costs. Admittedly these services aren't where | the crazy bills come from, but it does make a simple kill | switch a bit more complex. | raphaelj wrote: | You don't have to immediately delete customer data. | | Most services that have a limit cap will have a "grace period" | of a couple of days during which the service does not work | but the data is not deleted. That gives you some time to get | notified of the issue, and fix the problem/increase the | limit.
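gonzo41's kill switch (a function triggered by a billing alarm that stops resources from most expensive to least), combined with bpodgursky's and raphaelj's point that persistent data should survive, can be sketched as a pure planning step. `Resource` and its `kind` values are hypothetical; the sketch only decides the order, and the actual stop calls, IAM role and alarm wiring are left out. Since it runs off a billing alarm, it inherits the alert delay discussed elsewhere in the thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    name: str
    hourly_cost: float
    kind: str  # hypothetical categories: "compute" or "storage"


def shutdown_plan(resources: List[Resource], preserve_storage: bool = True) -> List[str]:
    """Return resource names to stop, most expensive first.

    Storage is skipped by default so that tripping the budget stops the
    spending without destroying data (the "grace period" behaviour).
    """
    candidates = [r for r in resources
                  if not (preserve_storage and r.kind == "storage")]
    ordered = sorted(candidates, key=lambda r: r.hourly_cost, reverse=True)
    return [r.name for r in ordered]
```

Stopping the most expensive resources first caps the burn rate fastest if the shutdown itself is slow or partially fails.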
| heavyset_go wrote: | This is a solved problem for every other service out there. | You don't just delete the data, you give the customer a few | days, weeks, or a month to pay their bill and if they don't, | then you delete their data. | ZephyrBlu wrote: | Probably because it's not so simple on the backend. | | I'm guessing there's a good chance a lot of systems are only | eventually consistent, which could explain why billing takes a | long time to update. | | Aggregation of service usage for billing could also be an | expensive operation, so it's only updated irregularly instead | of being near real-time. | | It would be a great feature, but I can imagine it being very | complex. It's also probably cheaper for them to just wave away | excess usage like this instead of building out a solution. | donmcronald wrote: | Azure has it for some plans [1], but not others like | pay-as-you-go. It seems arbitrary. | | 1. https://azure.microsoft.com/en-us/support/legal/offer- | detail... | a-priori wrote: | This is a billing question, not a technical question, and | looked at through that lens it's easy to put a hard limit on | a monthly bill: just don't ever issue bills greater than that | amount. | | If I say I only want to pay a maximum of $1000 a month, and I | hit that limit but it takes a bit for the provider to shut | everything down so really $1100 of resources were consumed, | then the provider eats the $100 overrun and I get a bill for | $1000. | | With an actual hard limit you create a financial incentive | for the provider to minimize this overrun. Yes it might be | difficult to fix but I assure you, if hard limits existed, | the technical issues would be solved soon enough because now | there's a reason to invest in a solution.
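a-priori's proposal is a billing rule, not a metering one. Using the numbers from the comment ($1,000 cap, $1,100 consumed), the customer's bill is clamped at the cap and the difference becomes the provider's cost, which is what creates the incentive to shrink the overrun. The function name and return shape are illustrative only.

```python
from typing import Tuple


def capped_invoice(consumed: float, cap: float) -> Tuple[float, float]:
    """Return (amount billed to the customer, overrun absorbed by the provider)."""
    billed = min(consumed, cap)
    absorbed = max(0.0, consumed - cap)
    return billed, absorbed
```

For the comment's example, `capped_invoice(1100, 1000)` bills $1,000 and leaves the provider with the $100 overrun; below the cap nothing is absorbed.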
| benlivengood wrote: | It's also a mostly solved problem because advertisers have | budgets and it's common to implement globally distributed | budget servers to avoid showing more ads than the | advertiser paid for, despite tens of thousands of | individual web servers needing to know which ads in their | inventory have budget left. | | It's a fun exercise similar to global | rate-limiting/load-balancing. | wikibob wrote: | That is fascinating. | | If you have the time, could you (anyone feel free) talk a | bit about how you would implement a globally distributed | budget? | | I can imagine a few simple options, but they all seem to | have significant shortcomings. | kevsim wrote: | I think that's not really an issue though, is it? If you say | "never charge me more than $100" they can a) ensure they | never charge you more than $100 and b) work to optimize their | own systems so that they cut you off as close to $100 as | humanly possible. In the beginning they might eat some costs | since it takes them a day to catch it, but they could work | over time to bring that down. And it's not like it's costing | GCP/AWS/Azure "sticker price" to provide their services. | donmcronald wrote: | This is my worst nightmare. Lol. I guess now is a great time to | give Azure a shoutout for sitting on their hands for 8 years | without so much as a response to the community for half a | decade [1]. | | At least AWS allows using a prepaid credit card so they'll need | to call me if things go haywire. I bet if that $72k charge went | through it would have been much harder to get out of. "Sorry, | we don't have the money" is a much better negotiating position | than "can we please have our money back?" | | 1. https://feedback.azure.com/forums/170030-signup-and- | billing/... | dvfjsdhgfv wrote: | > "Sorry, we don't have the money" is a much better | negotiating position than "can we please have our money | back?" | | I agree, but why would you want to be in either position | anyway?
The so-called cloud services are terribly overpriced | when compared to traditional servers. | treeman79 wrote: | Done correctly, they save a lot of IT time. | | Some companies hire five 6-figure people to try and cut | their Amazon bill by a couple of grand a month. | | Never understood spending 50-100k a month to maybe save 5k. | Retric wrote: | It's often a fixed vs ongoing cost question. Spending | 200k to save 5k per month breaks even in about 3.3 years. | | However, for growing companies that 5k/month AWS premium | can hit 200+k/month very quickly. | nojito wrote: | Price transparency is the antithesis of the "cloud" and its | current financial success. | jsiepkes wrote: | Not only development but also running in production. You can | configure alerts but you can't configure a hard limit. That's | just insane. That makes working with GCP like playing with | fire. | k__ wrote: | What about throttling? | pwinnski wrote: | aka "Bankrupt me more slowly" | | Throttling doesn't stop the drain. | herendin2 wrote: | Nice to have, but people want a throttle that shuts off | dead at a certain number of dollars. | serial_dev wrote: | It is baffling why cloud providers don't have that option. | | I might want to have an app because I don't mind spending 50 | dollars on my pet project as a hobby, but I don't ever want to | spend more than that. Not if I write a wrong query that | suddenly becomes very expensive, not when I get attacked, and | not even when I have legit users. | | By the way, the same goes for some companies, too, just the | threshold would be different. | brundolf wrote: | For hobby projects you probably don't need auto-scaling, and | should use a provider that charges a fixed monthly rate. | You'll "waste" a little bit of money on unused uptime, but | for a hobby project it will be a minuscule amount.
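Retric's break-even figure works out as follows, using the comment's own numbers (a $200k one-off spend against a $5k/month saving):

```latex
t_{\text{break-even}} = \frac{\$200{,}000}{\$5{,}000\ \text{per month}} = 40\ \text{months} \approx 3.3\ \text{years}
```

If the monthly premium instead grows to the $200k/month mentioned in the same comment, the same one-off spend pays for itself in a single month.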
| greatgib wrote: | It's not complicated to add configurable hard limits for | these companies but they don't allow it because the current | situation is more interesting for them. | | They want to suck the maximum money from consumers before | they realize. | | For every person who complains loudly and gets a goodwill | gesture, there are hundreds of other companies that will not | notice or will just pay without recourse. | onion2k wrote: | _They want to suck the maximum money from consumers before | they realize._ | | I have very little money so I just don't use their services | because a mistake would be disastrous. They might be losing | out on me making a unicorn app on their platform. It's | unlikely, but while the possibility of catastrophe exists | I'll stick to not using them. That extends to not | recommending anyone use them either in case the worst | happens. | mikestew wrote: | _I have very little money..._ | | Then the harsh reality is: companies don't care. Yeah, | your app might turn out to be a unicorn, but the | overwhelming odds are that it won't. And no one cares | that you'll tell your other broke friends to avoid the | service. | | We'd all like to think it to be different, that a company | might care about appeasing my broke ass. But as already | pointed out, they want the whales. I also wonder, despite | the number of years "cloud services" have been around, if | companies aren't still trying to figure out a gazillion | other things and limiting customer spend might be a bit | low on the priority list. | thanksforthe42 wrote: | Meanwhile this leaves an opportunity for a different | company to provide these services. | | I do my best to avoid FAANG giants who don't think about | me. | renewiltord wrote: | The highly price-sensitive customer will force you to | compete only on price. That's just forcing yourself into | a commodity market. It's bad business. I would never try | to cater to that market. Very dangerous.
Competition will | drive margins down to near zero. | Paul-ish wrote: | The fact that the dashboards and alerts have a delay sounds | like there might be difficult consistency stuff going on. | Many nodes need to coordinate their usage and billing. It | may be a difficult problem, but solving billing problems | might not really motivate anyone at the company. It's not a | "cool" problem for engineers and not profitable for | product. | phkahler wrote: | >> The fact that the dashboards and alerts have a delay | sounds like there might be difficult consistency stuff | going on. | | I think that's true. It's easier to measure usage and | aggregate that data after the fact than to meter it in | real time and stop at a limit. Those are very different | things. What happens if you hit the cap while running | multiple processes spread across a cloud? | | One improvement might be to throttle things as the cap | approaches, but that doesn't really change the problem at | all. Do that and have the provider eat any overages; that should | solve it from the user's point of view. | ctvo wrote: | > They want to suck the maximum money from consumers before | they realize. | | This is a naive understanding of how corporations like | Google and Amazon work. Bad will and using gym membership | tactics aren't how they scale or make money. Getting you to | confidently try things knowing you won't get charged (the | reason they have those free tiers) so you'll get your | company, your start-up, your next side project on it is | much better for business. | | It's a miss that things like this aren't implemented and | widespread, not by design. | | > It's not complicated to add configurable hard limits for | these companies but they don't allow it because the current | situation is more interesting for them. | | I'm not in this space, but from my observations: | | - Each service has a different billing model and metering | model. Most likely this data is held by the service.
I'm | familiar with AWS so I'll use them as an example. I'd wager | only DynamoDB or only Lambda (the service owners) know how | much of those services you've consumed. | | - Billing is most likely reconciled asynchronously after | collecting all data from all services by an entirely | different department with knowledge of payments and | accounting | | - GCP, AWS, Azure launch 50+ services a year | | - Each large customer most likely has a special rate. I bet | Samsung or Snap pay an entirely different set of rates than | the normal customer. There are thousands of these | exceptions | | - Cutting your service off when you're over the limit is an | incredibly complex set of edge conditions. Your long | running instance hosting your critical service is shut off | because of experimenting on a new ML workflow? | | Even with only the above I can see the difficulty in | globally enforcing your spending limit at an accurate level. | I know there are features for both AWS and GCP and they | try. | | It's easy to stand on the sidelines and handwave away | technical complexity at scale, but I'd encourage you to | give all of these providers a more charitable view, at | least on this topic. | beoberha wrote: | I work in this space and you're absolutely correct. Your | last paragraph hits the nail on the head for pretty much | every complaint people have about the public clouds. | patrec wrote: | Right, so let's say Congress passes a bill that requires | cloud providers to enable hard spending limits by the start | of February 2021, and eat any extra usage costs that | exceeded a set limit. | | What is your educated guess for when this feature would be | essentially correctly implemented in AWS and GCP | (essentially = negligible costs to the providers due to | either false negatives (bills they eat) or false | positives (PR fallout, when SomeSite gets shut down | despite not being over limit)?
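wikibob asks how a globally distributed budget could be implemented, and ctvo lists why enforcing one is hard. One common pattern is to lease budget in chunks. A central server hands each worker a slice, workers spend locally without a round trip per request, and unused slices are handed back. The sketch below is a single-process simulation of that idea, not a description of how any ad system or cloud provider actually does it; amounts are integer cents to avoid rounding. The hard guarantee only holds because workers deduct before spending, so it does nothing about the metering lag the thread identifies as the real problem.

```python
import threading


class BudgetServer:
    """Holds the remaining global budget (in cents) and leases it out in chunks."""

    def __init__(self, total_cents: int):
        self._remaining = total_cents
        self._lock = threading.Lock()

    def lease(self, chunk_cents: int) -> int:
        """Grant up to chunk_cents; returns 0 once the budget is exhausted."""
        with self._lock:
            granted = min(chunk_cents, self._remaining)
            self._remaining -= granted
            return granted

    def release(self, cents: int) -> None:
        """Take back budget a worker leased but did not spend."""
        with self._lock:
            self._remaining += cents

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining


class Worker:
    """One serving node; spends from its local lease without contacting the server."""

    def __init__(self, server: BudgetServer, chunk_cents: int):
        self.server = server
        self.chunk_cents = chunk_cents
        self.local = 0  # leased but not yet spent (kept until shutdown())
        self.spent = 0

    def try_spend(self, cost_cents: int) -> bool:
        """Spend cost_cents if the global budget allows it; False once exhausted."""
        while self.local < cost_cents:
            granted = self.server.lease(self.chunk_cents)
            if granted == 0:
                return False
            self.local += granted
        self.local -= cost_cents
        self.spent += cost_cents
        return True

    def shutdown(self) -> None:
        """Hand any unspent lease back to the server."""
        self.server.release(self.local)
        self.local = 0
```

The chunk size is the trade-off: larger chunks mean fewer calls to the central server, but more budget can sit stranded in idle workers' leases.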
| abawany wrote: | Every time a GCloud rep would ask us about what we needed, we | would say: fix the billing interface. As far as I know, it | never got fixed. The feelings I would get when looking at | cloud billing interfaces can be summed up as: obfuscated, like | a pawnshop, and caveat emptor. I kind of came to the | conclusion that if the cloud giants are not fixing their | billing interfaces, then just like Amazon not sending you | the details of the items you ordered by email and thus | causing you to use the app to help with primenesia, there | is a 'business' reason why the billing interfaces are | generally incomprehensible. | uoaei wrote: | > It is baffling why cloud providers don't have that option. | | ...is it? If a lazy dev leaves their corporate account open | and you can bill it for their negligence, protected by the | contract you already signed, you earn a lot of money. From a | purely business perspective, it is stupid(!) to provide a | stopgap for that. | | Edit: to be clear I am not advocating one way or the other. | But it is surprising that people are "baffled" by this | obvious profit optimization. | trymas wrote: | AFAIK DigitalOcean has notifications if you go over a user-defined | limit. | cambalache wrote: | That should be illegal, but hey, at least they support noble | causes, so let them be. It sounds cynical but this is their game. | raphaelj wrote: | It's even worse for services like AWS CloudFront. | | One of your competitors could just rent a cheap server on OVH | with uncapped transfer and cost you $10k in a few | hours. | cambalache wrote: | Maybe that is your cue to move your server from AWS to | OVH* | | * I don't have any idea about OVH | jasonpeacock wrote: | It's surprisingly complex to do that. Let's take a simple | example and say your cloud account is doing 2 things - compute | & storage. | | Compute is an active resource; when you exceed your budget it | can be automatically shut down.
| | Storage is a passive resource; when you exceed your budget it | can be automatically....deleted? That's almost always the wrong | action. | | Providing fine-grained cost limits helps some, as passive | resources usually don't have massive cost spikes while active | resources do, so you can better "protect" your passive | resources by setting more aggressive cost limits on the active | resources. | | This quickly gets more complicated. Another example is that most | monitoring services are a combination of active (actual metric | monitoring) and passive (metric history) resources. A cost | limit on that monitoring service likely won't provide | sub-service granularity, mostly depending on whether the service | even has different charges for monitoring vs history. | | Oh, also, even for a passive resource like storage, you _also_ | have active resource charges whenever you upload/download your | data. | | Ugh, what a mess. The best thing to do is pay attention to your | spending, just like you do with your personal & corporate | budget. | Closi wrote: | It's almost like you could make it configurable so users can | choose what happens if they go over, and to what extent. | modeless wrote: | It's not really that complex. All compute should shut down. | All API calls should fail. Storage should be (optionally) | preserved at normal cost. | | Your examples are simple given this framework. | Uploading/downloading data to storage is an API call. | Monitoring is compute. Metric history storage is storage. | jasonpeacock wrote: | But storage costs continue to add up even when you're not | accessing them - there's a cost to storage _existing_ which | continues to accrue with time. | | When there's no budget left, what do you do with those | accruing costs for existing storage? | AngusH wrote: | If the amount of storage that you can use is limited by | quota (say 50GB) the problem becomes relatively easier. | | You set a quota for 50GB of storage and no more.
The | server then restricts you by disk quota to that amount of | storage. | | The cost is then calculated as 1.15 USD per month. | | So you don't pay more than 1.15 USD per month. | | Compute and transfer (and other things) could be covered | by separate similar quotas with a single maximum spend | figure at the bottom of the table. | modeless wrote: | Storage costs are predictable and slow to accumulate. | They are rarely the problem people are trying to address | when they set a budget. As I said, storage would | _optionally_ continue to be charged at the normal rate, | the other option being immediate deletion if you really | need a super hard budget cap. | | Once you get the alert that your budget is tripped, you | can go and see what's in storage via the console and | delete it, only paying for a few hours of storage for | things you don't want. | asciimike wrote: | Moreover, once API calls are locked, what next? You can't | delete files, and even if you can delete them, you aren't | able to retrieve them before deletion... If a platform | allows you to do those actions, then it's ripe for abuse, | and at public cloud scale that ends up being a far, far | bigger problem than the occasional blog post that ends up | as a refund (because the other blog post is "I got free | storage forever with this one weird trick"). | | It's really not a simple problem because the next action | depends on the choice the developer wants to make: do | they increase the budget or decrease usage? No cloud | provider wants to make this choice because no matter what | the choice is it will be viewed as wrong. The best they | can do is provide developers the best insight and tooling | to make this choice themselves. | modeless wrote: | Once API calls are locked, you can open the console, | disable all the things that caused you to hit your | budget, and then raise the budget a bit to get access to | the storage APIs again and manage your storage.
Or, the | console's storage browser should let you browse and | delete files as well. And again, there should be an | option to delete all storage immediately for a hard cap | on your budget if you really want that. | AngusH wrote: | You need separate costed quotas for each type of activity | with a combined total at the bottom. | | You could also have a setting in the admin panel as to | what the system should do: | | [ ] I want to keep going beyond my quotas (but email me) | | [ ] Please shut down my site | asciimike wrote: | If the answer is "you have a dollar limit set on GCS | GETs, GCS PUTs, etc." I guess I could see this working, | but hot damn that'll be a horrific interface. | | The other issue is that many large customers pay | different prices, so billing and quota aren't really tied | to each other, and it wouldn't be easy to reconcile this. | | As for the button... having been on the product side of | building this button, there is no right answer: people | will say they never got the email (or it went to the | wrong inbox, or their dog ate their phone...) or that | they never checked the box to "shut down the site" ("I | didn't think it would do X that made my app not work"). | AngusH wrote: | I'd probably want it grouped by category with a drill-down | interface for the specifics. | | Probably arranged so you can type in a figure at the | bottom for monthly expenditure and it would balance out | the requirements based on typical use cases. | | So enter $50 in the monthly cap figure and it allocates, | say, $20 to compute, $20 to transfer operations and API | calls, and $10 to storage. | | You could then fiddle with those figures, of course. | | I can't offer much on the second point other than to say | that unexpected bills annoy me much more than services | that stop working. | | I've also never worked anywhere with unlimited budgets.
| (alas) | | I can see that there are probably cases where uptime is | more important so they would be more annoyed the other | way around. | cesarb wrote: | > Storage is a passive resource, when you exceed your budget | it can be automatically....deleted? That's almost always the | wrong action. | | A better option would be to automatically reduce the budget | by the amount it would cost to keep the storage forever. If | doing that would reduce the budget to zero, do not allow | increasing the amount of storage. That is: assume the storage | will not be deleted, and budget according to that. | asciimike wrote: | How does this actually work? It clearly can't be forever, | since any non-zero dollar amount * infinity months is | infinity dollars, which is going to reduce the budget below | zero since any non-infinite number minus infinity is less | than zero... thus locking it immediately. | | Even if we say "you get N months of storage before we | delete it" and subtract N * current storage cost/month, | what happens after you're locked out of all actions because | you added an extra GB? Storage APIs cost money to use, so | you would get locked out of those too (note that if you're | not, people would set arbitrarily low limits and get | storage access for free) and couldn't retrieve anything. | The only remaining actions are delete (which is free) or | raise the quota and do the whole rodeo over again. | | Abuse is impossible to ignore at public cloud scale, so | "free storage forever" (or even, storage at a one time | fixed price) as the fallback isn't a viable option. | | Lastly, from an optics perspective, which blog post would | you rather see on the front page of HN: "I did something | dumb and spent too much money on Cloud" or "Google is | holding our data hostage" (or "Google deleted all my | data")? | | Source: I launched Firebase Storage, which has a GCS bucket | that has a hard limit. 
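The hard-cap policy argued over in this subthread can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not any provider's billing API: `HardCap`, `charge` and `can_grow_storage` are hypothetical names; the 40/40/20 split mirrors AngusH's "$50 in, $20/$20/$10 out" example; the $0.023/GB-month price is simply the 50GB = 1.15 USD figure quoted earlier divided out; and the finite `reserve_months` horizon stands in for "forever", which asciimike points out cannot work. Once any category's limit is hit, every active charge is refused (modeless's "compute shuts down, API calls fail"), while stored data is kept and a write is allowed only if holding the new total for N months still fits the storage share (cesarb's reservation idea).

```python
from dataclasses import dataclass, field
from decimal import Decimal

# Hypothetical percentage split of one monthly cap (assumption, modelled on
# the "$50 -> $20 compute, $20 transfer/API, $10 storage" example).
DEFAULT_SHARES = {"compute": 40, "transfer_api": 40, "storage": 20}


@dataclass
class HardCap:
    """Toy model of a hard monthly spending cap. Not a real provider API."""
    monthly_cap: Decimal
    shares: dict = field(default_factory=lambda: dict(DEFAULT_SHARES))
    spent: dict = field(default_factory=lambda: {k: Decimal(0) for k in DEFAULT_SHARES})
    tripped: bool = False

    def limit(self, category: str) -> Decimal:
        # Each category gets its percentage of the single cap figure.
        return self.monthly_cap * self.shares[category] / 100

    def charge(self, category: str, amount: Decimal) -> bool:
        """Bill an active resource (compute, transfer, API call).

        Returns False and trips the cap if the charge would push the category
        past its limit. Once tripped, every further active charge is refused,
        standing in for "all compute shuts down, all API calls fail".
        """
        if self.tripped:
            return False
        if self.spent[category] + amount > self.limit(category):
            self.tripped = True
            return False
        self.spent[category] += amount
        return True

    def can_grow_storage(self, total_gb: Decimal, price_per_gb_month: Decimal,
                         reserve_months: int) -> bool:
        """Stored data is never deleted here. A write that brings storage to
        `total_gb` is allowed only if keeping that much for `reserve_months`
        (an assumed finite horizon) fits within the storage share."""
        return total_gb * price_per_gb_month * reserve_months <= self.limit("storage")


if __name__ == "__main__":
    cap = HardCap(Decimal("50"))
    print(cap.limit("compute"), cap.limit("transfer_api"), cap.limit("storage"))
    print(cap.charge("compute", Decimal("19.50")))      # fits the $20 compute share
    print(cap.charge("compute", Decimal("1.00")))       # would reach $20.50: cap trips
    print(cap.charge("transfer_api", Decimal("0.01")))  # refused: active resources are off
    print(cap.can_grow_storage(Decimal(50), Decimal("0.023"), 3))  # $3.45 vs $10 share
```

Refusing every active charge as soon as one category trips is the blunt option; AngusH's "keep going beyond my quotas (but email me)" checkbox would correspond to recording the overage and alerting instead of returning False.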
| MichaelBurge wrote: | "infinity" is way too conservative: if it costs $1 a year to | maintain, and you can get 10% interest loaning money out | at any BDC, then $10 will maintain it forever. | AngusH wrote: | But we've had disk quotas before that mostly worked? | | If anything it seems an easier problem than processor time. | | I recall disk quotas on shared systems at university back in | 1998 and I'm sure they existed before that. | | Two thresholds IIRC: one at which you get a warning, and a second | at which you can't write any further and the disk write | operation fails. | | I don't think they deleted files, it was just that you couldn't | write more than [quota] bytes to your disk. | | Is there something particular about cloud-based systems that | prevents this from working? | | i.e., is this a specific problem with distributed storage? | | edit:tone | robrtsql wrote: | S3 costs money to keep your files in, even if you're not | touching them, so just preventing further uploads wouldn't | do much to prevent your AWS bill from increasing. | mlboss wrote: | I don't understand why developers use the cloud for | bootstrapping/side projects. Digital Ocean is all you need: a $5 | droplet + $15 Postgres, or even better a $7 dyno on Heroku. | WrtCdEvrydy wrote: | $5 droplet, 2GB swap (set it internally) and run Caprover. | | Deploy your Postgres (for DB), MinIO (for S3 storage) and your | webapp from Caprover. Add nodes as you need to scale out. | com2kid wrote: | I built my startup on a combo of DO and Firebase. | | If I knew something was going to take more than a couple dozen | milliseconds to run, it was built on the DO droplet. | | Why would I pay by the CPU second for something that is taking | a lot of CPU seconds? That billing model doesn't make sense. | | For my super quick REST endpoints, yeah, all on Firebase, the | convenience of writing + deploying makes it an obvious win. | (Unless something goes wrong, debugging Firebase functions is | not fun...) | Alupis wrote: | As J.
Paul Getty once mused[1]: | | > If you owe the bank $100 that's your problem. If you owe the | bank $100 million, that's the bank's problem. | | Crappy situation for OP and his startup, but I find the part | about reading up on bankruptcy to be a bit premature. | | Perhaps not the most ethical choice, but what stops OP from just | not paying the bill and finding a different cloud provider? | Obviously they'll want to avoid repeating the "experiment", but | seriously... there's no mechanism at Google to stop a new client | from running up a near-$100k bill in a single day? | | That's absurd, and should be a lesson for Google more | than for this startup. Some malicious actor could apparently consume | hundreds of thousands of dollars of Google resources and "get | away" with it. | | "Wait and see what happens, then deal with it" would be sane | advice. | | [1] https://www.brainyquote.com/quotes/j_paul_getty_129274 | kabirgoel wrote: | One of my favorite quotes of all time. J. Paul Getty was quite | the weirdo. His Wikipedia article is worth a look, especially | the section on his frugality. | xyzzyz wrote: | As an interesting coincidence, a large part of the Google Cloud | organization resides in a building that was formerly the | headquarters of Getty Images, a company founded by Mark | Getty, a grandson of J. Paul Getty. | pontifier wrote: | Lol. I love it. I moved to a state I'd never considered | because it had the largest, cheapest building in the US. | | It's 220,000 square feet, but I've lived in a tent out back | for the last 6 months because I can't get an occupancy | permit, it's not zoned residential, and I refuse to pay rent | on an apartment. | sterlind wrote: | You live in a 220,000-square-foot building?! Is this an | abandoned missile silo or something? I want to know more. | dumpsterdiver wrote: | > It's 220,000 square feet | | Is it an old airplane hangar? | sudcha wrote: | OP here. | | Bankruptcy fear was real at the time.
Google has at least a few | thousand lawyers on payroll. They probably also have a process | for handling delinquencies and sending notices. A quick | look at the lawyer fees just to manage the case, let alone fight | it, is enough for a bootstrapped company to throw up its hands. | | +1 to the bad-actor possibility. I shared this with the Google team, | but I'm not sure what they have done since. | | We are out of that situation and I wrote the post so others | relatively new to the cloud don't make the same mistakes. | | Fail fast is a very bad idea with Cloud. | Alupis wrote: | All true, and good points you raise. | | However, Google's army of lawyers costs them real money, | whereas your bill is largely made-up numbers. | | Perhaps the true cost is still enough to warrant sic'ing | their lawyers on your company. | | Even in that situation, a wait-and-see approach is still | pretty advisable. The worst-case scenario was already known | to you: bankruptcy. | | Nothing Google or their lawyers do would change that worst-case | outcome, and if Google were aware you literally don't | have $72k and might just declare bankruptcy and walk away, | they'll be much more eager to negotiate a more reasonable | bill and settle your account. It's exactly as J. Paul Getty | said... | | Very glad it's being worked out and you will not have to go | down that path. ___________________________________________________________________ (page generated 2020-12-10 23:01 UTC)