       | steren wrote:
       | (Cloud Run PM here) I am sorry for the experience described in
       | the blog post, we could definitely be better at bill management.
       | I am glad that it worked out in the end and the customer was not
       | required to pay for the bill.
       | Based on this experience, we decided to lower the default value
       | of "max instances" to 100 for future deployments. We believe 100
       | is a better trade off between allowing customers to scale out and
       | preventing big billing surprises. Of course, customers can always
       | decrease it or increase it up to 1,000, or even above with a
       | simple quota increase request.
       | bharatsb wrote:
       | Part 2: https://blog.tomilkieway.com/72k-2/
         | MaxBarraclough wrote:
         | > Google let go of our bill as a one time gesture!
         | We've seen this happen with similar stories on AWS. Neither
         | platform supports prepayment with a hard limit on costs, and
         | this seems unlikely to change.
           | teekert wrote:
           | Yeah a friend of mine wanted a real cert, not letsencrypt (I
           | don't understand how that is more real but ok), as a bit of a
           | noob he clicked around on the AWS website and some days later
           | had a bill of 1500 eur. They also nulled it. Still, this
           | scares the hell out of me.
             | WrtCdEvrydy wrote:
             | AWS Certificate Manager gives you free non-extended SSL to
             | your machine, it's pretty nifty.
           | coddle-hark wrote:
           | I can sympathise with some of these stories, like the ones
           | where an overnight DDOS attack racks up a huge unexpected
           | bill, but this one in particular is just a story of gross
           | incompetence and negligence. The guy hacked together some
           | code in a few days and deployed it to a service with
           | unlimited billing without any kind of sanity checks and
           | without even understanding what he was paying for. He's an
           | ex-Googler, it's not like he hasn't heard stories like this
           | before. And the takeaway? "Oops don't deploy buggy code" and
           | "I shouldn't have used the default settings". OK, sure, let
           | me know how that works out for you.
       | jjk166 wrote:
       | > I jumped out of the bed, logged into Google Cloud Billing, and
       | saw a bill for ~$5,000. Super stressed, and not sure what
       | happened, I clicked around, trying to figure out what was
       | happening. I also started thinking of what may have happened, and
       | how we could "possibly" pay the $5K bill.
       | > The problem was, every minute the bill kept going up.
       | > After 5 minutes, the bill read $15,000, in 20 mins, it said
       | $25,000. I wasn't sure where it would stop. Perhaps it won't
       | stop?
       | > After two hours, it settled at a little short of $72,000.
       | > By this time, my team and I were on a call, I was in a state of
       | complete shock and had absolutely no clue about what we would do
       | next. We disabled billing, closed all services.
       | 1) Why wouldn't you shut off the service as soon as you saw the
       | $5000 bill? Really doesn't sound like a "hop on a call with the
       | team for a few hours" kind of decision.
       | 2) Why was the person taking a nap the only person who could get
       | a usage limit alert? One of the great benefits of a team is that
       | you can have multiple eyes looking out for problems. Someone
       | could have raised a flag as soon as the first unexpected alert
       | came in.
       | 3) If going over the free tier limit was your chief concern, why
       | not check the usage after a quick run before letting it go
       | overnight and unsupervised?
       | That the problem could get this bad is a UX failure, but the
       | problem itself is easily seeable and avoidable.
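       | A rough sketch of the burn rate implied by the quoted figures, in
       | Python. It assumes the "5 minutes", "20 mins" and "two hours" marks
       | are all measured from the first look at the dashboard, and that
       | the totals shown were accurate at each point; the post confirms
       | neither, and the function name is made up for illustration.

```python
# Checkpoints quoted in the post: (minutes since first look, bill in USD).
# Treating the times as offsets from the first look is an assumption.
CHECKPOINTS = [(0, 5_000), (5, 15_000), (20, 25_000), (120, 72_000)]


def burn_rates(checkpoints):
    """Return the average USD/minute between consecutive checkpoints."""
    rates = []
    for (t0, usd0), (t1, usd1) in zip(checkpoints, checkpoints[1:]):
        rates.append((usd1 - usd0) / (t1 - t0))
    return rates


if __name__ == "__main__":
    spans = zip(CHECKPOINTS, CHECKPOINTS[1:], burn_rates(CHECKPOINTS))
    for (t0, _), (t1, _), rate in spans:
        print(f"{t0:>3}-{t1:<3} min: ~${rate:,.0f}/min")
```

       | Under those assumptions the bill was growing at about $2,000 a
       | minute when it was first noticed, which is the point of question 1.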
         | yawnxyz wrote:
         | I think the article said the Firebase dashboard data was at
         | least 24 hours behind what they were getting billed
       | villgax wrote:
       | I'd happily use a platform that allowed for an option for
       | limiting billing on a daily/monthly/another metric.
         | raphaelj wrote:
         | Some platforms do that:
         | - Heroku costs are pretty predictable, and you can easily set
         | a maximum scalability threshold on their auto-scalable dynos,
         | so that they will never cost you more than a predefined amount
         | of money;
         | - BunnyCDN requires me to top up their prepaid account, so that
         | I'll never spend more than what I have on that account.
       | atian wrote:
       | You can ask for a good faith billing adjustment. GCE or AWS is
       | well aware that things happen, and collecting something is better
       | than collecting nothing.
         | ceejayoz wrote:
         | Part two of the article indicates they did that, and had the
         | bill nulled out.
           | pettycashstash2 wrote:
           | from part 2:
           | "After going through our lengthy doc on this incident sharing
           | our side of the story, various consults, talks, and internal
           | discussions Google let go of our bill as a one time gesture!"
       | throwaway7281 wrote:
       | To put it into perspective: You give me $72K and I'll set you up
       | a 1PB replicated storage infra with a total of 100+ available CPU
       | cores and half a TB RAM.
       | I saw people burning through cash in the cloud, which makes you
       | wonder whether money is any concern at all.
         | iooi wrote:
         | Because electricity is free. And internet is also free. And the
         | rooms to put the servers are also free. A/C is free. And backup
         | generators are free. And diesel is free.
           | throwaway7281 wrote:
           | I was stretching the point, but what amazes me is the amount
           | of stuff people want to do vs. the amount of equipment they
           | throw at the problem.
           | ludocode wrote:
           | You can rent a full 42U rack in a colocation center for
           | ~$1500/mo easily. They'll handle all of that stuff, including
           | redundant power and redundant internet.
           | Of course self-hosting on real hardware is not quite as
           | simple or cheap as GP made it out to be. But everything in
           | your post can be solved with simple fixed pricing, which is
           | still the main point: there are no dangers of wildly variable
           | pricing or accidental massive bills as there are with cloud
           | hosting providers.
         | LeonM wrote:
         | You forget your own cost here.
         | A full-time system administrator costs more than 72k a year.
           | rndgermandude wrote:
           | Learning/Administering AWS/GCP/Azure costs time and therefore
           | money too. Maybe less money, maybe more money than doing
           | things yourself, depending on what you're doing. But you
           | shouldn't disregard such costs.
           | I've seen enough buddies spending enormous amounts of time
           | doing AWS devops on top of paying the AWS premium when they
           | could have gotten away easily with less than a handful of
           | VPS (+ optionally $100/month worth of cloudflare as a CDN).
           | donmcronald wrote:
           | I wonder what a full time cloud engineer costs. IMO it's
           | trading a simple system for a complex system, so now the
           | maintainers cost even more than sysadmins used to.
           | walrus01 wrote:
           | Once an organization reaches a certain size it will need one,
           | who ideally should be a person that can wear the dual hats of
           | linux/bsd sysadmin and also network engineer.
           | If the person is already on payroll doing a number of other
           | duties, the time/effort to set up such an environment as
           | described in the post could be as short as a couple of days
           | work.
           | nlitened wrote:
           | Yeah, but the absence of a system administrator has just cost
           | these guys 72k for several hours.
             | LeonM wrote:
             | I'm just trying to explain that server cost is more than
             | just the hardware.
             | In most cases cloud computing is actually still a very cost
             | effective solution to infrastructure. But with infinite
             | scalability also comes responsibility.
             | In the case of the OP, had they had their own hardware they
             | would have noticed that they had written bad code (it would have
             | crashed or become very slow at least), but the cloud just
             | scaled up and processed their code.
             | I'm not trying to defend Google in this case. Billing 72k
             | when a 100 USD limit is set sounds like a scam.
           | gruturo wrote:
           | And for half the use cases you still need one. Unless you go
           | full SaaS (which may or may not be an option depending what
           | you are doing, and what's on offer in that field), you still
           | get stuff which needs to be administered, updated, patched,
           | etc. Maybe not the OS layer (or maybe even that), maybe not
           | the DB (but then it might cost more), but you're not getting
           | away from that.
           | You only really start saving at some scale (get a small core
           | of cloud-literate admins, and now you can have them run
           | thousands of systems for effectively no incremental cost)
           | jcelerier wrote:
           | Aha what? Where I live (Bordeaux, France), a quick glance
           | through the job offers shows full-time sysadmin salaries
           | between 25 and 35kEUR / year
             | MeinBlutIstBlau wrote:
             | Jesus no wonder many IT professionals in europe want to
             | come here.
             | KptMarchewa wrote:
             | You can get 2x more in Poland.
             | klohto wrote:
             | A sysadmin isn't an infrastructure engineer or an SRE. The
             | salaries for the latter are way above 74k/year
             | LeonM wrote:
             | You confuse cost with salary.
             | Someone who makes 35k a year costs much more. Think about
             | office space, training, insurance, payroll costs etc.
               | jcelerier wrote:
               | 35k is the gross which encompasses a big part of that
               | except office space.
               | sofixa wrote:
               | No it's not. In France companies pay ~1.5-2x gross salary
               | in total, gross going to the employee ( some of it
               | getting deducted by the state), rest going for health
               | insurance, taxes, etc.
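               | Worked through in Python for the 25-35k EUR offers mentioned
               | upthread, taking the ~1.5-2x multiplier at face value (it is
               | this comment's estimate, not an official figure, and the
               | function name is only illustrative):

```python
def employer_cost_range(gross, low_factor=1.5, high_factor=2.0):
    """Return (low, high) total employer cost for a given gross salary.

    The default factors are the ~1.5-2x estimate from the comment above.
    """
    return gross * low_factor, gross * high_factor


if __name__ == "__main__":
    for gross in (25_000, 35_000):
        low, high = employer_cost_range(gross)
        print(f"gross {gross:,} EUR -> total {low:,.0f} to {high:,.0f} EUR")
```

               | By that estimate a 35k gross salary costs the company
               | somewhere between 52.5k and 70k a year.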
             | chefkoch wrote:
             | France seems to pay IT quite badly. In Germany nobody would
             | work for this in a metro area.
           | zigzag312 wrote:
           | Why full-time? It's possible to outsource IT administration
           | to a local IT company and pay only for set-up and maintenance
           | that is needed. Way less than 72k a year for many use cases.
           | Also, companies that employ a bunch of developers can find a
           | developer that has IT administration expertise and allocate
           | some of his time to this. Still cheaper than 72k a year or
           | employing someone full-time, if IT requirements don't call
           | for full time job.
           | It's not just cloud vs full-time.
             | mike_ivanov wrote:
             | Exactly. Or just buy a managed dedicated server - it's more
             | expensive, but still it's a fraction of the full time
             | sysadmin cost, and much cheaper than AWS.
           | raverbashing wrote:
           | Which is moot if you're just going to burn the 72k by
           | shooting yourself in the foot.
           | And the way these cloud services go, the 72k was only
           | detected because it was a one-off event. Turn that into a
           | base-level inefficiency that costs that over a year and what
           | have you then.
         | walrus01 wrote:
         | only a half TB of RAM? somebody recently _gave_ me a free 4U
         | server with 256GB of RAM in it. for zero dollars.
         | If you need a number of xen or kvm VMs with a lot of RAM
         | assigned to each one for testing something, you can fairly
         | easily set up an older Dell R910 (quad-socket system) with
         | 512GB of RAM for under $2000.
       | svrtknst wrote:
       | cloud providers are scam artists, don't @ me
       | rawgabbit wrote:
       | Meh. I see this all the time with developers who want to abstract
       | everything away and not worry about the impact their poorly
       | performing code is having on the infrastructure or on the
       | $$bottom line. Time out? No problem, we will just spin up more
       | instances. I have heard that so many times. Maybe your code is
       | just bad.
       | tlarkworthy wrote:
       | And that's why you set the 'autoscaling.knative.dev/maxScale' for
       | cloud run
       | https://github.com/futurice/terraform-examples/blob/2ccb2fa3...
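       | A minimal sketch of a pre-deploy sanity check for that annotation,
       | in Python. The manifest is a hand-written dict in the shape of a
       | Knative Service, where the annotation sits under
       | spec.template.metadata.annotations. The service name, the limit of
       | 4 and both function names are hypothetical and not taken from the
       | linked repo.

```python
ANNOTATION = "autoscaling.knative.dev/maxScale"

# Hand-written example manifest; the name and the value are placeholders.
manifest = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "example-service"},
    "spec": {
        "template": {
            "metadata": {"annotations": {ANNOTATION: "4"}},
        }
    },
}


def max_scale_of(manifest):
    """Return the maxScale annotation as an int, or None if it is absent."""
    annotations = (
        manifest.get("spec", {})
        .get("template", {})
        .get("metadata", {})
        .get("annotations", {})
    )
    value = annotations.get(ANNOTATION)
    return int(value) if value is not None else None


def check_max_scale(manifest, limit):
    """Raise ValueError unless maxScale is set explicitly and is <= limit."""
    value = max_scale_of(manifest)
    if value is None:
        raise ValueError(f"{ANNOTATION} is not set; the platform default applies")
    if value > limit:
        raise ValueError(f"{ANNOTATION}={value} exceeds the allowed {limit}")
    return value


if __name__ == "__main__":
    print(check_max_scale(manifest, limit=4))
```

       | Failing when the annotation is missing is a deliberate choice in
       | this sketch: it forces someone to pick a ceiling instead of
       | inheriting the platform default.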
         | YawningAngel wrote:
         | A sane implementation would default this to a low value
           | steren wrote:
           | Cloud Run PM here: I'm sorry for the bad experience the
           | customer shared in this article, we could certainly do better
           | with bill management.
           | We pick 1,000 as a default value for "maxScale", this can be
           | considered high for some users, but low for users who expect
           | infinite scaling from the service and start with a load test
           | to evaluate it.
             | heavyset_go wrote:
             | Given that this is a common problem, and one that can
             | bankrupt individuals or their businesses, when is AWS going
             | to implement spending caps that are easy to set up for new
             | developers or business owners?
               | jontro wrote:
               | Still the really expensive thing here was the datastore
               | reads. Cpu time was only 10% of the bill
             | quesera wrote:
             | > We pick 1,000 as a default value for "maxScale", this can
             | be considered high for some users, but low for users who
             | expect infinite scaling from the service and start with a
             | load test to evaluate it.
             | That seems absurd to me.
             | I think it makes much more sense to put the onus on the
             | _sophisticated customer_ to increase their maxScale to an
             | unusual value. Users who "expect infinite scaling...and
             | start with a load test" are sophisticated users.
             | E.g. set maxScale low, like 2 or 4. The sophisticated
             | customer would recognize their oversight quickly. Click-
             | click, fixed, restart test.
             | Effectively 100% of less-sophisticated customers will not
             | need enormous scale on day 1. Customers with whom you do
             | not have an existing billing relationship in the 10s of
             | thousands of dollars per cycle will almost certainly not
             | want it.
             | I'd consider that level of overspecification to be a strong
             | anti-pattern.
               | steren wrote:
               | My response at
               | https://news.ycombinator.com/item?id=25379846
       | asciimike wrote:
       | Every time I see another post like this I always wonder how many
       | people would be willing to buy "cloud insurance" where a premium
       | would cover overages due to your mistakes in dev, outages, etc.
       | I don't have an exact billing model worked out, and I assume the
       | insurance provider would mandate certain practices (e.g. setting
       | up billing alerts that go to them, allowing them to view/manage
       | infra), but curious if people here would be willing to pay for
       | such a thing.
       | (My assumption is that most people who fall into this are too
       | small to be willing to pay a reasonable % of their infra spend or
       | change their infra practices to prevent this, but I'd be curious
       | if this is something that would interest CIOs of companies who
       | are thinking of "moving to the cloud" but are leery due to cost
       | concerns).
       | MattyMc wrote:
       | I know it's not cool these days, but I strongly prefer (and
       | advise) fully-managed cloud services like Heroku. I can fix my
       | database size and scale/resources (dynos) easily. It's simple,
       | and controlled.
         | qayxc wrote:
         | Or just test on a good old VM, which can be had for just a few
         | cents per hour and doesn't even allow for storage or network
         | traffic going out of hand.
         | The first mistake is to deploy tests on completely opaque
         | hyper-scalers. Pretty much any software infrastructure - from
         | (SQL/No-SQL/In-Memory/etc.-) databases to entire web-frameworks
         | can be found as ready-to-go VM images and containers these
         | days.
         | Sure, it's a bit more work to find and setup, but in the end
         | you gain an understanding of what the system is actually doing, how it
         | might behave and the ability to deploy into any environment -
         | from local workstations to bare metal to (a fleet of) VMs to
         | high-level hyper-scaler services.
       | walrus01 wrote:
       | writing as somebody who runs a big collection of bare-metal
       | hypervisors for ISP infrastructure purposes... this post quite
       | honestly just makes me smirk.
       | I have truly lost track of the numerous instances, and number of
       | people who would be better served by buying a $1200
       | test/development 1U dual socket server with a few fast SSDs in
       | it, and putting it in colocation somewhere for a few hundred
       | dollars a month. The costs would be absolutely fixed and known.
       | On a tight budget? You don't even need to go as far as $1200. I
       | see totally fine Dell 1U servers, suitable for a test/development
       | environment, on eBay right now for under $500 with 128GB of RAM.
       | Or that would be better off purchasing a fixed-configuration
       | virtual machine (typically running on xen or kvm underneath) that
       | has a certain specific amount of CPU, RAM and storage resources
       | allocated to it which cannot balloon. For a fixed bill per month
       | like $65 or $85.
       | You want to deploy your weird app on some cloud platform? sure,
       | go for it, once you've got the possible scaling-up cost issues
       | and possible bugs worked out on your own platform.
         | MattyMc wrote:
         | > writing as somebody who runs a big collection of bare-metal
         | hypervisors for ISP infrastructure purposes.
         | I run a cloud SaaS company (3 employees). If I had the skillset
         | that it sounds like you do, I might be inclined to host on bare
         | metal. But I don't. I don't know what a 1U dual socket server
         | is.
         | It would take me some time to build these skills, and to match
         | the agility that the cloud offers. I don't think it's worth my
         | time, and probably not the author's time, either.
           | walrus01 wrote:
           | absolutely an understandable concern. One way of abstracting
           | away the need to own or maintain physical servers while still
           | achieving a definitive fixed monthly cost is to do as this
           | other commenter has done, renting dedicated servers from a
           | company that specializes in such:
           | https://news.ycombinator.com/item?id=25372912
         | salmonlogs wrote:
         | And this comment makes you look incredibly naive and narrow
         | minded.
         | Running some code on a CPU != running a startup. Great you can
         | buy a Dell server on eBay, or you can build a powerful desktop,
         | or rent a VM or get a droplet or scrape on lowendbox. These are
         | not a secret and there is a great reason no one does this other
         | than hobbyists and neckbeards.
         | You do your testing and it works, then what? You have to
         | deliver scalable reliable systems in production that require
         | identity management, security, backups, resiliency,
         | reliability, various networking services and a million other
         | supporting services and all the systems that come with it.
         | Never-mind actually scaling the application, monitoring it and
         | all the tools, systems and processes needed to run reliable
         | systems in production.
         | The eBay servers provide you exactly zero of that and you've
         | just wasted time setting up an environment that is a snowflake
         | and doesn't represent reality. Testing on the cloud on exactly
         | the same platform you would use for production has a lot of
         | benefits when you look at value as limited developer time
         | delivering value to customers and the business.
         | Whilst the $1200 server on eBay might be cheap today, you are
         | entirely missing the hidden cost of the time your team of
         | developers, costing $M/year, is wasting on testing in an
         | environment that doesn't help them find and solve production
         | issues. You don't need many hours of wasted time or downtime to
         | lose all of your so called cost gains.
         | Optimising for absolute minimum cost is a fool's errand that
         | only slows down actually delivering production systems that
         | deliver value to your customers.
         | Please spend some time thinking bigger about the opportunity
         | cost and value delivery of technology beyond the immediate
         | dollars and cents - it might surprise you.
         | dang wrote:
         | Please don't be a jerk on HN, especially in response to someone
         | else's misfortune, even if they brought it on themselves. Maybe
         | you don't need to treat these people better (though why not?)
         | but you owe the community better if you're posting here. If you
         | wouldn't mind reviewing the site guidelines and taking the
         | intended spirit to heart, we'd be grateful. Note these ones:
         | "_Be kind_" and "_Please don't sneer_"
         | https://news.ycombinator.com/newsguidelines.html
         | p.s. I skimmed through your recent commenting history and it
         | looks great--just the kind of thing we want here. Sharing some
         | of what you know is exactly what we want users to do. But
         | please don't be supercilious about it, as in this comment and
         | https://news.ycombinator.com/item?id=25372847. Ignorance
         | doesn't deserve humiliation, and that ingredient poisons the
         | ecosystem (and eventually starts a degrading spiral, e.g.
         | https://news.ycombinator.com/item?id=25373520). The rest is
         | good.
           | Alex3917 wrote:
           | > Maybe you don't need to treat these people better (though
           | why not?)
           | IMHO the best argument for 'why not' would be that it's
           | generally unethical to deploy software without first taking
           | the time to read the manual and understand how your
           | dependencies work. In this case the system wasn't live and
           | the costs of this fuckup were solely externalized onto
           | Google, which is fine because it was in large part their
           | fault anyway. But when dealing with production deployments,
           | this same behavior often results in users having all their
           | private information leaked or deleted.
             | walrus01 wrote:
             | I think cautionary tales are important - but it's also
             | possible, as I likely did above, to come down on people too
             | harshly. Not everything has consequences as severe as a
             | therac-25.
           | walrus01 wrote:
           | Thanks for the feedback. I almost certainly shouldn't have
           | included the part about the smirk, and I can definitely see
           | how that could appear to be making fun of somebody else's
           | misfortune. And the rest of it could have been phrased in a
           | more diplomatic way.
           | For what it's worth it wasn't intended personally at the
           | _person_ who almost incurred the $72k bill, but more at the
           | general concept of test/beta software gone rampant and out
           | of control in an environment where billing has no limits. I
           | think we've all tested some sort of software in development
           | environments that caused havoc - but up until very recently
           | it's been hard for that to immediately begin causing real
           | world financial consequences...
         | craftinator wrote:
         | Prick.
           | dang wrote:
           | Would you please stop posting unsubstantive comments to HN
           | and stop breaking the site guidelines? You've been doing it a
           | lot and we ban that sort of account. I don't want to ban you
           | because your good comments are good, but the bad comments are
           | like mercury: they build up in the system and poison things.
           | The rules apply regardless of how bad or wrong another
           | comment is, or you feel it is.
           | https://news.ycombinator.com/newsguidelines.html
         | skrebbel wrote:
         | > The costs would be absolutely fixed and known.
         | Until the server breaks and you have to drive over in the
         | middle of the night and try to replace it but the only
         | available server _right now_ is a shitty one and oh shit only
         | half the backups work cause the onsite backups are fried too
         | etc etc etc.
         | There's many good arguments against high-level BaaS such as
         | Firebase but I'm not sure that "colo is cheaper" is one.
           | walrus01 wrote:
           | fine, buy two identical ones. and set up proper backups. or
           | even make the backup a hot-spare.
           | if it's a test/development system that's meant to possibly
           | break, you shouldn't be driving anywhere at 0300 in the
           | morning anyways.
             | wrkronmiller wrote:
             | That implies you are going to be running prod in the cloud.
             | Unless you're developing against purely synthetic data, the
             | data transfer costs are potentially astronomical.
           | daneel_w wrote:
           | It's anecdotal, but I'm convinced I'm not alone here: we've
           | had more Amazon-related failures/outages in 3 years with AWS
           | than we had in 4 years of colo before heading to the cloud
           | because of the exact fear you described.
           | Even a cloud setup needs good management and contingency
           | planning, and in absence of such it can fail just as hard as
           | a colo setup.
             | walrus01 wrote:
             | https://www.bbc.com/news/technology-55087054
             | https://www.seattletimes.com/business/amazon/amazon-web-serv...
           | z3t4 wrote:
           | Just because you use "the cloud" doesn't mean you don't need
           | backups. "the cloud" also has downtime and other failures.
           | When deploying to the cloud you have to factor in the cost
           | for moving to another provider if/when it will be needed.
           | senko wrote:
           | > I'm not sure that "colo is cheaper" is one.
           | It absolutely is (cheaper, and a good argument). As an
           | example: we're in the process of switching from Digital Ocean
           | to Hetzner for a project, that will increase infrastructure
           | performance (roughly memory/cpu/storage) by 4x and decrease
           | costs by 4x. And no driving to the colo center is necessary,
           | as it's their dedicated server, so their on-site engineers do
           | the hardware replacement.
           | Also, if you are not okay with your site being down for a few
           | hours, you can always buy two, like you would with a sensible
           | cloud setup. It'd still come up way cheaper (+ get more perf
           | if you can do load balancing for your usage).
           | Also, I don't look at it from "colo is cheaper" point of
           | view. To me, it's "I can have several times more performance
           | _and_ hire a full time sysadmin to worry about it, for the
           | same price".
             | Axsuul wrote:
             | Can anyone recommend a similar provider but for North
             | America?
         | sofixa wrote:
         | Oh please, you can't be serious?
         | First, that $1200 server costs fixed money upfront, and then
         | you pay per month for colo, and for internet, which usually
         | includes a fixed bandwidth cap or limits, with bursts which you
         | pay for. So no, it's not fixed.
         | Second, a server you have to maintain, hardware and
         | software-wise, is much more complex, and takes much more time,
         | than a managed service. You want a database? Install it yourself,
         | maintain it yourself, backup it yourself, monitor it yourself.
         | And same with everything else.
         | Third, there's zero redundancy in your "setup". If you want it
         | with the most basic redundancy, you triple the costs (second
         | server, extra networking equipment, etc.).
         | Fourth, geo redundancy/distributedness? Please. Good luck if
         | you have someone far away who wants to visit your site.
         | Fifth, let's say you need to scale. Like, you get 10 more users
         | today than you did yesterday, or you get featured on HN or
         | Reddit or local news or whatever. F. You're looking at months
         | and a lot of cash, upfront.
         | "A big collection of bare-metal hypervisors" makes sense in
         | some cases, but don't pretend it doesn't come with non-negligible
         | time spent maintaining it, or that it doesn't require significant
         | upfront capital and man hours to do the same things you get
         | easily on a public cloud platform (databases, message brokers,
         | object storage, etc. etc. etc. etc. etc. etc.).
           | walrus01 wrote:
           | yes, I am serious, because as described in the original post
           | this was somebody's _test/prototype environment_. Which is
           | the ideal use case for a DIY scenario, until you're ready to
           | send things into production.
           | I have seen people spend thousands of dollars on a cloud
           | hosting platform to develop and test something when it could
           | have been done equally well on a 4-year-old desktop PC
           | sitting on somebody's desk. If they had only thought to
           | bother installing the same (debian, centos, whatever)
           | environment + packages + custom configuration on it.
         | JimDabell wrote:
         | > this post quite honestly just makes me smirk.
         | It seems pretty callous to laugh at somebody else's $72K
         | misfortune, especially as they took reasonable steps to set a
         | budget on the platform.
           | walrus01 wrote:
           | From my point of view after doing this for 20 years, it's
           | like seeing the past 12 years of the "put everything in the
           | cloud" era, of new different people repeating exactly the
           | same mistake over and over again.
           | It's like if you lived near a public park with particularly
           | aggressive geese that return every year, and watched new
           | ignorant groups of people get chased by the geese every
           | spring.
           | It's not callous - it's the perspective of the people who are
           | responsible for the hypervisors that run underneath the VMs
           | and services that cause some of these massive billing
           | outrages.
             | jedimastert wrote:
             | > It's like if you lived near a public park with
             | particularly aggressive geese that return every year, and
             | watched new ignorant groups of people get chased by the
             | geese every spring.
             | You're not really helping your argument here. Particularly
             | if people have been attacked for over a decade and no one
             | has put up an "aggressive geese" sign.
               | walrus01 wrote:
               | Quite literally in my specific area, the former is true,
               | and also it's a fact the city government has put up a
               | number of signs in the nesting area. Still happens.
             | scarygliders wrote:
             | I quite agree with everything you've said in this and your
             | other post.
             | My development environment? : My own dual-booting
             | Windows/Linux PC with 32G RAM and a few TB of SSD. Not to
             | mention the Nvidia RTX graphics card for gaming...
             | I either spin up a VM to test stuff, or spin up a Python
             | virtualenv. Postgresql also running on this machine.
             | Whatever's needed. Need to emulate Stuff Happening From
             | Different Servers? Why just spin up another few VM's -
             | assign them the minimum resources required to get them
             | doing what they need to do, set up your VM network etc. Any
             | decently specced desktop machine can do that, never mind a
             | noisy rack system - considering they're way better and
             | vastly more powerful than the PCs we had 2-5 years before
             | that, which themselves were vastly more powerful than the
             | ones before them, and so on...
             | Result? Can develop at home to my heart's content, then
             | when it comes to deployment spin up a remote VM on e.g.
             | DigitalOcean and take it from there.
             | At the end of the day, "sErVeRlEsS" (I just don't like that
             | term, for some reason it rubs me up the wrong way, perhaps
             | because of...) just means "running stuff on someone else's
             | kit" - the same as "tHe ClOuD", so if I'm going to be
             | developing some system & software, I'd rather be doing it
             | locally, setting up whatever's needed to get it running,
             | and once satisfied, deploying it.
             | Like you, I see either the same people, or new people,
             | simply Not Learning From The Past. There are many good
             | reasons why things were done like they were - developing on
             | a system you own, for example, rather than spinning up all
             | sorts of Cloudy Things or "serverlessy things" right from
             | the start.
             | Hardware is cheap - you don't need a supercomputer to run
             | the beginnings of your latest Supah Scalable System[tm],
             | you just develop and run it on a reasonably up to date box,
             | and, sure, when you get to the stage where you need more
             | space/bandwidth/whatever, that's the point where you deploy
             | to some Cloudy Thing or SeRvErLeSs Thing.
               | walrus01 wrote:
               | My personal home office development environment at the
               | moment, done on an ultra low budget, is a dell precision
               | t5600 mid tower workstation PC (dual xeon, e5-2630) that
               | I got for $350 with 64GB of RAM in it, upgraded it to
               | 128GB, and put a $150 Samsung SATA3 SSD in. It's small
               | and relatively quiet and sits under my desk tucked in a
               | back corner with just a power cable and a few ethernet
               | cables plugged into it.
               | Maybe some time in the near future I'll add a 2TB HDD
               | that I have sitting around into it so that I can create
               | VMs that have a 'fast' boot/root disk, and also give them
               | some lvm partitions on a big slow disk.
               | It's running debian stable amd64 and is set up as a xen
               | dom0 hypervisor, with 768MB of RAM assigned to the dom0
               | and the rest available for VMs.
               | The amount of capacity that's available there to create
               | random PV or HVM VMs with as much RAM as I could want, is
               | more than sufficient for my personal needs. If I need
               | anything bigger I'll make it a more formal process and
               | put it on a machine at work.
         | onion2k wrote:
         | _Dell 1U servers on eBay right now for under $500 with 128GB of
         | RAM_
         | In part 2 the author says "Had we chosen max-instances to be
         | "2", our costs would've been 500 times less. $72,000 bill
         | would've been: $144". In other words, that $500 server is
         | several times more expensive than it would have been if
         | Firebase and GCP had saner defaults.
           | jmull wrote:
           | That $144 would have been for a single two-day test.
           | Anyway, getting caught up in specific remediations that could
           | have prevented this is beside the point. For development you
           | want a _safe_ testing environment because mistakes, gaps,
           | misunderstandings, bugs are a fundamental part of it. The
           | entire point of tests and testing environments is to discover
           | the problems you know exist but need to test to find.
           | walrus01 wrote:
           | Yes I would agree that having automatic-scaling set to
           | effectively infinite by default is not the best choice for
           | the end user who is paying.
           | But for the cloud operator, when somebody's runaway
           | application results in a $15,000 bill that has to be paid,
           | sure...
           | As to whether letting people's runaway things scale up
           | infinitely is an intentional choice, I couldn't say.
             | reddit_clone wrote:
             | For me the needle swings towards Malice, away from Mistake
             | on this one. At the very least callousness
             | Add to the long list of disappointments at humanity:
             | - Late fees were a big part of BlockBuster's business.
             | - Police departments factor traffic fines into their
             | budget.
             | - Thousand other dark patterns that are unethical but not
             | illegal.
         | donmcronald wrote:
         | The crazy part to me is using the cloud for _testing_. It's
         | crazy. I have a 5 year old dual CPU Xeon with 128GB of RAM and
         | a couple NVME disks that I've spent about $1000 CAD total to
         | build ($700 USD). Something in that range on Azure is about $1
         | / hour if you reserve a year. ~$9000 per year.
         | All the people running workloads that don't require the
         | redundancy given, like CI, blow my mind. The costs are
         | astronomical vs buying a cheap or used server. Sure, use the
         | cloud for your production builds, but why not augment it with
         | something that doesn't cost as much?
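A rough check of the arithmetic in the comment above, as a sketch. The $1/hour rate and the $700 build cost are the commenter's figures; running the cloud instance around the clock for a full year is an assumption, and the function names are made up for illustration. Power, hosting, and maintenance time are deliberately left out.

```python
# Figures quoted in the comment; 24x7 usage is an assumption.
HOURLY_RATE_USD = 1.00      # approximate reserved-instance rate
BUILD_COST_USD = 700.00     # one-off cost of the used server
HOURS_PER_YEAR = 24 * 365   # 8760


def yearly_cloud_cost(hourly_rate: float, hours: int = HOURS_PER_YEAR) -> float:
    """Cost of keeping one instance running for the given number of hours."""
    return hourly_rate * hours


def break_even_hours(build_cost: float, hourly_rate: float) -> float:
    """Hours of cloud usage that cost as much as buying the hardware."""
    return build_cost / hourly_rate


yearly = yearly_cloud_cost(HOURLY_RATE_USD)
break_even_days = break_even_hours(BUILD_COST_USD, HOURLY_RATE_USD) / 24

print(f"cloud: ${yearly:,.0f} per year")
print(f"hardware pays for itself after about {break_even_days:.0f} days")
```

At these numbers the yearly figure comes to $8,760, which is where the "~$9000 per year" estimate comes from.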
           | walrus01 wrote:
           | as a totally randomly chosen example, that I spent not more
           | than 20 seconds searching for, here's a system with 128GB of
           | RAM for way under $500.
           | https://www.ebay.com/itm/DELL-R910-16SFF-
           | model-4x-Intel-X755...
           | need to add your own storage (good quality SSDs, of course).
           | and it assumes you have somewhere to put noisy things...
           | I would estimate it's about a 500W electrical load, so figure
           | $40-50 additional electrical bill, if you're trying to
           | precisely account for all costs.
           | you can totally set up a desktop workstation dual xeon for a
           | similar price as well.
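The electricity estimate above can be sanity-checked by working backwards. A sketch, assuming the $40-50 figure is a monthly bill, the 500 W load is constant, and a month has 30 days; the helper names are hypothetical.

```python
LOAD_WATTS = 500            # estimated draw from the comment
HOURS_PER_MONTH = 24 * 30   # assumption: 30-day month, always on


def monthly_kwh(load_watts: float, hours: int = HOURS_PER_MONTH) -> float:
    """Energy used in a month by a constant load, in kWh."""
    return load_watts * hours / 1000


def implied_rate(monthly_bill_usd: float, kwh: float) -> float:
    """Electricity price per kWh implied by a given monthly bill."""
    return monthly_bill_usd / kwh


kwh = monthly_kwh(LOAD_WATTS)
low, high = implied_rate(40, kwh), implied_rate(50, kwh)

print(f"{kwh:.0f} kWh per month")
print(f"implied price: ${low:.3f} to ${high:.3f} per kWh")
```

So the estimate holds wherever power costs roughly 11 to 14 cents per kWh; at other rates the bill scales in proportion.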
       | tunesmith wrote:
       | Deeply dissatisfying to read. Ex-Googler uses connections to get
       | his (understandable!) cloud mistake refunded.
       | Every time I read one of these stories, I get more and more
       | convinced I will just simply never use scalable cloud tech for my
       | side projects. I'm not going to risk my family's retirement
       | savings on the all-too-possible chance that a small deep-
       | implication error will cause runaway charges.
       | daneel_w wrote:
       | It's "fantastic" how Google by the end of the article still come
       | out as a good and friendly bunch...
         | BonoboIO wrote:
         | Ridiculous ... it's designed to charge you, upgrade you and
         | make you spend as much money as possible. And 24 hours after it
         | happened they show it to you in the dashboard.
         | Great.
         | Just use dedicated servers for the start! It will hurt way less
         | and you can easily upgrade to the cloud later, IF necessary.
         | that_guy_iain wrote:
         | Well, when someone lets you off a 72k bill you generally think
         | nicely of them. But Google had no way of collecting that, and
         | asking for it would have cost them other business (this way
         | the customer keeps hosting there and keeps paying them). It
         | isn't as if Google lost 72k, or as if 72k even matters to
         | them, so it's just good PR and good business: they get the
         | money on the back end, and faster.
         | I have to wonder whether, even if they had tried to get the
         | money, they would legally have been able to fight for it. From
         | my experience with judges in Europe, they would most likely
         | have looked at the budget being ignored and at someone being
         | upgraded from a free to a paid plan without consent, and told
         | Google it was their own fault and the services weren't ordered
         | or authorised.
         | spacemanmatt wrote:
         | Author is an ex-Googler so he knew exactly how to speak to the
         | system
           | bjarneh wrote:
           | I'm sure that was an important factor, but wouldn't this in
           | any case just be a bill to a startup with no money on hand?
           | It's hard to make companies (without money) economically
           | responsible for anything I guess, it even seems hard to make
           | companies with money responsible sometimes.
             | donmcronald wrote:
             | I can't set a limit like that on my CC, so the first charge
             | for $5k would have cleared meaning it would have run way
             | longer and racked up way more usage. I'd bet you my
             | computer I would have been out _at least_ the $5k that
             | cleared.
               | bjarneh wrote:
               | You're probably correct. It really makes you think twice
               | about setting up some of those cloud services without a
               | hard limit cutoff.
               | Symbiote wrote:
               | I normally receive only my "part" of my corporate credit
               | card statement, but earlier this year I was sent more of
               | it.
               | That's when I found out the card has a credit limit of
               | over EUR50,000.
               | Reading this, I wonder if we should contact the bank and
               | ask for another card with a lower limit, to use with
               | various cloud services. We are 99% on-premises, but have
               | about EUR200/month in various GCS/AWS usage.
       | kerng wrote:
       | This is really scary. It's so unpredictable what one actually has
       | to pay. Especially for a small business, moving to the cloud is
       | much more challenging than it should be.
       | When creating resources it's really unclear what one might be
       | charged, then there are saving plans and pre-commitment options
       | and so forth.
       | Might be a good startup idea, basically just sell cloud resources
       | via a simple, predictable payment model.
       | salmonlogs wrote:
       | As an ex-Googler working in a customer facing role in Cloud you
       | did very well to get a $72k bill written off! It's definitely
       | possible but requires a lot of approvals and pulling in a few
       | favours. I went through the process to write off a ~$50k bill for
       | one of my customers and it required action every day for 3 months
       | of my life.
       | Whoever helped you inside Google will have gone to a LOT of
       | trouble, opened a bunch of tickets and attended many, many
       | meetings to make this happen.
         | cogman10 wrote:
         | I know there's no reason for Google or AWS to do this, but man
         | do I wish there was a way to put down a spending limit and
         | simply disable anything that goes over that limit.
         | It's a little bit nuts that there are no guardrails to prevent
         | you from incurring such huge bills (especially as a solo
         | developer that might just be trying out their services).
           | brianwawok wrote:
           | There are guard rails in quotas. Like you can only spin up X
           | servers without opening a ticket to ask for more.
           | Now, think some of these quotas can still lead to some pretty
           | crazy bills.. but that is the point of at least some of
           | them....
           | fweespeech wrote:
           | Tbh, its lack is why I don't use Google or AWS for projects.
       | marcell wrote:
       | If this happens you can usually reach out to Google to see if
       | they will refund the charge. They don't really benefit from
       | making $72k off a solo developer's buggy code. I've done it once
       | and their team was very helpful and reversed the charge.
       | tony wrote:
       | Google eventually forgave the bill in Part 2:
       | https://blog.tomilkieway.com/72k-2/
       | > Google let go of our bill as a one time gesture!
       | Thank goodness.
       | And it looks like it had to do with not understanding the API /
       | system on the first order, IMO.
       | This hit me hard a few months ago with CloudFront invalidations
       | on AWS. I check billing and the thing's at 30usd in a single day,
       | from a norm of <1usd per month, so it's showing a 13,000%
       | increase (this is for documentation of open source projects). I'm
       | writing their support and at the mercy so to speak - technically
       | I ran up the bill. I ended up paying up, but I secretly hoped I'd
       | get some AWS credits for the projects, heh
       | Aside: Amazon has some nice features for rule-based alarms on
       | accounts so when you spend more than X dollars, you get an email.
       | ljm wrote:
       | I think I'll treat this as the latest in a long line of
       | warnings about not going all-in on all these Cloud services until
       | you seriously know what you're doing.
       | So much of it is so unnecessary to begin with. You can do so much
       | with a cheap VPS or two without thinking about lambdas or cloud
       | functions or kubernetes or who knows what. But these days you'd
       | be forgiven for thinking it's dark magic.
       | You're not going to run up a 5 digit bill in a day by starting up
       | on a few $10 VPSs. And you'll probably have an architecture that
       | fits in your head to boot.
       | Also: The article title should really be "Saved 72k and avoided
       | bankruptcy by being an ex-Googler."
         | tunesmith wrote:
         | Just don't go all-in on them at all unless you're spending
         | someone else's money.
       | tedunangst wrote:
       | What about the bill for the sites this request cannon was pointed
       | at?
       | awinter-py wrote:
       | fwiw I've had cloud vendors be relatively willing to forgive
       | bills when something went wrong with SLAs, bad bugs, or their
       | internal dashes misrepresented usage.
       | they know their systems aren't perfect, and if you velvet hammer
       | them long enough, they'll do the right thing.
       | Twirrim wrote:
       | Disclaimer: I work for another cloud (not AWS), opinions are
       | entirely my own. I try to avoid posting in a negative fashion
       | about clouds, but holy crap this blog post...
       | AWS has this principle of Customer Obsession that enters in to
       | lots of discussions, design decisions etc. "What is the customer
       | experience of $foo?". Along with asking the positive, you ask the
       | negative too, and explore the customer impact of shit going
       | wrong. What does the worst experience look like, what is the
       | impact for the customer, how might you mitigate that or make it
       | so you can at least make it up to customers quickly, if you
       | really can't avoid it.
       | I find it hard to fathom Sudeep's attitude here. So much of this
       | article is ringing large alarm bells. These are not the things
       | I'd want to see from a cloud provider as a customer.
       | Is this Stockholm Syndrome? Too much drinking of Kool-Aid as an
       | ex-googler? Unfamiliarity with how other cloud providers operate?
       | (from part 1) > Automatic Upgrade of Firebase Account to Paid
       | Account
       | This is what I mean when I say look at negative vs positive use
       | cases. I'm guessing some combination of customers having a lousy
       | experience running in to Free Tier limits, and staff spending too
       | long having to bump up accounts. So they implemented an automatic
       | upgrade (What, then, is the point of a free tier? No room to
       | experiment, no room to try it and see)
       | This is precisely the sort of thing that customer obsession
       | principle is supposed to aid in. Automated upgrade certainly
       | solves the staffing time spent bumping up accounts, and it helps
       | customers that used to have to request limits being increased,
       | but it massively fails in the negative customer experience side
       | of the equation here. Someone, somewhere, should have asked the
       | question "What if the customer has made a mistake".
       | Instead, make it easy and quick for anyone to click a button and
       | get their account changed from Free to Paid, without staff
       | engagement. Give customers easy agency to control their
       | experience.
       | > Billing "Limits" don't exist. Budgets are at least a day late.
       | That's insane. Clouds are about speed and dynamic scalability.
       | Mistakes can ramp up the bill a crazy amount in a short period
       | of time, as Sudeep found out.
       | How is a 24 hour delay in billing sync and budget warnings even
       | remotely acceptable to them / Sudeep / customers?
       | Sure it's probably fine for the 90% cases, but that's crippling
       | for the 10% and even if you decide you really don't give a crap
       | about your customers, you don't want the bad press that 10% will
       | likely give you.
       | Picture what financial damage someone might do if they
       | compromised some of your credentials somehow? You screwed up,
       | credentials got leaked, and you won't necessarily know for a
       | _day_ that something has gone wrong, nor will your restrictions
       | kick in?!
       | Billing is the single highest TPS service in any cloud, with
       | Identity often a close second (billing gets requests for every
       | transaction, _and_ internal requests related to ongoing charges).
       | You need to handle a high rate of requests, with low latency both
       | in request/response and processing data received. It's a hard
       | engineering problem, and cloud platforms try to get some of the
       | smartest engineers working on it. An organisation of Google's
       | caliber has more than enough smart engineers to be working on
       | these kinds of hard problems, even by temporary secondment.
       | Quota / Limits in a fast changing cloud environment need to be
       | dynamic and responsive.
       | >I knew how to put the case for Google team when they would come
       | back to work in 2 days.
       | How is 2 days even remotely acceptable? Maybe it's just how it's
       | written, but it reads like this is just accepted as the way
       | things are. Why would you even have to carefully work to present
       | your case?
       | Where are the 24x7 response people with the ability to forgive
       | bills? $72k is chump change for a cloud provider, and especially
       | for a company of the scale of Google. Give your support agents
       | the tools and authority they need to make reasonable decisions,
       | with some appropriate kind of oversight process, and stick in
       | feedback mechanisms so product managers know what problems
       | customers are having.
       | It's not like that would actually have cost them $72k in direct
       | running costs either. That _should_ have been a near instant no-
       | brainer. Forgive, move on, and reap the benefits of customer
       | goodwill. That goodwill will earn you way more profit than
       | forgiving it would have cost. You're investing in their
       | continuing business. Sometimes those investments will fail, but
       | most of the time they'll succeed.
       | >In our case, it differed by 86,585,365.85 %, or 86 million
       | percentage points. Even when the bill was notified to us,
       | Firebase Console dashboard still said 42,000 read+writes for the
       | month (below the daily limit).
       | So it's just fake observability? What's the point? 24 hours delay
       | here is nuts, almost to the point of being useless. It can be
       | hard to work these figures out yourself. A fast feedback
       | cycle is critical. As Sudeep here found out, 24 hours is a great
       | way to have zero clue what's going on until it's too late. Is
       | there really no other way to get this information more up-to-
       | date?
       | Moving on to part 2: >I had a team of ~7 engineers/interns at
       | this time, and it would take Google about 10 days to get back to
       | us on this incident.
       | Why is a 10 day response time from Google considered even
       | remotely acceptable for a cloud provider? Your entire platform is
       | down, you're working out ways around this situation, stressing
       | about potential bankruptcy, and it's just cool with you that it
       | took 10 days for them to make a business/life changing decision
       | over what amounts to chump change?
       | These kinds of mistakes happen with clouds, AWS is famous for
       | waiving these shock bills from mistakes and it never takes 10 days
       | to get it done.
       | Billing should be the easiest and most obvious thing. If your
       | cloud provider is creating complicated billing structures, that's
       | a problem the cloud provider should be solving, not expecting
       | customers to unravel the mysteries.
       | Companies being spun up to help people navigate your billing
       | should be an alarm call, not something to celebrate or for
       | customers to consider normal.
       | > Fail fast, learn fast with Cloud is a bad idea
       | It shouldn't be. With near immediate feedback you'd have known
       | straight away that shit was bad, and cut the experiment out
       | before it cost you an arm and a leg.
       | > While creating a Cloud Run service, we chose default values in
       | the service. The max-instances is preset to 1000, and concurrency
       | set to 80 ...... Same goes with Cloud Run! With Concurrency ==
       | 60, max_containers == 1000 and each Request taking 400ms, number
       | of requests Cloud Run can handle 9 million requests per minute!
       | Why are the default values that high on a service? That seems
       | like you're asking customers to shoot themselves in the foot.
       | Where was the look at the negative customer experience side of
       | the equation? Make it easy for customers to do the right thing.
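The throughput figure in the quote can be reproduced with a little arithmetic. A sketch, using the numbers from the quoted passage (concurrency 60, 1000 instances, 400 ms per request) and assuming every request slot stays busy the whole time; the function name is made up for illustration and is not part of any Cloud Run API.

```python
def max_requests_per_minute(concurrency: int, max_instances: int, request_ms: int) -> int:
    """Upper bound on requests per minute if every slot is always busy."""
    in_flight = concurrency * max_instances   # requests running at any instant
    return in_flight * 60_000 // request_ms   # each slot finishes 60000/request_ms per minute


default_cap = max_requests_per_minute(concurrency=60, max_instances=1000, request_ms=400)
small_cap = max_requests_per_minute(concurrency=60, max_instances=2, request_ms=400)

print(default_cap)               # 9000000
print(small_cap)                 # 18000
print(default_cap // small_cap)  # 500
```

The last line is the same factor of 500 the author cites for a max-instances value of 2.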
       | Then the bit that really bugs the crap out of me: > Thank you
       | Google!
       | He's thanking Google for having had an absolutely shitty
       | experience on their platform: 10 days of stress from needless
       | delays in forgiving a trivially small bill, dealings with
       | multiple lawyers, investigating bankruptcy, risk of missing
       | product launch date, working around the clock to dig themselves
       | out of hell...
       | thrower123 wrote:
       | I'm not sure I want to know how much Azure and AWS revenue comes
       | from people spinning up test VMs or a kubernetes cluster to work
       | through a training, and then forgetting to turn it off.
       | I've spent thousands extra this year because people stood up 4 MB
       | SQL databases and let them default to charging by vCores instead
       | of DTUs.
         | beoberha wrote:
         | Much less than the amount from deals with strategic partners.
         | The long tail of $5 a month from forgotten VMs is likely orders
         | of magnitude less than the handshake deals you can publicly
         | read about.
       | seanwilson wrote:
       | https://blog.tomilkieway.com/72k-2/
       | > To overcome the timeout limitation, I suggested using POST
       | requests (with URL as data) to send jobs to an instance, and use
       | multiple instances in parallel instead of using one instance
       | serially. Because each instance in Cloud Run would only be
       | scraping one page, it would never time out, process all pages in
       | parallel (scale), and also be highly optimized because Cloud Run
       | usage is accurate to milliseconds.
       | > If you look closely, the flow is missing few important pieces.
       | > Exponential Recursion without Break: The instances wouldn't
       | know when to break, as there was no break statement.
       | > The POST requests could be of the same URLs. If there's a back
       | link to the previous page, the Cloud Run service will be stuck in
       | infinite recursion, but what's worst is, that this recursion is
       | multiplying exponentially (our max instances were set to 1000!)
       | Did you not consider how to stop this blowing up before
       | implementing? Having one cloud function trigger another like this
       | with no way to control how many functions are running at the same
       | time with no simple and quickly met termination condition (with
       | uncapped billing) is playing with fire. It's not going to be
       | optimal either if most of the time each function is waiting for
       | the URL data to download.
       | You need to be using something like a work queue, or just keep
       | life simple and keep it on a single server if you can.
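A minimal sketch of the work-queue idea, assuming a single process and an in-memory queue. `fetch_links` is a hypothetical callback standing in for the real scraper, and `FAKE_SITE` is an invented link graph with back links, used only to show that the crawl terminates. This is not the author's code.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(start_url: str,
          fetch_links: Callable[[str], Iterable[str]],
          max_pages: int = 100) -> list[str]:
    """Breadth-first crawl with a seen set and a hard page budget."""
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, so back links are ignored
    visited = []
    while queue and len(visited) < max_pages:   # the missing "break" condition
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Invented link graph: "b" and "d" both link back to "a".
FAKE_SITE = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": ["a"],
}


def fake_fetch(url: str) -> list[str]:
    return FAKE_SITE.get(url, [])


print(crawl("a", fake_fetch))               # ['a', 'b', 'c', 'd']
print(crawl("a", fake_fetch, max_pages=2))  # ['a', 'b']
```

The seen set stops the back-link loop, and `max_pages` bounds the total work even if the dedup logic has a bug.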
       | greatgib wrote:
       | At the end of page 2, there is a good ass licking bullshit
       | sentence:
       | << It's also a great company to collaborate with. The tools
       | provided by Google are very developer friendly, have a great
       | documentation (for the most part), and are consistently
       | expanding.>>
       | He said that as an ex googler and as the beneficiary of a
       | gesture, but this contradicts the whole story he told us. If the
       | docs and tools were so great, why did he fall into this situation?
         | sudcha wrote:
         | OP here.
         | Maybe it comes across like that, but as somebody building a
         | product with very limited resources, Google's documentation is
         | one of the best so far.
         | The situation was our fault too. I just went with a test and
         | fail fast attitude, just like with everything we do in a dev
         | environment.
       | delduca wrote:
       | Something very similar happened to me. I was using Cloud Run to
       | fetch some subreddit posts and ended up with a "recursive" setup;
       | because of that, billions of invocations were made... Luckily, I
       | was in front of the computer and stopped it early, but the bill
       | was around $4000! I contacted Google support and explained
       | everything, and they "forgave" my debt because of the bug.
         | qayxc wrote:
         | So basically no one's testing their code anymore and just
         | throws it into a paid service?
         | Great that Google was so lenient and all, but I really don't
         | get the appeal of using a hyper-scaler when a VM with docker
         | support can be setup in literally seconds and on-demand pricing
         | of less than 10 cents/hour for most quick-and-dirty tasks.
         | Am I missing something here?
       | pettycashstash2 wrote:
       | Thank you for sharing. I was actually thinking of using Firebase
       | for my project. They make it so easy to sign up for the free tier.
       | Waiting to see what happens in part 2.
         | bharatsb wrote:
         | Part 2: https://blog.tomilkieway.com/72k-2/
         | spacemanmatt wrote:
         | I recently put the (soft) kibosh on a project in my stable
         | trying to switch to FireBase at the last minute.
         | It looks attractive but the business aspects are frankly
         | frightening, and I'm not even talking about the risk of a large
         | bill. Getting your metrics 24h late sounds like a deal killer
         | for me. So much for observability!
           | asciimike wrote:
           | Minor nit: many non-billing metrics are near-real time, e.g.
           | DB concurrents, cloud functions CPU/RAM usage; any metrics
           | that require aggregation (storage, billing) are going to be
           | batched less frequently. This is going to be true across all
           | platforms of non-trivial scale (eventual consistency + batch
           | jobs).
           | Second note: the number of people who actually do this is
           | very low (a few a year, of hundreds of thousands of
           | developers). The blog posts are scary, but in my ~five years
           | at Firebase, I'm pretty sure we refunded every one. As my
           | boss (James Tamplin, CEO of Firebase) used to say, "There are
           | lots of bad systems, but rarely are there bad people."
       | dilatedmind wrote:
       | interesting, how did the spend break down between cloud run and
       | firebase?
       | did you have any limit to how many req/s you made to an
       | individual site? It seems this would be difficult to implement
       | with this architecture.
       | how did you deal with following links in circles/ avoiding
       | scraping the same page multiple times?
       | I had built something similar at a previous job, recursively
       | scraping ecommerce sites. The first thing I noticed was some of
       | the sites we were scraping couldn't handle more than a couple
       | requests a second (in particular as we scraped uncached pages by
       | sites running php). Other sites were quick to ip ban.
       | I kept things simple, a few dozen micro instances on aws (think
       | they were like $3 a day) running puppeteer. A single server
       | acting as a controller, keeping a per site queue and allowing us
       | to set per site request limits if necessary. All the state of
       | which links were already seen just kept in memory. Of course
       | everything was also persisted to a db, and if the controller
       | process needed to be restarted, it could restore the queue/ seen
       | state and resume.
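       | A minimal sketch of the controller described above, assuming
       | one FIFO queue per site, an in-memory seen-set, and a fixed
       | minimum delay between requests to the same site. The class and
       | method names are hypothetical; persistence and the actual
       | fetching are left out.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse


class CrawlController:
    """Per-site URL queues, a seen-set, and a per-site minimum delay."""

    def __init__(self, default_delay=1.0):
        self.default_delay = default_delay      # seconds between hits on one site
        self.site_delay = {}                    # optional per-site overrides
        self.queues = defaultdict(deque)        # site -> pending URLs
        self.seen = set()                       # every URL ever enqueued
        self.next_allowed = defaultdict(float)  # site -> earliest next fetch time

    def enqueue(self, url):
        """Queue a URL once; repeats (including link cycles) are dropped."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self.queues[urlparse(url).netloc].append(url)
        return True

    def next_url(self, now):
        """Return a URL whose site is not rate-limited at time `now`, else None."""
        for site, queue in self.queues.items():
            if queue and now >= self.next_allowed[site]:
                delay = self.site_delay.get(site, self.default_delay)
                self.next_allowed[site] = now + delay
                return queue.popleft()
        return None
```

       | Workers would call next_url(time.monotonic()) and enqueue()
       | every link they find; slow sites get a larger entry in
       | site_delay.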
       | akh wrote:
       | > Had we chosen max-instances to be "2", our costs would've been
       | 500 times less. $72,000 bill would've been: $144. Had we chosen
       | concurrency of "1" request, we probably wouldn't have even
       | noticed the bill.
       | > If you count the number of pages in GCP documentation, it's
       | probably more than pages in few novels. Understanding Pricing,
       | Usage, is not only time consuming, but requires a deep
       | understanding of how Cloud services work. No wonder there are
       | full time jobs for just this purpose!
       | Great write-up - thanks for sharing @bharatsb! As you say, cloud
       | pricing has become too complex for developers to understand
       | quickly (they want to ship features, not calculate costs). Infra-
       | as-code is great, but it has made it even harder to understand
       | which code/config option costs what. `terraform apply` is like a
       | checkout screen without prices.
       | We're trying to solve this problem with infracost.io, initially
       | looking at Terraform. It would be interesting to get your
       | feedback on whether such an approach might have helped you?
       | Probably not as it doesn't look like you were using Terraform?
       | lewich wrote:
       | So when will there be responsibility in the IT industry for what
       | we engineer?
       | pwinnski wrote:
       | This is most developers' worst nightmare when it comes to a
       | completely new environment generally, and Cloud solutions
       | specifically.
       | It's easy and pointless to say they should have done things
       | differently. Worse than pointless. Obviously they should have,
       | and kudos to them for being open about the compounded mistakes.
       | Still, this strikes at the fears that lie in the heart of any
       | reasonable, honest developer doing something completely new.
       | New developers should be cautious about cloud platforms, but they
       | were! Not cautious enough, obviously, but they did set limits
       | they thought would be honored.
       | Platforms should have hard monetary limits at the account level,
       | clearly, as well as an option to turn them off. Shame on all of
       | them which don't.
       | FastQ wrote:
       | I'm just a student but I've spent about 10 hours trying to figure
       | out why Azure has been charging me >$5/day for their "basic"
       | database @5DTUs, 2gb max storage. This morning I was so
       | exasperated I sent a letter threatening to report them for fraud
       | if nobody could tell me why I was being charged 30x the listed
       | rate, which so far no one has. This is an extremely cathartic
       | post to see that I'm not alone, thanks for sharing.
         | [deleted]
         | marktolson wrote:
         | Go to billing > cost analysis > filter by resource break down.
         | Azure billing analysis is pretty amazing.
           | FastQ wrote:
           | Yeah, but it just shows my database cost which is higher than
           | is listed as far as I can tell.
             | luser007 wrote:
             | Could it be listed "hourly" and you're charged "daily"? Add
             | in VAT (equal to 25% in some countries) and you match the
             | 30 times higher than expected charge.
               | ylere wrote:
               | https://azure.microsoft.com/en-us/pricing/details/sql-
               | databa...
               | Basic tier, 5 DTUs, 2 GB is listed as ~$4.8971/month or
               | $0.0068/hour on this page. Extra storage would cost more
               | but is not available for the basic tier.
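               | A quick check of the numbers in this subthread, using
               | only the two listed rates quoted above and the ">$5/day"
               | reported by FastQ. It shows where the "30x" comes from;
               | it does not explain the cause.

```python
HOURLY_RATE = 0.0068    # USD/hour, listed rate quoted above
MONTHLY_RATE = 4.8971   # USD/month, listed rate quoted above
OBSERVED_PER_DAY = 5.0  # USD/day, lower bound reported upthread

# What the listed hourly rate implies for one day of usage.
expected_per_day = HOURLY_RATE * 24

# How far the observed daily charge is from the listed daily cost.
ratio = OBSERVED_PER_DAY / expected_per_day

print(f"expected per day: ${expected_per_day:.4f}")
print(f"observed / expected: {ratio:.1f}x")
print(f"observed per day vs listed per month: {OBSERVED_PER_DAY / MONTHLY_RATE:.2f}")
```

               | The observed daily charge is roughly the listed monthly
               | price, which is consistent with the "30x" complaint.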
       | sudcha wrote:
       | OP here. Just found out that the post made it to HN. Thanks for
       | sharing and I'll be replying to some comments.
       | altdatathrow wrote:
       | Do not use hosted cloud services where the implementation creates
       | publicly accessible API keys and each HTTP request results in a
       | charge to your account. A few specific examples are Firebase,
       | Algolia, and AWS Lambda.
       | All it takes is one programming mistake or one bad actor and you
       | can find yourself in an equally precarious situation.
       | LordHeini wrote:
       | That sort of crap is the reason we host all our stuff on root
       | servers.
       | Even trying to read the amazon pricing for their instances, hours
       | and what not, drives me insane.
       | Seems this is done on purpose. No wonder they make so much money
       | with it.
       | So I have never seen a reason to move any stuff to the cloud.
       | Just grab a dedicated server for a few bucks and put a bunch of
       | docker containers on those.
       | It's way cheaper, usually not more complicated. Just use a CI with
       | Gitlab runners or whatever and be done with it.
       | Most apps don't need scaling anyway and if you do, just put that
       | app on bare metal fitting your requirements.
         | that_guy_iain wrote:
         | > That sort of crap is the reason we host all our stuff on root
         | servers.
         | Having just started my own journey into building products for
         | myself, pretty much the first thing I realised with my tech was
         | I need to get dedicated servers instead of cloud just because
         | it costs 100x less.
         | > Just grab a dedicated server for a few bucks and put a bunch
         | of docker containers on those.
         | Exactly, if you really want kubernetes coolness to act cloud
         | like, install kubernetes; it's free and super easy to set up.
         | And with the cost savings you can literally buy multiple spare
         | servers and have kubernetes use them all while keeping usage
         | low, which leaves room to scale up onto new nodes if needed.
           | throwaway201103 wrote:
           | > kubernetes ... is super easy to set up
           | Can you point me to the super easy setup guide? Because I've
           | tried a few and never gotten it working.
             | WrtCdEvrydy wrote:
             | I don't like to use kubernetes raw, but I am a fan of CapRover
             | (which has kubernetes support)
         | scrollaway wrote:
         | AWS pricing is not obscure, it's just not for you. So in that
         | sense, you are correct to not see a reason to move to the
         | cloud, but your advice does not apply to everyone.
         | And I don't believe they make "more money" that way at all. AWS
         | margins are either very low or very high, and the higher
         | margins and prices tend to be the "simpler" ones: packaged,
         | managed products such as Redshift that are billed on fewer
         | tiers and flatter prices.
         | When you design your application with AWS, pricing has to enter
         | your design considerations. For example if you are designing
         | something that will interact a lot with S3 you want to minimize
         | PUTs. You want to minimize ram usage on lambda by streaming
         | rather than buffering. Etc.
         | AWS is not a suitable product for playground stuff. The only
         | reason it gets used as such is because it's easier if you're
         | already using AWS for other things (or you're already very
         | familiar with it).
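         | To illustrate the streaming-versus-buffering point: Lambda
         | bills by configured memory and duration, so a handler that
         | processes a payload in chunks can run with a smaller memory
         | setting. A minimal sketch with hypothetical function names,
         | hashing a file-like object both ways:

```python
import hashlib
import io


def sha256_streaming(fileobj, chunk_size=64 * 1024):
    """Hash chunk by chunk; peak memory stays around chunk_size."""
    digest = hashlib.sha256()
    # iter() with a sentinel keeps reading until read() returns b"".
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_buffered(fileobj):
    """Same result, but the whole payload sits in memory at once."""
    return hashlib.sha256(fileobj.read()).hexdigest()


payload = b"x" * 200_000
same = sha256_streaming(io.BytesIO(payload), 1024) == sha256_buffered(io.BytesIO(payload))
```

         | Both functions return the same digest; only the peak memory
         | differs.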
           | nojito wrote:
           | AWS's margin is currently 30+% which is massive.
           | >AWS pricing is not obscure
           | There is a massive secondary consulting market because of
           | AWS's price obscurities.
             | dragonwriter wrote:
             | > There is a massive secondary consulting market because of
             | AWS's price obscurities.
             | There is a massive secondary consulting market because the
             | enterprise market is addicted to secondary consulting. This
             | secondary consulting market _includes_ AWS pricing because
             | it includes pretty much any IT service the target market
             | might be interested in.
             | A rational need for decomplexification _isn't_ necessary to
             | explain the existence or coverage of enterprise secondary
             | consulting, IT or otherwise.
             | klohto wrote:
             | > There is a massive secondary consulting market because of
             | AWS's price obscurities.
             | While that's true, there is a consulting market for most
             | things that are complicated. Doesn't mean they are shady.
             | It's simply not for you. You are welcome either to dive in
             | or get a consultant. I promise you though, that AWS pricing
             | isn't difficult once you understand a few concepts and know
             | your way around the Cost Explorer. With proper tagging,
             | it's easy to drill down which resource is consuming how
             | much. I don't believe there is a way to have a simple
             | billing for a complicated product(s).
               | Closi wrote:
               | > While that's true, there is a consulting market for most
               | things that are complicated. Doesn't mean they are shady.
               | It does mean it's not simple though.
               | scrollaway wrote:
               | Obscure and complex are different concepts. I'm part of
               | that "secondary consulting market" FWIW, so I'd like to
               | think I know a thing or two about it.
               | Does AWS have high-margin prices? In aggregate, somewhat,
               | but this is mostly driven by the big ticket managed
               | enterprise items: Aurora, Redshift, Quicksight, probably
               | Fargate, etc. A lot of their more popular stuff (S3,
               | Lambda, ...) offer incredible value for very little
               | money. EC2 is the exception I believe, because I
               | understand it to be high margin for how popular it is.
               | But EC2 pricing is one of their simplest ones.
               | Could AWS simplify some of their pricing? Yes, probably.
               | There's always room for optimization. Personally for
               | example I'd like to see their pricing be global rather
               | than different by region (with understandable exceptions
               | for govcloud and china).
               | Is AWS making its pricing complicated for nefarious
               | purposes? No, there is no evidence to support that.
               | AWS pricing absolutely is not simple. It's a part of the
               | AWS stack. You need to study AWS's events/signals system
               | to be able to write apps that make the best use of AWS's
               | interconnected stack. You need to study their APIs / SDKs
               | to really understand what you're able to implement. And
               | you need to study their billing systems to understand how
               | to implement apps that run cheaply, and be able to
               | predict potential runaway costs.
               | It has to be a part of the design. That's why you may
               | want to hire consultants for it: People who understand it
               | better than you do, and will be able to assist you in
               | reducing your costs.
               | It's just another kind of optimization. Maybe some
               | software engineers don't like it because it hits them
               | where it hurts (the wallet) when they don't do it right,
               | rather than be able to brush it off as they usually do.
               | It's much easier to ignore the waste produced by, say for
               | example, the 3000 javascript dependencies shipped with
               | the fat, unoptimized electron app they ship on their
               | users' desktops, that do a ton of unnecessary expensive
               | computing; when all that crap is client-side and it's the
               | _downstream user's_ electricity bills and CPU time
               | that's being used.
               | [deleted]
             | scrollaway wrote:
             | The margin is absolutely not the same across all products.
             | > _There is a massive secondary consulting market because
             | of AWS's price obscurities._
             | It's. Not. For. You.
             | AWS pricing is a part of your design. With some exceptions
             | (that you aren't talking about), they charge you more for
             | using more resources. You are forced to design systems that
             | use less resources if you want to optimize your bill.
             | That consulting market is an optimization market. It's
             | economics at its best.
             | If you are too small to have to take these things into
             | account regardless, AWS is not for you. You're welcome to
             | use it, but don't be surprised if you end up having to deal
             | with these kinds of things which simply don't exist in the
             | world of flat-price underprovisioned droplets.
               | [deleted]
               | nojito wrote:
               | >AWS pricing is a part of your design. With some
               | exceptions (that you aren't talking about), they charge
               | you more for using more resources. You are forced to
               | design systems that use less resources if you want to
               | optimize your bill.
               | This is marketing.
               | It's like saying you want to build a house and the quote
               | you got ends up blowing up 100x overnight.
               | Great example is the 100k credit for startups. You can
               | repeat it's not for you all you want, but their business
               | is predicated on pricing ignorance and vendor lock-in.
               | scrollaway wrote:
               | The $100K credit (which I've been granted multiple times)
               | is there because if Amazon can get you to invest serious
               | work into their infra, they'll make up for it in the long
               | run. It's not "lock in", it's sales. The only amazon
               | "lock in" really is their bandwidth-out pricing, which is
               | a sleazy tactic for sure but I'm not hesitant to call it
               | out _when it's the case_.
               | You can get the $100/$300/$1000 tier if you are in "just
               | checking it out" solo mode. $5k and up requires either
               | connections, partnerships, or a serious application.
               | Anyway I don't know what your point is, I'm not even sure
               | if you have one. They're not "marketing" their pricing,
               | nor the fact that you are "forced to design systems that
               | use less resources".
               | csharptwdec19 wrote:
               | > Anyway I don't know what your point is, I'm not even
               | sure if you have one. They're not "marketing" their
               | pricing, nor the fact that you are "forced to design
               | systems that use less resources".
               | I think they are referring to this statement:
               | > > AWS pricing is a part of your design. With some
               | exceptions (that you aren't talking about), they charge
               | you more for using more resources. You are forced to
               | design systems that use less resources if you want to
               | optimize your bill.
               | It is a defense that I've heard in many AWS talks in the
               | past.
               | Where it turns into a 'marketing' blurb to me is my real
               | world experience in these AWS talks in the places I work.
               | As a real world example, we had a product that required
               | -some- architectural work, but otherwise was solid, and
               | could run on 3 live EC2 instances (2 web LB, 1 live
               | backend) and 1 spare (spare backend)
               | The Consultant that AWS partnered us with? Suggested a
               | very overdone architectural revamp, moving everything
               | possible into AWS Specific technologies.
               | It's marketing in that in many of our experiences, we
               | know there is often at least one person on a team who
               | does -not- have the discipline and/or experience to
               | -keep- a system using less resources as the field goes
               | from green to brown.
               | scrollaway wrote:
               | Overengineering is easy and happens not just with AWS but
               | with just about anything in software engineering.
               | I'm having trouble seeing how this changes what I'm
               | saying: That with the way AWS pricing is structured, you
               | are supposed to take it into account when designing your
               | product.
               | When you reach a certain size / complexity and you have
               | to design infrastructure, you _should_ be making
               | schematics, predictions on the usage peaks and troughs,
               | how various parts of the infra will be affected, how
               | active/idle they will be.
               | When you are dealing with AWS, pricing becomes extremely
               | predictable because it can be derived from those plans.
               | And it is far better to be dealing with that kind of
               | model than to deal with "unlimited with a million
               | asterisks" or something. AWS is predictable, reliable,
               | and most notoriously has never ever increased their
               | prices, so whatever you calculated will not go up because
               | of Amazon's decisions.
               | WrtCdEvrydy wrote:
               | > Suggested a very overdone architectural revamp, moving
               | everything possible into AWS Specific technologies.
               | To be honest, depending on the technology, the savings
               | could be worth it... for example, did you know you get a
               | discount if your traffic is served over CloudFront? Even
               | if your distribution is set to not cache any resource, you
               | can front your APIs using CloudFront and save on networking.
           | akh wrote:
           | How do you take pricing into your design considerations? Does
           | it come with experience from using an AWS service in
           | production and understanding how it's priced, combined with
           | the usage numbers the new system might get? I'm trying to
           | learn more about how engineers currently do this.
             | thanksforthe42 wrote:
             | Calling programmers "Engineers" is a misnomer.
             | I wish programmers had the prestige they deserve for
             | combining Science, tradition, authority, and art.
             | Engineers are not allowed to use tradition, authority or
             | art. They are restricted to being modern day calculators.
             | Nothing is wrong with either.
               | csharptwdec19 wrote:
               | The shift from 'Developer/Programmer' to engineer has
               | indeed been part of a push away from creativity towards
               | cookie-cutter work.
               | An interesting analogue would be the Automotive industry;
               | As time progressed, Companies focused more and more on
               | 'engineering' versus art/tradition/etc. But as the
               | industry evolved, "Flashy" vehicles that took risks
               | became moreso either a halo product for a brand, or
               | relegated to Luxury/Boutique.
               | And, of course, there was the dark side of this shift; A
               | good example from the 70s, the level of 'engineering'
               | driving the design of the vehicle and its assembly
               | didn't take into consideration the actual line worker; in
               | Ohio the workers wound up getting overworked, burned out,
               | and in some cases actively sabotaged the product, because
               | they were being treated like automated machines.
               | thanksforthe42 wrote:
               | I think that missed the point.
               | Engineers are applied scientists.
               | Programmers are not applied scientists.
               | scrollaway wrote:
               | Why does any of this matter?
             | scrollaway wrote:
             | Basically, yes.
             | It's not that complicated, it's just not something
             | engineers are usually used to doing. If you use an AWS
             | service, you look at its pricing.
             | Take s3 for example: whenever you use it, you'll pay for
             | outgoing bandwidth, PUTs, GETs, and storage.
             | So you seek to minimize all of these:
             | 1. Bandwidth: use cache layers. This also minimizes GETs.
             | 2. PUTs: design your app in a way that doesn't do
             | unnecessary inserts into s3. Consider alternatives such as
             | redis, postgres or filesystem depending on the need.
             | 3. Storage: compress your objects if they compress well. If
             | they aren't often accessed, use storage classes and auto
             | lifecycle management.
             | Pricing in AWS generally reflects some kind of engineering
             | limitations you will face at scale in the first place, so
             | it makes sense to go through this whole exercise either
             | way.
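             | A rough sketch of the exercise above as a cost model. The
             | unit prices are placeholder assumptions, not quoted AWS
             | rates, and the function name is hypothetical; the point is
             | only that each of the four line items maps to a design
             | lever (cache, fewer PUTs, compression).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class S3Prices:
    """Placeholder unit prices in USD (assumptions, not quoted AWS rates)."""
    per_gb_stored: float = 0.023   # per GB-month of storage
    per_1k_puts: float = 0.005     # per 1,000 PUT requests
    per_1k_gets: float = 0.0004    # per 1,000 GET requests
    per_gb_out: float = 0.09       # per GB of outgoing bandwidth


def monthly_cost(gb_stored, puts, gets, gb_out, cache_hit_ratio=0.0,
                 prices=S3Prices()):
    """Estimate a monthly bill; a cache in front absorbs GETs and bandwidth."""
    miss = 1.0 - cache_hit_ratio  # fraction of reads that still reach S3
    return (gb_stored * prices.per_gb_stored
            + puts / 1000 * prices.per_1k_puts
            + gets * miss / 1000 * prices.per_1k_gets
            + gb_out * miss * prices.per_gb_out)


uncached = monthly_cost(100, 1_000_000, 10_000_000, 50)
cached = monthly_cost(100, 1_000_000, 10_000_000, 50, cache_hit_ratio=0.9)
```

             | Comparing uncached and cached shows how much of the bill a
             | cache layer removes for a given traffic estimate, before
             | anything is deployed.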
         | croh wrote:
         | > Most apps don't need scaling anyway and if you do, just put
         | that app on bare metal fitting your requirements.
         | Most important !
         | vmception wrote:
         | > Most apps don't need scaling anyway and if you do
         | Man, exactly right. Many of you guys here would love crypto once
         | you stop asking why and start asking how.
         | The most lucrative projects these days are completely frontend
         | UI; they don't even have their own backends, as they just read
         | state from the nearest node when the client connects their wallet.
         | Some people forgot that the scalability game was to convert
         | traffic into money. So ditch that, and remember you are in the
         | money game.
         | Aperocky wrote:
         | > Most apps don't need scaling anyway
         | This is exactly right. I host stuff in buckets/CloudFront and
         | use a bit of Lambda/Route 53. I end up paying $4 a month.
         | Now that will be very different if 10 million people suddenly
         | decide to visit my site, but if that happens money probably
         | won't be a problem after all.
       | onion2k wrote:
       | The fact that cloud providers don't have a simple "This is how
       | much I can afford, don't ever bill me more than that!" box on
       | their platforms makes development a lot scarier than it really
       | needs to be.
         | sandGorgon wrote:
         | Google does have this feature
         | https://cloud.google.com/billing/docs/how-to/budgets-program...
         | Here's the specific example
         | https://cloud.google.com/billing/docs/how-to/notify#cap_disa...
           | ggthrowaway2020 wrote:
           | As a former victim to the same issue as OP, I am furious
           | every time I see a Googler promote that as a solution.
           | In our case, we racked up a $10000 bill on BigQuery in ~6
           | hours, when a job was failing and auto-retrying.
           | We had set up _every_ alert correctly and our reaction time
           | was about 5 minutes (about $100 of usage, no big deal). So
           | how did we get a $5000 bill? _Google's_ alert was 6 hours
           | late (according to them, this was root-caused to us, because
           | we were submitting jobs continuously). They pointed to their
           | TOS and said they don't guarantee on-time delivery of the
           | alert.
           | I had to write up a blog post with fancy graphs and prepare
           | it for social media before they finally agreed to eat the
           | bill.
             | FeistySkink wrote:
             | Is there a public postmortem anywhere? Your message points
             | to 'no', but just in case.
           | mrtksn wrote:
           | OP claims that the budgets are not real time: they are
           | eventually accurate, but if you spend too fast you may end up
           | with a sum larger than your budget before anything triggers.
           | lights0123 wrote:
           | > There is a delay of up to a few days between incurring
           | costs and receiving budget notifications. Due to usage
           | latency from the time that a resource is used to the time
           | that the activity is billed, you might incur additional costs
           | for usage that hasn't arrived at the time that all services
           | are stopped. Following the steps in this capping example is
           | not a guarantee that you will not spend more than your
           | budget.
           | This looks like it has the same problems as the post, because
           | it also relies on those budget alerts that can happen a long
           | while after you've exceeded them.
           | [deleted]
           | modeless wrote:
           | "Following the steps in this capping example is not a
           | guarantee that you will not spend more than your budget."
           | "Resources [...] might be irretrievably deleted."
           | Also it's not automatic, you have to manually write code to
           | do it, and test it, and make sure not to break it.
           | A reasonable implementation of this feature would be built
           | into the console, guarantee a maximum spend, not require
           | writing your own fallible code, and provide an option to
           | preserve storage (at normal cost) so that all your data isn't
           | deleted when your compute/API stuff is shut down.
           | asciimike wrote:
           | Extremely technically, the only GCP product that had this
           | feature was App Engine Standard v1, but looks like it's
           | deprecated as of the end of 2019
           | (https://cloud.google.com/appengine/docs/managing-
           | costs#chang...)
             | NegativeLatency wrote:
             | Probably hurt revenue ;)
               | asciimike wrote:
               | As a former App Engine PM who spent a lot of time with
               | billing/quotas (though, not the one who deprecated this
               | feature), it's likely due to some combination of:
               | - hard limits caused downtime more often than they
               | prevented these blog posts
               | - hard limits were inconsistently enforced, even within
               | GAE
               | - platform wide quota notifications were implemented
               | (reached "GA"), leaving the question of "how a developer
               | wants to handle this" to the developer, not the platform
               | - maintenance burden
               | The "I bankrupted my startup by running tests in an
               | infinite loop" blog posts happen ~once a year, while the
               | number of customers (including internal teams!) who
               | inadvertently went down because of this quota was
               | staggering. I feel like I used to see one a week, at
               | least. Most often someone on the team was like "oh I'm
               | going to turn this down to zero because we don't want to
               | spend any money during development", never told anyone,
               | and then they go live and they forgot to turn the knob
               | back up (or didn't properly estimate traffic/costs and
               | set it too low).
               | I can tell you it hurts revenue _a lot_ more when a large
               | customer goes down for 15 minutes due to quota issues and
               | their usage drops to zero (both in terms of revenue and
               | customer credibility) vs when a tiny developer accidentally
               | blows through 10k in a month and we refund it (since,
               | obviously, the provider's cost is a lot less than that).
         | gonzo41 wrote:
         | I wouldn't be too scared. For AWS you get about $0.20 per 1
         | million requests on Lambda. You can do quite a lot with a
         | single Lambda function. And a million of anything is a lot for
         | a dev. Put an HTTP API Gateway in front of that with a CDN and
         | you're hitting ~ a few dollars.
         | If you don't buy one coffee, or put a 20 dollar note in a book
         | one month, then you're fine. And if you have to use EC2, just
         | use a t2.micro or a raspberry pi on your desk.
         | But really the first lesson you should learn in any cloud setup
         | is Billing Alarms :)
         | If you're doing ML or CV work then it's probably cheaper to
         | build on the desktop and port to cloud once you understand what
         | the workloads are.
           | onion2k wrote:
           | _For AWS you get about $0.20 per 1 million requests on
           | Lambda._
           | If you get it right, great. If you get it wrong then you end
           | up doing billions of operations by mistake, which could cost
           | a _huge_ amount. That's what happened to the author of the
           | article.
           |  _But really the first lesson you should learn in any cloud
           | setup is Billing Alarms_
           | Alarms only tell you that something is going wrong. They
           | don't stop it. If your mistake is costing $1000/minute and
           | you're an hour away from a computer you have a very expensive
           | problem.
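           | The arithmetic behind both points, as a small sketch. The
           | $0.20 per million figure is the request charge quoted above
           | (duration charges are ignored here), and the function names
           | are hypothetical.

```python
def request_cost(requests, usd_per_million=0.20):
    """Request charge only, at the per-million rate quoted above."""
    return requests / 1_000_000 * usd_per_million


def exposure(burn_usd_per_minute, minutes_until_stopped):
    """Money spent between the mistake starting and someone stopping it."""
    return burn_usd_per_minute * minutes_until_stopped


one_million = request_cost(1_000_000)          # the "normal" case
five_billion = request_cost(5_000_000_000)     # a runaway loop
hour_away = exposure(1000, 60)                 # $1000/minute, 60 minutes
```

           | An alarm shortens minutes_until_stopped only if something
           | acts on it; the burn rate is unchanged until the workload is
           | actually stopped.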
             | jjk166 wrote:
             | So you're taking code that you haven't validated locally to
             | see what resources it uses, you're putting this up on the
             | cloud to test it, then you are immediately going to the
             | middle of nowhere without your laptop/phone/etc, and you
             | can't arrange for a coworker or friend to pull the plug for
             | you if something goes wrong?
               | patrickaljord wrote:
               | > and you can't arrange for a coworker or friend to pull
               | the plug for you if something goes wrong?
               | This is HN, many of us are solo founders with no
               | coworkers or employees. Also how could a "friend" pull
               | the plug? If it was a physical server running in your
               | house maybe, otherwise you can't really give them access
               | to your AWS account with all your private clients data in
               | there.
               | jjk166 wrote:
               | If you don't have anybody who can monitor your test, and
               | you're not monitoring your test, why are you doing a
               | test?
               | As for having a non-employee pull the plug, set up an IAM
               | user with permission to access the test instance
               | WrtCdEvrydy wrote:
               | > you're not monitoring your test, why are you doing a
               | test?
               | Agile. Bringing you bankruptcy at the speed of cloud.
               | jschwartzi wrote:
               | If I'm the only developer on a project and I really need
               | to get to market I might do just that. I sometimes do day
               | hikes on weeknights so this is actually a likely scenario
               | for me.
               | jjk166 wrote:
               | Do you go hiking alone without your phone? That seems
               | dangerous.
               | And why would you start a test if you won't be there to
               | see the results of the test? Seems more sensible to
               | either leave after you've run the test or wait to do so
               | until you get back.
             | gonzo41 wrote:
             | You can trigger events from alarms. And Lambdas only last
             | 15 minutes. So still cheaper than 75K :D.
           | gonzo41 wrote:
           | Just to expand on this. You can have a hard limit. For AWS,
           | create a role/user that's essentially ~root like access. Make
           | a lambda function that's triggered by a billing alert at your
           | threshold to just turn off things from most expensive to
           | least. So turn off the DB servers. So the apps error out and
           | the users go away.
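
The kill-switch idea above (a handler fired by a billing alert that stops things from most to least expensive) can be sketched roughly as follows. This is a minimal sketch, not an AWS implementation: `Resource`, `on_billing_alert`, and `fake_stop` are hypothetical names, the hourly costs are made up, and a real handler would call the provider's API where `fake_stop` only prints.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Resource:
    name: str
    hourly_cost: float  # assumed estimate of spend rate, dollars per hour
    running: bool = True


def shutdown_plan(resources: List[Resource]) -> List[Resource]:
    """Return the running resources ordered from most to least expensive."""
    running = [r for r in resources if r.running]
    return sorted(running, key=lambda r: r.hourly_cost, reverse=True)


def on_billing_alert(resources: List[Resource],
                     stop: Callable[[Resource], None]) -> List[str]:
    """Hypothetical alert handler: stop everything, costliest first."""
    stopped = []
    for resource in shutdown_plan(resources):
        stop(resource)  # a real handler would call the provider's API here
        resource.running = False
        stopped.append(resource.name)
    return stopped


def fake_stop(resource: Resource) -> None:
    # Stand-in for a real provider call; it only logs what would happen.
    print(f"stopping {resource.name} (${resource.hourly_cost:.2f}/h)")


fleet = [
    Resource("web-app", 0.40),
    Resource("db-primary", 2.50),
    Resource("batch-worker", 1.10),
    Resource("old-staging", 0.90, running=False),
]
order = on_billing_alert(fleet, fake_stop)
```

Stopping is not the same as deleting: persistent data would keep accruing storage charges after a handler like this runs.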
         | bpodgursky wrote:
         | There are some cloud services where it's not quite this simple.
         | S3 -- you can't just delete customer data because they hit a
         | billing limit
         | RDS -- not going to drop databases on the 27th of the month
         | Anything with persistent data is going to have to stay alive
         | and accumulate costs. Admittedly these services aren't where
         | the crazy bills come from, but it does make a simple kill
         | switch a bit more complex.
           | raphaelj wrote:
           | You don't have to immediately delete customer data.
           | Most services that have a spending cap will have a "grace
           | period" of a couple of days during which the service does not
           | work but the data is not deleted. That gives you some time to
           | get notified of the issue, and fix the problem/increase the
           | limit.
           | heavyset_go wrote:
           | This is a solved problem for every other service out there.
           | You don't just delete the data, you give the customer a few
           | days, weeks, or a month to pay their bill and if they don't,
           | then you delete their data.
         | ZephyrBlu wrote:
         | Probably because it's not so simple on the backend.
         | I'm guessing there's a good chance a lot of systems are only
         | eventually consistent, which could explain why billing takes a
         | long time to update.
         | Aggregation of service usage for billing could also be an
         | expensive operation, so it's only updated irregularly instead
         | of being near real-time.
         | It would be a great feature, but I can imagine it being very
         | complex. It's also probably cheaper for them to just wave away
         | excess usage like this instead of building out a solution.
           | donmcronald wrote:
           | Azure has it for some plans [1], but not others like pay-as-
           | you-go. It seems arbitrary.
           | 1. https://azure.microsoft.com/en-us/support/legal/offer-
           | detail...
           | a-priori wrote:
           | This is a billing question, not a technical question, and
           | looked at through that lens it's easy to put a hard limit on
           | a monthly bill: just don't ever issue bills greater than that
           | amount.
           | If I say I only want to pay a maximum of $1000 a month, and I
           | hit that limit but it takes a bit for the provider to shut
           | everything down so really $1100 of resources were consumed,
           | then the provider eats the $100 overrun and I get a bill for
           | $1000.
           | With an actual hard limit you create a financial incentive
           | for the provider to minimize this overrun. Yes it might be
           | difficult to fix but I assure you, if hard limits existed,
           | the technical issues would be solved soon enough because now
           | there's a reason to invest in a solution.
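
The billing rule described above fits in a few lines: the invoice is capped at the customer's limit and any overrun lands on the provider. A minimal sketch, using the comment's own $1000 limit and $1100 of consumption; the `Invoice` type and `issue_invoice` name are hypothetical.

```python
from typing import NamedTuple


class Invoice(NamedTuple):
    billed: float                # what the customer pays
    absorbed_by_provider: float  # overrun the provider eats


def issue_invoice(consumed: float, hard_limit: float) -> Invoice:
    """Never bill more than the hard limit; the rest is the provider's cost."""
    billed = min(consumed, hard_limit)
    return Invoice(billed=billed, absorbed_by_provider=consumed - billed)


invoice = issue_invoice(consumed=1100.0, hard_limit=1000.0)
print(invoice)
```

The `absorbed_by_provider` field is the incentive the comment describes: it only shrinks if the provider shortens the time between hitting the limit and shutting things down.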
             | benlivengood wrote:
             | It's also a mostly solved problem because advertisers have
             | budgets and it's common to implement globally distributed
             | budget servers to avoid showing more ads than the
             | advertiser paid for, despite tens of thousands of
             | individual web servers needing to know which ads in their
             | inventory have budget left.
             | It's a fun exercise similar to global rate-limiting/load-
             | balancing.
               | wikibob wrote:
               | That is fascinating.
               | If you have the time could you (anyone feel free) talk a
               | bit about how you would implement a globally distributed
               | budget?
               | I can imagine a few simple options, but they all seem to
               | have significant shortcomings.
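
One simple option for a distributed budget is leasing: a central budget server hands out small chunks, and each web server spends only from the chunk it holds. The sketch below is an illustration of that idea under stated assumptions, not a description of how any real ad system works; `BudgetServer`, `EdgeServer`, the chunk size, and the per-request cost are all hypothetical.

```python
class BudgetServer:
    """Central authority holding the remaining budget (in cents)."""

    def __init__(self, total_cents: int) -> None:
        self.remaining = total_cents

    def lease(self, requested_cents: int) -> int:
        """Grant up to the requested amount, never more than what is left."""
        granted = min(requested_cents, self.remaining)
        self.remaining -= granted
        return granted


class EdgeServer:
    """Serves requests locally and spends only from its current lease."""

    def __init__(self, budget: BudgetServer, chunk_cents: int) -> None:
        self.budget = budget
        self.chunk = chunk_cents
        self.local = 0   # leased but not yet spent
        self.spent = 0

    def try_spend(self, cost_cents: int) -> bool:
        if self.local < cost_cents:
            # Only this step needs a round trip to the central server.
            self.local += self.budget.lease(self.chunk)
        if self.local < cost_cents:
            return False  # budget exhausted: refuse the request
        self.local -= cost_cents
        self.spent += cost_cents
        return True


budget = BudgetServer(total_cents=1000)
servers = [EdgeServer(budget, chunk_cents=100) for _ in range(3)]

served = 0
for i in range(500):  # far more demand than the budget allows
    if servers[i % 3].try_spend(cost_cents=7):
        served += 1

total_spent = sum(s.spent for s in servers)
stranded = sum(s.local for s in servers)
print(served, total_spent, stranded)
```

The shortcoming is the trade-off in chunk size. Spending can never exceed the budget, but large chunks leave money stranded on idle servers, while small chunks mean more round trips to the central server.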
           | kevsim wrote:
           | I think that's not really an issue though is it? If you say
           | "never charge me more than $100" they can a) ensure they
           | never charge you more than $100 and b) work to optimize their
           | own systems so that they cut you off as close to $100 as
           | humanly possible. In the beginning they might eat some costs
           | since it takes them a day to catch it, but they could work
           | over time to bring that down. And it's not like it's costing
           | GCP/AWS/Azure "sticker price" to provide their services.
         | donmcronald wrote:
         | This is my worst nightmare. Lol. I guess now is a great time to
         | give Azure a shoutout for sitting on their hands for 8 years
         | without so much as a response to the community for half a
         | decade [1].
         | At least AWS allows using a prepaid credit card so they'll need
         | to call me if things go haywire. I bet if that $72k charge went
         | through it would have been much harder to get out of. "Sorry,
         | we don't have the money" is a much better negotiating position
         | than "can we please have our money back?"
         | 1. https://feedback.azure.com/forums/170030-signup-and-
         | billing/...
           | dvfjsdhgfv wrote:
           | > "Sorry, we don't have the money" is a much better
           | negotiating position than "can we please have our money
           | back?"
           | I agree but why would you like to be in either position
           | anyway? The so-called cloud services are terribly overpriced
           | when compared to traditional servers.
             | treeman79 wrote:
             | Done correctly they save a lot of IT time.
             | I've seen companies hire five six-figure people to try to
             | cut their Amazon bill by a couple of grand a month.
             | Never understood spending 50-100k a month to maybe save 5k.
               | Retric wrote:
               | It's often a fixed vs ongoing cost question. Spending
               | 200k to save 5k per month breaks even in about 3.3 years.
               | However, for growing companies that 5k/month AWS premium
               | can hit 200+k/month very quickly
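
The break-even arithmetic above is a single division: a one-off cost divided by the monthly saving. A minimal sketch using the comment's figures; `breakeven_months` is a hypothetical helper and it ignores interest and salary growth.

```python
def breakeven_months(upfront_cost: float, monthly_saving: float) -> float:
    """Months until a one-off cost is repaid by a constant monthly saving."""
    return upfront_cost / monthly_saving


months = breakeven_months(200_000, 5_000)
print(months, round(months / 12, 2))  # 40 months, roughly 3.3 years

# If the monthly premium grows to 200k, the same spend pays back in a month.
print(breakeven_months(200_000, 200_000))
```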
         | nojito wrote:
         | Price transparency is the antithesis of the "cloud" and its
         | current financial success.
         | jsiepkes wrote:
         | Not only development but also running in production. You can
         | configure alerts but you can't configure a hard limit. That's
         | just insane. That makes working with GCP like playing with
         | fire.
           | k__ wrote:
           | What about throttling?
             | pwinnski wrote:
             | aka "Bankrupt me more slowly"
             | Throttling doesn't stop the drain.
             | herendin2 wrote:
             | Nice to have, but people want a throttle that shuts off
             | dead at a certain number of dollars
         | serial_dev wrote:
         | It is baffling why cloud providers don't have that option.
         | I might want to have an app because I don't mind spending 50
         | dollars on my pet project as a hobby, but I don't ever want to
         | spend more than that. Not if I write a wrong query that
         | suddenly becomes very expensive, not when I get attacked, and
         | not even when I have legit users.
         | By the way, the same goes for some companies, too, just the
         | threshold would be different.
           | brundolf wrote:
           | For hobby projects you probably don't need auto-scaling, and
           | should use a provider that charges a fixed monthly rate.
           | You'll "waste" a little bit of money on unused uptime, but
           | for a hobby project it will be a minuscule amount.
           | greatgib wrote:
           | It's not complicated to add configurable hard limits for
           | these companies but they don't allow it because the current
           | situation is more interesting for them.
           | They want to suck the maximum money from consumers before
           | they realize.
           | For every person who complains loudly and gets a one-time
           | gesture, there are hundreds of other companies that will not
           | notice or will just pay without recourse.
             | onion2k wrote:
             | _They want to suck the maximum money from consumers before
             | they realize._
             | I have very little money so I just don't use their services
             | because a mistake would be disastrous. They might be losing
             | out on me making a unicorn app on their platform. It's
             | unlikely, but while the possibility of catastrophe exists
             | I'll stick to not using them. That extends to not
             | recommending anyone uses them either in case the worst
             | happens.
               | mikestew wrote:
               | _I have very little money..._
               | Then the harsh reality is: companies don't care. Yeah,
               | your app might turn out to be a unicorn, but the
               | overwhelming odds are that it won't. And no one cares
               | that you'll tell your other broke friends to avoid the
               | service.
               | We'd all like to think it to be different, that a company
               | might care about appeasing my broke ass. But as already
               | pointed out, they want the whales. I also wonder, despite
               | the number of years "cloud services" have been around, if
               | companies aren't still trying to figure out a gazillion
               | other things and limiting customer spend might be a bit
               | low on the priority list.
               | thanksforthe42 wrote:
               | Meanwhile this leaves an opportunity for a different
               | company to provide these services.
               | I do my best to avoid FAANG giants who don't think about
               | me.
               | renewiltord wrote:
               | The highly price sensitive customer will force you to
               | compete only on price. That's just forcing yourself into
               | a commodity market. It's bad business. I would never try
               | to cater to that market. Very dangerous. Competition will
               | drive margins down to near zero.
             | Paul-ish wrote:
             | The fact that the dashboards and alerts have a delay sounds
             | like there might be difficult consistency stuff going on.
             | Many nodes need to coordinate their usage and billing. It
             | may be a difficult problem, but solving billing problems
             | might not really motivate anyone at the company. It's not a
             | "cool" problem for engineers and not profitable for
             | product.
               | phkahler wrote:
               | >> The fact that the dashboards and alerts have a delay
               | sounds like there might be difficult consistency stuff
               | going on.
               | I think that's true. It's easier to measure usage and
               | aggregate that data after the fact than to meter it in
               | real time and stop at a limit. Those are very different
               | things. What happens if you hit the cap while running
               | multiple processes spread across a cloud?
               | One improvement might be to throttle things as the cap
               | approaches, but that doesn't really change the problem at
               | all. Doing that and having the provider eat any overages
               | should solve it from the user's point of view.
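
The gap between after-the-fact metering and a real-time cap can be put in numbers: if usage only becomes visible after some lag, spend overshoots the cap by roughly the burn rate times that lag. The sketch below is a toy model under stated assumptions (a constant burn rate, a fixed whole-minute lag, an instant cutoff once the meter sees the cap); the function names are hypothetical.

```python
def overshoot(burn_rate_per_min: float, lag_minutes: float) -> float:
    """Spend accrued between crossing the cap and the cutoff taking effect."""
    return burn_rate_per_min * lag_minutes


def simulate(cap: float, burn_rate_per_min: float, lag_minutes: int) -> float:
    """Step minute by minute; the meter only sees usage lag_minutes old."""
    actual = 0.0
    history = [0.0]  # history[t] is the true spend at minute t
    minute = 0
    while True:
        seen_at = max(0, minute - lag_minutes)
        if history[seen_at] >= cap:
            return actual  # cutoff fires; this is the real total spend
        minute += 1
        actual += burn_rate_per_min
        history.append(actual)


final_spend = simulate(cap=1000.0, burn_rate_per_min=100.0, lag_minutes=5)
print(final_spend, 1000.0 + overshoot(100.0, 5))

# The earlier example in the thread: $1000/minute while an hour from a computer.
print(overshoot(1000.0, 60))
```

Throttling as the cap approaches lowers the burn rate, which shrinks the overshoot without removing it, in line with the comment's point.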
             | ctvo wrote:
             | > They want to suck the maximum money from consumers before
             | they realize.
             | This is a naive understanding of how corporations like
             | Google and Amazon work. Bad will and using gym membership
             | tactics aren't how they scale or make money. Getting you to
             | confidently try things knowing you won't get charged (the
             | reason they have those free tiers) so you'll get your
             | company, your start-up, your next side project on it is
             | much better for business.
             | It's a miss that things like this aren't implemented and
             | widespread, not by design.
             | > It's not complicated to add configurable hard limits for
             | these companies but they don't allow it because the current
             | situation is more interesting for them.
             | I'm not in this space, but from my observations:
             | - Each service has a different billing model and metering
             | model. Most likely this data is held by the service. I'm
             | familiar with AWS so I'll use them as an example. I'd wager
             | only DynamoDB or only Lambda (the service owners) know how
             | much of those services you've consumed
             | - Billing is most likely reconciled asynchronously after
             | collecting all data from all services by an entirely
             | different department with knowledge of payments and
             | accounting
             | - GCP, AWS, Azure launch 50+ services a year
             | - Each large customer most likely has a special rate. I bet
             | Samsung or Snap pay an entirely different set of rates than
             | the normal customer. There are thousands of these
             | exceptions
             | - Cutting your service off when you're over the limit is an
             | incredibly complex set of edge conditions. Your long
             | running instance hosting your critical service is shut off
             | because of experimenting on a new ML workflow?
             | Even with only the above I can see the difficulty in
             | globally limiting your spending limit at an accurate level.
             | I know there are features for both AWS and GCP and they
             | try.
             | It's easy to stand on the sidelines and handwave away
             | technical complexity at scale, but I'd encourage you to
             | give all of these providers a more charitable view, at
             | least on this topic.
               | beoberha wrote:
               | I work in this space and you're absolutely correct. Your
               | last paragraph hits the nail on the head for pretty much
               | every complaint people have about the public clouds.
               | patrec wrote:
               | Right, so let's say Congress passes a bill that requires
               | cloud providers to enable hard spending limits by start
               | of February 2021, and eat any extra usage costs that
               | exceeded a set limit.
               | What is your educated guess as to when this feature would
               | be essentially correctly implemented in AWS and GCP
               | (essentially = negligible costs to the providers due to
               | either false negatives (bills they eat) or false
               | positives (PR fallout when SomeSite gets shut down
               | despite not being over the limit))?
             | abawany wrote:
             | Every time a GCloud rep would ask us about what we need, we
             | would say: fix the billing interface. As far as I know, it
             | never got fixed. The feelings I would get when looking at
             | cloud billing interfaces can be summed as: obfuscated, like
             | a pawnshop, and caveat emptor. I kind of came to the
             | conclusion that if the cloud giants are not fixing their
             | billing interfaces, then just like Amazon not sending you
             | the details of the items you ordered by email and thus
             | causing you to use the app to help with primenesia, there
             | is a 'business' reason why the billing interfaces are
             | generally incomprehensible.
           | uoaei wrote:
           | > It is baffling why cloud providers don't have that option.
           | ...is it? If a lazy dev leaves their corporate account open
           | and you can bill it for their negligence, protected by the
           | contract you already signed, you earn a lot of money. From a
           | purely business perspective, it is stupid(!) to provide a
           | stopgap for that.
           | Edit: to be clear I am not advocating one way or the other.
           | But it is surprising that people are "baffled" by this
           | obvious profit optimization.
         | trymas wrote:
         | AFAIK DigitalOcean has a notification if you go over a
         | user-defined limit.
         | cambalache wrote:
         | That should be illegal, but hey, at least they support noble
         | causes, so let them be. It sounds cynical but this is their game.
         | raphaelj wrote:
         | It's even worse for services like AWS CloudFront.
         | One of your competitors could just rent a cheap server on OVH
         | with uncapped transfer and incur you $10k in cost in a few
         | hours.
           | cambalache wrote:
           | Maybe that is your cue to move your server from AWS to
           | OVH*
           | * I don't have any idea about OVH
         | jasonpeacock wrote:
         | It's surprisingly complex to do that. Let's take a simple
         | example and say your cloud account is doing 2 things - compute
         | & storage.
         | Compute is an active resource, when you exceed your budget it
         | can be automatically shut down.
         | Storage is a passive resource, when you exceed your budget it
         | can be automatically....deleted? That's almost always the wrong
         | action.
         | Providing fine-grained cost limits helps some, as passive
         | resources usually don't have massive cost spikes while active
         | resources do, so you can better "protect" your passive
         | resources by setting more aggressive cost limits on the active
         | resources.
         | This quickly gets more complicated. Another example is most
         | monitoring services are a combination of active (actual metric
         | monitoring) and passive (metric history) resources. A cost
         | limit on that monitoring service likely won't provide sub-
         | service granularity, mostly depending on whether the service
         | even has different charges for monitoring vs history.
         | Oh, also, even for a passive resource like storage, you _also_
         | have active resource charges whenever you upload/download your
         | data.
         | Ugh, what a mess. The best thing to do is pay attention to your
         | spending, just like you do with your personal & corporate
         | budget.
           | Closi wrote:
           | It's almost like you could make it configurable so users can
           | choose what happens if they go over, and to what extent.
           | modeless wrote:
           | It's not really that complex. All compute should shut down.
           | All API calls should fail. Storage should be (optionally)
           | preserved at normal cost.
           | Your examples are simple given this framework.
           | Uploading/downloading data to storage is an API call.
           | Monitoring is compute. Metric history storage is storage.
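
The framework above amounts to a three-way classification plus one user option. A minimal sketch follows; the `Kind` enum, `action_on_cap`, and the action strings are hypothetical names chosen for illustration, not any provider's API.

```python
from enum import Enum


class Kind(Enum):
    COMPUTE = "compute"
    API_CALL = "api_call"
    STORAGE = "storage"


def action_on_cap(kind: Kind, preserve_storage: bool = True) -> str:
    """What happens to a resource of this kind once the budget is hit."""
    if kind is Kind.COMPUTE:
        return "shut down"
    if kind is Kind.API_CALL:
        return "reject"
    # Storage: keep at the normal rate unless the user opted into deletion.
    return "keep and keep billing" if preserve_storage else "delete"


# The thread's examples, mapped onto the three kinds as the comment does.
examples = {
    "upload/download to storage": Kind.API_CALL,
    "metric monitoring": Kind.COMPUTE,
    "metric history": Kind.STORAGE,
}
for name, kind in examples.items():
    print(f"{name}: {action_on_cap(kind)}")
```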
             | jasonpeacock wrote:
             | But storage costs continue to add up even when you're not
             | accessing them - there's a cost to storage _existing_ which
             | continues to accrue with time.
             | When there's no budget left, what do you do with those
             | accruing costs for existing storage?
               | AngusH wrote:
               | If the amount of storage that you can use is limited by
               | quota (say 50GB) the problem becomes relatively easier.
               | You set a quota for 50GB of storage and no more. The
               | server then restricts you by disk quota to that amount of
               | storage.
               | The cost is then calculated as 1.15USD per month.
               | So you don't pay more than 1.15 per month.
               | Compute and transfer (and other things) could be covered
               | by separate similar quotas with a single maximum spend
               | figure at the bottom of the table.
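
The quota idea above turns a usage limit into a cost ceiling: if writes beyond the quota are refused, the monthly storage charge cannot exceed quota times rate. A minimal sketch, assuming a rate of 0.023 USD per GB-month. That rate is inferred from the comment's own numbers (1.15 / 50) and is not quoted from any price list. The function names are hypothetical.

```python
STORAGE_RATE_PER_GB_MONTH = 0.023  # assumption: inferred from 1.15 / 50


def max_monthly_storage_cost(quota_gb: float,
                             rate: float = STORAGE_RATE_PER_GB_MONTH) -> float:
    """Upper bound on the monthly storage charge under a hard quota."""
    return quota_gb * rate


def can_write(used_gb: float, incoming_gb: float, quota_gb: float) -> bool:
    """Enforce the quota at write time so the cost bound actually holds."""
    return used_gb + incoming_gb <= quota_gb


print(f"{max_monthly_storage_cost(50):.2f} USD per month at most")
print(can_write(used_gb=49.5, incoming_gb=1.0, quota_gb=50))  # refused
```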
               | modeless wrote:
               | Storage costs are predictable and slow to accumulate.
               | They are rarely the problem people are trying to address
               | when they set a budget. As I said, storage would
               | _optionally_ continue to be charged at the normal rate,
               | the other option being immediate deletion if you really
               | need a super hard budget cap.
               | Once you get the alert that your budget is tripped you
               | can go and see what's in storage via the console and
               | delete it, only paying for a few hours of storage for
               | things you don't want.
               | asciimike wrote:
               | Moreover, once API calls are locked, what next? You can't
               | delete files, and even if you can delete them, you aren't
               | able to retrieve them before deletion... If a platform
               | allows you to do those actions, then it's ripe for abuse,
               | and at public cloud scale that ends up being a far, far
               | bigger problem than the occasional blog post that ends up
               | as a refund (because the other blog post is "I got free
               | storage forever with this one weird trick").
               | It's really not a simple problem because the next action
               | depends on the choice the developer wants to make: do
               | they increase the budget or decrease usage, and no cloud
               | provider wants to make this choice because no matter what
               | the choice is it will be viewed as wrong. The best they
               | can do is provide developers the best insight and tooling
               | to make this choice themselves.
               | modeless wrote:
               | Once API calls are locked you can open the console,
               | disable all the things that caused you to hit your
               | budget, and then raise the budget a bit to get access to
               | the storage APIs again and manage your storage. Or, the
               | console's storage browser should let you browse and
               | delete files as well. And again, there should be an
               | option to delete all storage immediately for a hard cap
               | on your budget if you really want that.
               | AngusH wrote:
               | You need separate costed quotas for each type of activity
               | with a combined total at the bottom.
               | You could also have a setting in the admin panel as to
               | what the system should do:
               | [ ] I want to keep going beyond my quotas (but email me)
               | [ ] Please shutdown my site
               | asciimike wrote:
               | If the answer is "you have a dollar limit set of GCS
               | GETs, GCS PUTs, etc." I guess I could see this working,
               | but hot damn that'll be a horrific interface.
               | The other issue is that many large customers pay
               | different prices, so billing and quota aren't really tied
               | to each other, and it wouldn't be easy to reconcile this.
               | As for the button... having been on the product side of
               | building this button, there is no right answer: people
               | will say they never got the email (or it went to the
               | wrong inbox, or their dog ate their phone...) or that
               | they never checked the box to "shut down the site" ("I
               | didn't think it would do X that made my app not work").
               | AngusH wrote:
               | I'd probably want it grouped by category with a drill
               | down interface for the specifics.
               | Probably arranged so you can type in a figure at the
               | bottom for monthly expenditure and it would balance out
               | the requirements based on typical use cases.
               | So enter $50 in the monthly cap figure and it allocates,
               | say, $20 to compute, $20 to transfer operations and API
               | calls, $10 to storage
               | which you could then fiddle with of course.
               | I can't offer much on the second point other than to say
               | that unexpected bills annoy me much more than services
               | that stop working.
               | I've also never worked anywhere with unlimited budgets.
               | (alas)
               | I can see that there are probably cases where uptime is
               | more important so they would be more annoyed the other
               | way around.
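
The monthly-cap split described above ($50 becoming $20 compute, $20 transfer and API calls, $10 storage) can be sketched as a weighted allocation. The 2:2:1 weights below just reproduce the comment's example and are not derived from real usage data; `allocate` is a hypothetical helper that works in whole cents so the parts always add up to the cap.

```python
from typing import Dict


def allocate(total_cents: int, weights: Dict[str, int]) -> Dict[str, int]:
    """Split a cap across categories in proportion to integer weights."""
    weight_sum = sum(weights.values())
    shares = {k: total_cents * w // weight_sum for k, w in weights.items()}
    leftover = total_cents - sum(shares.values())
    # Rounding down can leave a few cents; give them to the heaviest categories.
    for k in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        shares[k] += 1
    return shares


default_weights = {"compute": 2, "transfer_and_api": 2, "storage": 1}
caps = allocate(5000, default_weights)  # a $50 monthly cap, in cents
print(caps)
```

A user could then adjust individual caps by hand, as the comment suggests, so long as the total stays at or below the overall figure.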
           | cesarb wrote:
           | > Storage is a passive resource, when you exceed your budget
           | it can be automatically....deleted? That's almost always the
           | wrong action.
           | A better option would be to automatically reduce the budget
           | by the amount it would cost to keep the storage forever. If
           | doing that would reduce the budget to zero, do not allow
           | increasing the amount of storage. That is: assume the storage
           | will not be deleted, and budget according to that.
             | asciimike wrote:
             | How does this actually work? It clearly can't be forever,
             | since any non-zero dollar amount * infinity months is
             | infinity dollars, which is going to reduce the budget below
             | zero since any non-infinite number minus infinity is less
             | than zero... thus locking it immediately.
             | Even if we say "you get N months of storage before we
             | delete it" and subtract N * current storage cost/month,
             | what happens after you're locked out of all actions because
             | you added an extra GB? Storage APIs cost money to use, so
             | you would get locked out of those too (note that if you're
             | not, people would set arbitrarily low limits and get
             | storage access for free) and couldn't retrieve anything.
             | The only remaining actions are delete (which is free) or
             | raise the quota and do the whole rodeo over again.
             | Abuse is impossible to ignore at public cloud scale, so
             | "free storage forever" (or even, storage at a one time
             | fixed price) as the fallback isn't a viable option.
             | Lastly, from an optics perspective, which blog post would
             | you rather see on the front page of HN: "I did something
             | dumb and spent too much money on Cloud" or "Google is
             | holding our data hostage" (or "Google deleted all my
             | data")?
             | Source: I launched Firebase Storage, which has a GCS bucket
             | that has a hard limit.
               | MichaelBurge wrote:
               | "infinity" is way too conservative: If it costs $1 to
               | maintain, and you can get 10% interest loaning money out
               | at any BDC, then $10 will maintain it forever.
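
The "$10 maintains it forever" claim is the standard perpetuity formula: a recurring cost C funded at interest rate r per period needs a deposit of C / r. The sketch below assumes the $1 is charged once per period, the 10% is earned over that same period, and payments fall at the end of each period; it checks the closed form against a long finite sum.

```python
def perpetuity_value(cost_per_period: float, rate_per_period: float) -> float:
    """Deposit whose interest alone covers the recurring cost forever."""
    return cost_per_period / rate_per_period


def discounted_sum(cost: float, rate: float, periods: int) -> float:
    """Present value of paying `cost` at the end of each of `periods` periods."""
    return sum(cost / (1 + rate) ** t for t in range(1, periods + 1))


closed_form = perpetuity_value(1.0, 0.10)
long_run = discounted_sum(1.0, 0.10, periods=500)
print(closed_form, long_run)  # the finite sum approaches the closed form
```

A finite reserve only works if the rate stays positive and the storage price does not rise, which a provider cannot guarantee.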
           | AngusH wrote:
           | But we've had disk quotas before that mostly worked?
           | If anything it seems an easier problem than processor time.
           | I recall disk quotas on shared systems at university back in
           | 1998 and I'm sure they existed before that.
           | Two thresholds IIRC, one at which you get a warning, second
           | at which you can't write any further and the disk write
           | operation fails.
           | I don't think they deleted files, it was just you couldn't
           | write more than [quota] bytes to your disk.
           | Is there something particular about cloud-based systems that
           | prevents this from working?
           | ie. is this a specific problem with distributed storage?
           | edit:tone
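
The two-threshold disk quota described above (a warning at the first threshold, failed writes at the second, nothing deleted) can be sketched as follows. This is an illustration of the behaviour as recalled in the comment, not the API of any particular filesystem; `QuotaVolume` and `QuotaExceeded` are hypothetical names.

```python
from typing import Optional


class QuotaExceeded(Exception):
    """Raised when a write would push usage past the hard limit."""


class QuotaVolume:
    def __init__(self, soft_limit: int, hard_limit: int) -> None:
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.used = 0

    def write(self, nbytes: int) -> Optional[str]:
        """Record a write; return a warning string above the soft limit."""
        if self.used + nbytes > self.hard_limit:
            # The write fails, but existing data is left untouched.
            raise QuotaExceeded(f"{nbytes} bytes would exceed the hard limit")
        self.used += nbytes
        if self.used > self.soft_limit:
            return f"warning: {self.used} of {self.hard_limit} bytes used"
        return None


volume = QuotaVolume(soft_limit=80, hard_limit=100)
first = volume.write(50)    # under both limits: no warning
second = volume.write(40)   # over the soft limit: warning, write succeeds
try:
    volume.write(20)        # would pass the hard limit: write fails
    failed = False
except QuotaExceeded:
    failed = True
print(first, second, failed, volume.used)
```

This caps how much is stored, not how long it is billed for, which is the limitation the reply below raises.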
             | robrtsql wrote:
             | S3 costs money to keep your files in, even if you're not
             | touching them, so just preventing further uploads wouldn't
             | do much to prevent your AWS bill from increasing.
       | mlboss wrote:
       | I don't understand why developers use cloud for
       | bootstrapping/side projects. DigitalOcean is all you need: a $5
       | droplet + $15 Postgres, or even better a $7 dyno on Heroku.
         | WrtCdEvrydy wrote:
         | $5 droplet, 2GB swap (set it internally) and run Caprover.
         | Deploy your Postgres (for DB), minio (for s3 storage) and your
         | webapp from Caprover. Add nodes as you need to scale out.
         | com2kid wrote:
         | I built my startup on a combo of DO and Firebase.
         | If I knew something was going to take more than a couple dozen
         | milliseconds to run, it was built on the DO droplet.
         | Why would I pay by the CPU second for something that is taking
         | a lot of CPU seconds? That billing model doesn't make sense.
         | For my super quick REST endpoints, yeah, all on Firebase, the
         | convenience of writing + deploying makes it an obvious win.
         | (Unless something goes wrong, debugging Firebase functions is
         | not fun...)
       | Alupis wrote:
       | As J. Paul Getty once mused[1]:
       | > If you owe the bank $100 that's your problem. If you owe the
       | bank $100 million, that's the bank's problem.
       | Crappy situation for OP and his startup, but I find the part
       | about reading up on bankruptcy to be a bit premature.
       | Perhaps not the most ethical choice, but what stops OP from just
       | not paying the bill, and finding a different cloud provider?
       | Obviously they'll want to not repeat the "experiment", but
       | seriously... there's no mechanism at Google to stop a new client
       | from running up a near-$100k bill in a single day?
       | That's absurd, and should be a learning lesson for Google more
       | than this startup. Some malicious actor could apparently consume
       | hundreds of thousands of dollars of Google resources and "get
       | away" with it.
       | "Wait and see what happens, then deal with it" would be sane
       | advice.
       | [1] https://www.brainyquote.com/quotes/j_paul_getty_129274
         | kabirgoel wrote:
         | One of my favorite quotes of all time. J. Paul Getty was quite
         | the weirdo. His Wikipedia article is worth a look, especially
         | the section on his frugality.
           | xyzzyz wrote:
           | As an interesting coincidence, a large part of the Google Cloud
           | organization resides in a building that was formerly the
           | headquarters of Getty Images, a company co-founded by Mark
           | Getty, a grandson of J. Paul Getty.
           | pontifier wrote:
           | Lol. I love it. I moved to a state I'd never considered
           | because it had the largest, cheapest building in the US.
           | It's 220,000 square feet, but I've lived in a tent out back
           | for the last 6 months because I can't get an occupancy
           | permit, it's not zoned residential, and I refuse to pay rent
           | on an apartment.
             | sterlind wrote:
             | You live in a 220,000 square foot building?! Is this an
             | abandoned missile silo or something? I want to know more.
             | dumpsterdiver wrote:
             | > It's 220,000 square feet
             | Is it an old airplane hangar?
         | sudcha wrote:
         | OP here.
         | The fear of bankruptcy was real at the time. Google has at least
         | a few thousand lawyers on payroll. They probably also have a
         | process for handling delinquencies and sending notices. A quick
         | look at the lawyer fees just to manage the case, let alone fight
         | it, is enough for a bootstrapped company to throw up its hands.
         | +1 to the bad-actor possibility. I shared this with the Google
         | team; I'm not sure what they have done since.
         | We are out of that situation, and I wrote the post so that
         | others who are relatively new to the cloud don't make the same
         | mistakes.
         | Fail fast is a very bad idea with the cloud.
           | Alupis wrote:
           | All true, and good points you raise.
           | However, Google's army of lawyers costs them real money,
           | whereas your bill is largely made-up numbers.
           | Perhaps the true cost is still enough to warrant sic'ing
           | their lawyers on your company.
           | Even in that situation, a wait-and-see approach is still
           | pretty advisable. The worst case scenario was already known
           | to you - bankruptcy.
           | Nothing Google or their lawyers do would change that worst-
           | case outcome, and if Google was aware you literally don't
           | have $72k, and might just declare bankruptcy and walk away,
           | they'll be much more eager to negotiate a more reasonable
           | bill and settle your account. It's exactly as J. Paul Getty
           | said...
           | Very glad it's being worked out and you will not have to go
           | down that path.
