[HN Gopher] Why we moved from AWS RDS to Postgres in Kubernetes
       ___________________________________________________________________
        
       Why we moved from AWS RDS to Postgres in Kubernetes
        
       Author : elitan
       Score  : 101 points
       Date   : 2022-09-26 18:15 UTC (4 hours ago)
        
 (HTM) web link (nhost.io)
 (TXT) w3m dump (nhost.io)
        
       | techn00 wrote:
       | So what solution did you end up using? Crunchy operator?
        
         | nesmanrique wrote:
          | We evaluated several operators, but in the end decided it
          | would be best to deploy our own setup for the Postgres
          | workloads instead, using Helm.
        
       | geggam wrote:
       | I would love to see the monitoring on this.
       | 
        | Is network IOPS and NAT nastiness or disk IO the bigger issue?
        
       | qeternity wrote:
       | These threads are always full of people who have always used an
       | AWS/GCP/Azure service, or have never actually run the service
       | themselves.
       | 
       | Running HA Postgres is not easy...but at any sort of scale where
       | this stuff matters, nothing is easy. It's not as if AWS has 100%
       | uptime, nor is it super cheap/performant. There are tradeoffs for
       | everyone's use-case but every thread is full of people at one end
       | of the cloud / roll-your-own spectrum.
        
         | 988747 wrote:
         | I've been successfully running Postgres in Kubernetes with the
         | Operator from Crunchy Data. It makes HA setup really easy with
         | a tool called Patroni, which basically takes care of all the
         | hard stuff. Running 1 primary and 2 replicas is really no
         | harder than running single-node Postgres.
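          | 
          | To sanity-check that both replicas are actually streaming from
          | the primary, a rough sketch (Python with psycopg2; connection
          | details are placeholders):
          | 
          |     import psycopg2
          | 
          |     conn = psycopg2.connect("host=primary dbname=postgres")
          |     with conn, conn.cursor() as cur:
          |         cur.execute(
          |             "SELECT client_addr, state, sync_state "
          |             "FROM pg_stat_replication")
          |         for addr, state, sync in cur.fetchall():
          |             # expect two 'streaming' rows for 2 replicas
          |             print(addr, state, sync)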
        
         | api wrote:
         | I wonder how many people use things like CockroachDB, Yugabyte,
         | or TiDB? They're at least in theory far easier to run in HA
         | configurations at the cost of some additional overhead and in
         | some cases more limited SQL functionality.
         | 
         | They seem like a huge step up from the arcane "1980s Unix"
         | nightmare of Postgres clustering but I don't hear about them
         | that much. Are they not used much or are their users just happy
         | and quiet?
         | 
         | (These are all "NewSQL" databases.)
        
           | belmont_sup wrote:
            | New user of CockroachDB here. We'll find out! If this
            | startup ever makes it to any meaningful user size.
        
         | ftufek wrote:
          | Honestly, that's what I initially thought when trying to run
          | HA Postgres on k8s, but Zalando's Postgres operator made
          | things so much easier (maybe even easier than RDS). It's very
          | easy to roll out as many Postgres clusters as you want, at
          | whatever size you want. We've been running our production DB
          | on it for the last 6 months or so, no outage yet. Though I
          | guess if you have to have a very custom setup, it might be
          | more difficult.
        
       | qubit23 wrote:
       | I was hoping to see a bit more of an explanation of how this was
       | implemented.
        
         | elitan wrote:
          | We need a follow-up: *How* we're running thousands of Postgres
          | databases in Kubernetes.
        
       | KaiserPro wrote:
        | In this instance I can see the point; being able to give
        | customers raw access to their own Postgres instance is a good
        | feature.
        | 
        | But it sounds bloody expensive to develop and maintain a
        | reliable Postgres service on k8s.
        
       | jmarbach wrote:
       | $0.50 per extra GB seems high, especially for a storage-intensive
       | app. Given the cost of cloud Object Storage services it doesn't
       | seem to make much sense.
       | 
       | Examples of alternatives for managed Postgres:
       | 
       | * Supabase is $0.125 per GB
       | 
       | * DigitalOcean managed Postgres is ~$0.35 per GB
        
         | makestuff wrote:
          | Supabase runs on AWS, so they are either losing a ton of
          | money, have some amazing deal with AWS, or the $0.50 is
          | inaccurate.
        
           | kiwicopple wrote:
           | (supabase ceo)
           | 
           | EBS pricing is here: https://aws.amazon.com/ebs/pricing/
           | 
           | I'd have to check with the team but I'm 80% sure we're on gp3
           | ($0.08/GB-month).
           | 
            | That said, we have a very generous free tier. With AWS we
            | have an enterprise plan + savings plan + reserved instances.
            | Not all of these affect EBS pricing, but we end up paying a
            | lot less than the average AWS user due to our high usage.
        
       | neilv wrote:
       | I didn't see "backups" mentioned in that, though I'm sure they
       | have them. Depending on your needs, it's a big thing to keep in
       | mind while weighing options.
       | 
       | For a small startup or operation, a managed service having
       | credible snapshots, PITR backups, failover, etc. is going to save
       | a business a lot of ops cost, compared to DIY designing,
       | implementing, testing, and drilling, to the same level of
       | credibility.
       | 
        | At one recent early startup, I looked at the amount of work it
        | would take me or a contractor/consultant/hire to upgrade our
        | Postgres recovery capability (including testing and drills) with
        | confidence. I soon decided to move from self-hosted Postgres to
        | RDS Postgres.
       | 
        | RDS was a significant chunk of our modest AWS bill (otherwise,
        | almost entirely plain EC2, S3, and traffic), but it was easy to
        | justify to the founders just by mentioning the cost it saved us
        | for the business-existential protection we needed.
        
         | nunopato wrote:
         | Thanks for bringing this up. We do have backups running daily,
         | and we will have "backups on demand" soon as well.
        
       | nunopato wrote:
       | (Nhost)
       | 
        | Sorry for not answering everyone individually, but I see some
        | confusion due to the lack of context about what we do as a
        | company.
       | 
       | First things first, Nhost falls into the category of backend-as-
       | a-service. We provision and operate infrastructure at scale, and
       | we also provide and run the necessary services for features such
       | as user authentication and file storage, for users creating
        | applications and businesses. A project/backend consists of a
        | Postgres database and the aforementioned services; none of it is
        | shared. You get your own GraphQL engine, your own auth service,
       | etc. We also provide the means to interface with the backend
       | through our official SDKs.
       | 
       | Some points I see mentioned below that are worth exploring:
       | 
        | - One RDS instance per tenant is prohibitive from a cost
        | perspective, obviously. RDS is expensive and we have a very
        | generous free tier.
       | 
        | - We run the infrastructure for thousands of projects/backends,
        | and we have absolutely no control over what they are used for.
       | Users might be building a simple job board, or the next Facebook
       | (please don't). This means we have no idea what the workloads and
       | access patterns will look like.
       | 
        | - RDS is mature and a great product, AWS is a billion-dollar
        | company, etc. - that is all true. But it is also true that we do
        | not control whether a user's project is missing an index, and
        | RDS does not provide any means to limit CPU/memory usage per
        | database/tenant.
       | 
       | - We had a couple of discussions with folks at AWS and for the
       | reasons already mentioned, there was no obvious solution to our
       | problem. Let me reiterate this, the folks that own the service
       | didn't have a solution to our problem given our constraints.
       | 
       | - Yes, this is a DIY scenario, but this is part of our core
       | business.
       | 
       | I hope this clarifies some of the doubts. And I expect to have a
       | more detailed and technical blog post about our experience soon.
       | 
       | By the way, we are hiring. If you think what we're doing is
       | interesting and you have experience operating Postgres at scale,
       | please write me an email at nuno@nhost.io. And don't forget to
       | star us at https://github.com/nhost/nhost.
        
         | cloudbee wrote:
          | And what are your cost savings compared to RDS? I had a
          | similar problem where we had to provision 5 databases for 5
          | different teams. RDS is really expensive. Is your solution
          | open source? I would like to try it.
        
           | SOLAR_FIELDS wrote:
           | RDS and similar managed databases are over half of our total
           | cloud bill at my place of work. Managed databases in general
           | are _really expensive_.
        
           | nunopato wrote:
           | I hope to have a more detailed analysis to share when we have
           | more accurate data. We launched individual instances recently
           | and although I don't have exact numbers, the price difference
           | will be significant. Just imagine how much it would cost to
           | have 1 RDS instance per tenant (we have thousands).
           | 
           | We haven't open-sourced any of this work yet but we hope to
           | do it soon. Join us on discord if you want to follow along
           | (https://nhost.io/discord).
        
       | mp3tricord wrote:
        | In a production database, why are people executing long-running
        | queries on the primary? They should be using a DB replica.
        
       | xwowsersx wrote:
       | I've recently been spending a fair amount of time trying to
       | improve query performance on RDS. This includes reviewing and
       | optimizing particularly nasty queries, tuning PG configuration
       | (min_wal_size, random_page_cost, work_mem, etc). I am using a
       | db.t3.xlarge with general purpose SSD (gp2) for a web server that
       | sees moderate writes and a lot of reads. I know there's no real
       | way to know other than through testing, but I'm not clear on
       | which instance type best serves our needs -- I think it may very
       | well be the case that the t3 family isn't fit for our purposes.
       | I'm also unclear on whether we ought to switch to provisioned
        | IOPS SSD. Does anyone have any general pointers here? I know the
        | question is pretty open-ended, but it would be great to hear any
        | general advice from personal experience.
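        | 
        | (For reference, on RDS these settings go through a DB parameter
        | group rather than postgresql.conf. A rough boto3 sketch of that
        | kind of change - the parameter group name is a placeholder:
        | 
        |     import boto3
        | 
        |     rds = boto3.client("rds")
        |     rds.modify_db_parameter_group(
        |         DBParameterGroupName="my-pg-params",  # placeholder
        |         Parameters=[
        |             # work_mem is in kB here: 65536 kB = 64 MB
        |             {"ParameterName": "work_mem",
        |              "ParameterValue": "65536",
        |              "ApplyMethod": "immediate"},
        |             # closer to 1.0 suits SSD-backed storage
        |             {"ParameterName": "random_page_cost",
        |              "ParameterValue": "1.1",
        |              "ApplyMethod": "immediate"},
        |         ],
        |     )
        | 
        | Dynamic parameters like these apply on the fly; static ones need
        | ApplyMethod="pending-reboot" plus an instance reboot.)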
        
         | notac wrote:
         | I'd recommend hopping off of t3 asap if you're searching for
         | performance gains - performance can be extremely variable (by
         | design). M class will even you out.
         | 
          | General purpose (gp2) storage IOPS is governed by your
          | provisioned storage size. You can again get much more
          | consistent performance by using provisioned IOPS.
         | 
         | Feel free to email me if you want to chat through things
         | specific to your env - email is in my about:
        
           | xwowsersx wrote:
           | Thank you so much, will definitely take you up on the offer.
        
         | Nextgrid wrote:
         | It's hard to say without metrics; what does your CPU load look
         | like? In general, unless your CPU is often maxing out, changing
         | the CPU is unlikely to help, so you're left with either memory
         | or IO.
         | 
         | Unused memory on Linux will be automatically used to cache IO
         | operations, and you can also tweak PG itself to use more memory
         | during queries (search for "work_mem", though there are
         | others).
         | 
         | If your workload is read-heavy, just giving it more memory so
         | that the majority of your dataset is always in the kernel IO
         | cache will give you an immediate performance boost, without
         | even having to tweak PG's config (though that might help even
         | further). This won't transfer to writes - those still require
         | an actual, uncached IO operation to complete (unless you want
         | to put your data at risk, in which case there are parameters
         | that can be used to override that).
         | 
         | For write-heavy workloads, you will need to upgrade IO; there's
         | no way around the "provisioned IOPS" disks.
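          | 
          | A quick way to see whether more memory is likely to help is
          | the buffer cache hit ratio (rough sketch with psycopg2; the
          | DSN is a placeholder, and note this only counts shared_buffers
          | hits - misses may still be served from the OS page cache):
          | 
          |     import psycopg2
          | 
          |     conn = psycopg2.connect("host=db dbname=app")
          |     with conn, conn.cursor() as cur:
          |         # read-heavy workloads that fit in memory should
          |         # sit well above 0.99 here
          |         cur.execute("""
          |             SELECT sum(heap_blks_hit)::float
          |                  / nullif(sum(heap_blks_hit)
          |                           + sum(heap_blks_read), 0)
          |             FROM pg_statio_user_tables
          |         """)
          |         print("heap cache hit ratio:", cur.fetchone()[0])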
        
           | xwowsersx wrote:
           | Thanks very much for the reply. CPU is not often maxing out.
           | Here's a graph of max CPU utilization from the last week
           | https://ibb.co/tzw5p3L
        
             | Nextgrid wrote:
             | You've got some spikes that could signify some large or
             | unoptimized queries, but otherwise yes, the CPU doesn't
             | look _that_ hot.
             | 
             | I suggest upgrading to an instance type which gives you
             | 32GB or more of memory. You'll get a bigger CPU along with
             | it as well, but don't make the CPU your priority, it's not
             | your main bottleneck at the moment.
        
               | xwowsersx wrote:
               | Makes sense, thank you. Sounds like M class is the way to
               | go as other commenter suggested. Also, yes. There are
               | many awful queries that I'm aware of and working to
               | correct.
        
       | stunt wrote:
       | What's the benefit of running Postgres in Kubernetes vs VMs (with
       | replication obviously)?
        
       | radimm wrote:
        | Having recently heard a lot about PostgreSQL in Kubernetes
        | (CloudNativePG, for example), I always wonder about the actual
        | load and the complexity of the cluster in question.
       | 
       | > This is the reason why we were able to easily cope with 2M+
       | requests in less than 24h when Midnight Society launched
       | 
        | This gives the answer: 2M requests over 24h averages out to
        | about 23 req/sec (2,000,000 / 86,400 s), and while it's probably
        | not evenly distributed, I'd guess a peak of 60 - 100 req/sec
        | might already be stretching it. I always wonder about use cases
        | with 3 - 5k req/sec as a minimum.
       | 
        | [edit] PS: not really ditching either k8s Postgres or AWS RDS or
        | similar solutions. Just being curious.
        
         | kccqzy wrote:
         | > This is the reason why we were able to easily cope with 2M+
         | requests in less than 24h
         | 
         | I thought this was referring to 2M+ requests per second over a
         | ramp period of 24h, not 2M requests per 24h?
        
         | xani_ wrote:
          | It's essentially just a process running in a cgroup, so
          | performance shouldn't be all that different from bare-metal/VM
          | PostgreSQL.
          | 
          | The main difference would be storage speed and how exactly it
          | is attached to the container.
        
         | brand wrote:
          | I've personally deployed O(TBs) and O(10^4 TPS) Postgres
          | clusters on Kubernetes with a CNPG-style operator-based
          | deployment. There are some subtleties to it, but it's not
          | exceedingly complicated, and a good project like CNPG goes a
          | long way toward shaving off those sharp edges. As other
          | commenters have suggested, it's good to really understand
          | Kubernetes if you want to do it, though.
        
           | radimm wrote:
            | Thanks for the confirmation. As mentioned, I'm not saying no
            | to it. It is really that "really understand" part which holds
            | me back for now - mainly the observability and dealing with
            | edge cases in a high-throughput environment.
        
         | Nextgrid wrote:
         | > 23 req/sec (guess peak 60 - 100 might be already stretching
         | it)
         | 
         | That kind of load is something a decent developer laptop with
         | an NVME drive can serve, nothing to write home about.
         | 
         | It is sad that the "cloud" and all these supposedly "modern"
         | DevOps systems managed to redefine the concept of "performance"
         | for a large chunk of the industry.
        
           | rrampage wrote:
           | It depends a lot on the backend architecture. Number of DB
           | requests per web request can also be high due to the
           | pathological cases in some ORMs which can result in N+1 query
           | problems or eagerly fetching entire object hierarchies. Such
           | problems in application code can get brushed under the carpet
           | due to "magical" autoscaling (be it RDS or K8s). There can
           | also be fanout to async services/job queues which will in
           | turn run even more DB queries.
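            | 
            | A toy illustration of the difference (Python with psycopg2;
            | table names and the DSN are made up):
            | 
            |     import psycopg2
            | 
            |     conn = psycopg2.connect("dbname=app")  # placeholder
            |     cur = conn.cursor()
            | 
            |     # N+1 pattern: one query for the parent rows, then one
            |     # round trip per child lookup.
            |     cur.execute("SELECT id FROM orders LIMIT 50")
            |     order_ids = [row[0] for row in cur.fetchall()]
            |     for oid in order_ids:
            |         cur.execute(
            |             "SELECT * FROM order_items WHERE order_id = %s",
            |             (oid,))
            |         cur.fetchall()  # 50 extra round trips
            | 
            |     # Same rows in a single round trip.
            |     sql = "SELECT * FROM order_items WHERE order_id = ANY(%s)"
            |     cur.execute(sql, (order_ids,))
            |     rows = cur.fetchall()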
        
             | AccountAccount1 wrote:
              | Hey, this is not a problem for us at Nhost since most of
              | the interfacing with Postgres is through Hasura (a
              | GraphQL-to-SQL engine). It solves the N+1 issue by
              | compiling a performant SQL statement from the GraphQL
              | query. (It's also written in Haskell; you can read more
              | here:
              | https://hasura.io/blog/architecture-of-a-high-performance-gr...)
        
             | robertlagrant wrote:
             | I don't think K8s at least will autoscale quickly enough to
             | mask something like that.
        
           | singron wrote:
           | RDS tops out at about 18000 IOPS since it uses a single ebs
           | volume. Any decent ssd will do much better. E.g. a 970 evo
           | will easily do >100K IOPS and can do more like 400K in ideal
           | conditions.
           | 
           | You can get that many IOPS with aurora, but the cost is
           | exorbitant.
        
             | mcbain wrote:
              | I don't think it has been a single EBS volume for a while,
              | but in any case, 256k is more than 18k.
              | https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_...
        
           | mhuffman wrote:
           | It does depend on the architecture and framework they are
            | using, imo. I have a single Hetzner machine with
            | spinning-platter HDs that serves between 1-2 million requests
            | per day hitting the DB and ML models and rarely ever gets
            | over 1% CPU usage. I have pressure-tested it to around 3k
            | reqs/sec. On the other hand, I have seen WP and CodeIgniter
            | setups that, even with 5 copies running on the largest AWS
            | instances available, "optimized" to the hilt, caching
            | everywhere possible, etc., absolutely crumble under a load of
            | 3k req per min (not sec ... min).
           | 
           | Many frameworks that make early development easy fuck you
           | later during growth with ORM calls, tons of unnecessary text
           | in the DB, etc.
        
             | Nextgrid wrote:
             | Keep in mind that your Hetzner instance has locally-
             | attached storage and a real CPU as opposed to networked
             | storage and a slice of a CPU, so I'm not surprised at all
             | that this beats an AWS setup even on the more expensive
             | instances.
             | 
              | Yes, frameworks can be a problem (although including WP in
              | the list is an insult to other, _actually decent_
              | frameworks), but I would bet good money that if they moved
              | their setup to Hetzner it would still fly. Non-optimal ORM
              | calls can be optimized manually without necessarily
              | dropping the framework altogether.
        
               | marcosdumay wrote:
               | Hum... The Hetzner instance is very likely cheaper than
               | any AWS setup, so while there is a point in that part,
               | it's not a very relevant one. (And that's exactly the
               | issue with the "modern DevOps" tooling.)
        
             | acdha wrote:
             | > On the other hand I have seen WP and CodeIgniter setups
             | that even with 5 copies running on the largest AWS
             | instances available, "optimized" to the hilt, caching
             | everywhere possible, etc. absolute crumble under the load
             | of 3k req per min. (not sec ... min).
             | 
              | This sounds like some other architectural problem - that
              | was single-node performance on EC2 in the 2000s, running
              | nowhere near the largest instances available.
             | 
             | There are concerns switching from local to SAN storage, of
             | course, but that's also shifting the problem if you care
             | about durability.
        
           | derefr wrote:
           | Depends on the queries. Point queries that take 1ms each?
           | Sure. Analytical queries that take 1000ms+ each? Not so much.
        
           | jerf wrote:
           | I can't blame it on "cloud", though it's not helping that
           | there are an awful lot of cloud services that claim to be
           | "high performance" and are often mediumish at best. But in
           | general I see a lot of ignorance in the developer community
           | as to how fast things should be able to run, even in terms of
           | reading local files and doing local manipulations with no
           | "cloud" in sight.
           | 
           | Honestly, if I had to pin it on just one thing, I'd blame
           | networking everything. Cloud would fit as a subset of that.
           | Networking slows things down at the best of times, and the
           | latency distribution can be a nightmare at the worst. Few
           | developers think about the cost of using the network, and
           | even fewer can think about it holistically (e.g., to avoid
           | making 50 network transactions spread throughout the system
           | when you could do it all in one transaction if you rearranged
           | things).
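            | 
            | Even a trivial back-of-the-envelope makes the point
            | (illustrative numbers, not measurements):
            | 
            |     # ~1 ms per in-datacenter round trip is a common ballpark
            |     round_trip_ms = 1.0
            |     calls = 50
            |     print(calls * round_trip_ms)  # 50 ms of pure latency
            |     print(1 * round_trip_ms)      # ~1 ms if batched into one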
        
             | geggam wrote:
              | Are you talking about the cloud-host-to-cloud-host
              | networking or the pod networking inside a single host?
              | 
              | The dizzying number of NAT layers has to be killing
              | performance. I haven't ever had the chance to sit down and
              | unravel a system running a good load. The lack of TCP
              | tuning combined with the required connection tracking is
              | interesting to think about.
        
               | kazen44 wrote:
                | I still don't understand why nearly all CNIs are so
                | hell-bent on implementing a dozen layers of NAT to
                | tunnel their overlay networks, instead of implementing
                | a proper control plane to automate it all away with
                | routes.
                | 
                | Calico seems to be doing it semi-OK-ish, and even there
                | the control plane is kind of unfinished?
                | 
                | The only software-based solution which seems to have
                | this properly figured out is VMware NSX-T. (I am not
                | counting all the traditional overlay networks in use by
                | ISPs based on MPLS/BGP.)
        
               | geggam wrote:
               | Before you even get to the CNI, I think AWS VM to
               | internet is at least 3 NAT layers.
               | 
                | So we have 3 layers from container to pod. The virtual
                | host kernel is tracking those layers: one connection to
                | one container is 3 tracked connections. Then you have
                | whatever else you put on top to go in and out of the
                | internet.
                | 
                | The funny thing to me is that HAProxy recommended
                | getting rid of connection tracking for performance,
                | while everyone is doubling down on that alone and
                | calling it performant.
        
             | kazen44 wrote:
             | > Few developers think about the cost of using the network.
             | 
             | Developers do not seem to realise how slow the network is
             | compared to everything else.
             | 
              | Sure, 100gbit network interfaces do exist, but most
              | servers are attached with 10gbit interfaces, and most
              | actual implementations will not manage to hit something
              | like 10gbit/s because of latency and window scaling.
             | 
             | You cannot escape latency (without inventing another
             | universe in which physics do not apply). And latency is
             | detrimental to performance.
             | 
              | Getting anything across a large enough network in under
              | 1 millisecond is hard, and compared to an IOP on a local
              | NVMe disk, it is painfully slow.
        
               | whoisthemachine wrote:
               | > You cannot escape latency (without inventing another
               | universe in which physics do not apply). And latency is
               | detrimental to performance.
               | 
                | This. So few people distinguish between bandwidth and
                | latency. One can be increased arbitrarily and fairly
                | easily with new encoding techniques (which generally
                | only improve edge cases), while the other has a floor
                | that is hard-coded into our universe. I've gotten into
                | debates with folks who think a 10Gb connection from the
                | EU to Texas should be as fast as a connection from
                | Texas to the Midwest, or that to speed up the EU-TX
                | connection they just need to spend more on bandwidth.
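                | 
                | The floor is easy to estimate (rough numbers: ~8,000 km
                | great-circle EU to Texas, light in fiber at roughly 2/3
                | of c):
                | 
                |     distance_km = 8_000       # EU <-> Texas, roughly
                |     fiber_km_per_s = 200_000  # ~2/3 of c in fiber
                |     one_way_ms = distance_km / fiber_km_per_s * 1000
                |     print(one_way_ms, 2 * one_way_ms)  # ~40 / ~80 ms
                | 
                | Real routes are longer and add queuing delay, so
                | measured RTTs only go up from there.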
        
               | briffle wrote:
               | it seems most of the tools for running postgresql in K8s
               | seem to just default to creating a new copy of the DB at
               | the drop of a hat. When your DB is in the multi-TB sizes,
               | that can come with a noticable cost in network fees, plus
               | a very long delay, even on modern fast networks.
        
           | ayende wrote:
            | You are off by a couple of orders of magnitude.
            | 
            | I have run 500+ req/sec on a Raspberry Pi against a 4 TB
            | dataset with 2 GB of RAM, with under 100ms at the 99.99th
            | percentile.
            | 
            | A few hundred requests a second is basically nothing.
        
           | c2h5oh wrote:
            | That kind of load you can handle on spinning rust without
            | breaking a sweat.
        
       | MBCook wrote:
        | So they switched from one giant RDS instance with all tenants
        | per AZ to per-tenant PG in Kubernetes.
        | 
        | So really we don't know how much RDS was a problem compared to
        | the tenant distribution.
       | 
       | For the purposes of an article like this it would be nice if the
       | two steps were separate or they had synthetic benchmarks of the
       | various options.
       | 
        | But I understand why they just moved forward. They said they
        | consulted experts; it would also be nice to discuss some of what
        | they looked at or asked about.
        
       | 0xbadcafebee wrote:
        | Ah, the ol' sunk cost fallacy of infrastructure. We are already
       | investing in supporting K8s, so let's throw the databases in
       | there too. Couldn't possibly be that much work.
       | 
       | Sure, a decade-old dedicated team at a billion-dollar
       | multinational corporation has honed a solution designed to
       | support hundreds of thousands of customers with high
       | availability, and we could pay a little bit extra money to spin
       | up a new database per tenant that's a little bit less flexible,
       | ..... or we could reinvent everything they do on our own software
       | platform and expect the same results. All it'll cost us is extra
       | expertise, extra staff, extra time, extra money, extra planning,
       | and extra operations. But surely it will improve our product
       | dramatically.
        
         | gw99 wrote:
         | I'm not so sure. All you have is another layer of abstraction
         | between you and the problem that you are facing. And that level
         | of abstraction may violate your SLAs unless you pitch $15k for
         | the enterprise support option. And that may not even be
         | fruitful because it relies on an uncertain network of folk at
         | the other end who may or may not even be able to interpret
         | and/or solve your problem. Also you are at the whim of their
         | changes which may or may not break your shit.
         | 
         | Source: AWS user on very very large scale stuff for about 10
         | years now. It's not magic or perfection. It's just someone
         | else's pile of problems that are lurking. The only consolation
         | is they appear to try slightly harder than the datacentres that
         | we replaced.
        
         | xani_ wrote:
        
           | [deleted]
        
           | KaiserPro wrote:
           | > I bet you also hate on people making their own espresso
           | instead of just going to starbucks
           | 
           | Hobbies are not the same as bottom line business.
           | 
           | As with everything, managing state at scale is _very_ hard.
           | Then you have to worry about backing it up.
        
           | [deleted]
        
           | wbl wrote:
            | Running a stateful service in K8S is its own ball of wax.
        
             | foobarian wrote:
             | Yes, Postgres on K8S... <shudder>
        
             | patrec wrote:
             | It is, but then I never understood why on earth you'd use
             | k8s if you don't have stateful services. I mean really,
             | what's the point?
        
               | mijamo wrote:
               | Because it's easy? What alternative would you suggest?
        
               | patrec wrote:
               | The idea that something of the monstrous complexity of
               | k8s is easy is pretty funny to me. I think if you have
                | fewer than 2 full-time experts on k8s at hand, you're
               | basically nuts if you use it for some non-toy project. In
               | my experience, you can and will experience interesting
               | failure scenarios.
               | 
               | If you don't have state, why not just either use
               | something serverless/fully-managed (beanstalk, lambda,
               | cloudflare workers whatever) if you really need to scale
               | up and down (or have very limited devops/sysadmin
               | capacity) or deploy like 2 or 3 bare metal machines or
               | VMs?
               | 
               | Either sounds like a lot less work to manage and
               | troubleshoot than some freaking k8s cluster.
        
               | janee wrote:
                | Bare metal, I'd think, is the first choice for a large
                | RDBMS where you have skilled, dedicated personnel who
                | can manage it.
                | 
                | If not, rather use a specialist service like RDS for
                | anything with serious uptime/throughput requirements.
               | 
               | k8s doesn't really make sense to me unless it's for
               | spinning up lots of instances, like for test or dev envs
               | or like in the article where they host DBs for people.
        
             | deathanatos wrote:
             | ... I do it, in my day job. It's really not. StatefulSets
             | are explicitly for this.
             | 
             | We also have managed databases, too.
             | 
             | Self-managed stuff means I can, generally, get shit done
             | with it, when oddball things need doing. Managed stuff is
             | fine right up until it isn't (i.e., yet another outage with
             | the status page being green), or until there's a
             | requirement that the managed system inexplicably can't
             | handle (despite the requirement being the sort of obvious
             | thing you would expect of $SYSTEM, but which no PM thought
             | to ask before purchasing the deal...), and then you're in
             | support ticket hell.
             | 
              | (E.g., we found out the hard way that there is no way to
              | move a managed PG database from one subnet in a network to
              | another in Azure! _Even if you're willing to restore from
              | a backup._ We had to deal with that ourselves, by taking a
              | pg_dump -- essentially, un-managed-solutioning the backup.
              | 
              | ... the whole reason we needed to move the DB to a
              | different subnet was because of a _different_ flaw, in a
              | _different_ managed service, and Azure's answer on _that_
              | ticket was "tough luck, the DB needs to move". Tickets
              | spawning tickets. Support tickets for managed services take
              | up an unholy portion of my time.)
        
           | [deleted]
        
           | folkhack wrote:
            | I'd posit that it's not that simple. Maybe if you're just
            | cranking out your one-off app or something of the sort...
           | 
           | But getting a good replication setup that's HA, potentially
           | across multiple regions/zones, all abstracted under K8s -
           | yea. That's not trivial. And, it can go _very_ wrong.
           | 
           | > I bet you also hate on people making their own espresso
           | instead of just going to starbucks
           | 
           | This is just unnecessary.
        
             | sn0wf1re wrote:
             | >> I bet you also hate on people making their own espresso
             | instead of just going to starbucks
             | 
             | >This is just unnecessary.
             | 
             | I agree the ad hominem is not required, although the
             | analogy is itself decent.
        
               | folkhack wrote:
                | I mean, I can make up ad hominem analogies about this
                | stuff too - but in practice it makes people feel
                | attacked/defensive, and rarely ever adds nuance or
                | context to the conversation. I feel like in this
                | situation it could have been omitted, as per the HN
                | guidelines:
               | 
               | > In Comments:
               | 
               | > Be kind. Don't be snarky.
        
           | coenhyde wrote:
            | You're talking like managing stateful services in an
            | ephemeral environment is as simple as installing and
            | configuring Postgres. Postgres itself is 1% of the
            | consideration here.
        
         | suggala wrote:
          | AWS RDS is 10x slower than bare-metal MySQL (both reads and
          | writes). The slowness is mainly because storage is over the
          | network for RDS.
          | 
          | Not bad to invest some extra time to get better performance.
          | 
          | You are falling for the "appeal to antiquity" fallacy if you
          | think something old is better.
        
           | 0xbadcafebee wrote:
           | What you describe is still a fallacy because it's assuming
           | that just because you _can_ get better performance with
           | BareMetal, that somehow this is a cheaper or better option.
           | In fact it will be either more error-prone, or more
           | expensive, or both, because you are trying to reproduce from
           | scratch what the whole RDS team has been doing for 10 years.
        
           | Nextgrid wrote:
           | It's unlikely running it on K8S (which is itself going to run
           | on underpowered VMs with networked storage) is going to help.
           | 
           | If you're gonna spend effort in running Postgres manually, do
           | it on bare-metal and at least get some reward out of it
           | (performance _and_ reduced cost).
        
             | derefr wrote:
             | > It's unlikely running it on K8S (which is itself going to
             | run on underpowered VMs with networked storage) is going to
             | help.
             | 
             | On GCP, at least, you can provision a GKE node-pool where
             | the nodes have direct-attached NVMe storage; deploy a
             | privileged container that formats and RAID0s up the drives;
             | and then make use of the resulting scratch filesystem via
             | host-mounts.
        
             | qeternity wrote:
             | > It's unlikely running it on K8S (which is itself going to
             | run on underpowered VMs with networked storage) is going to
             | help.
             | 
             | What?? We run replicated Patroni on local NVMEs and it's
             | incredibly fast.
        
         | dijit wrote:
          | And when it all goes belly up it will be much more difficult
          | to resolve.
        
           | baq wrote:
           | Fortunately Postgres doesn't do that often by itself. It
           | usually needs some creative developer's assistance.
        
             | dijit wrote:
              | I think you're triggering the worst case a lot more often
              | when it comes to running Postgres on k8s: the storage can
              | be removed independently from the workload, and the pod
              | can be evicted much more easily than it would be in
              | traditional database hosting methods.
             | 
             | No need for developers to do anything strange at all.
        
           | throwawaymaths wrote:
           | Depends. A lot of postgres usage is often "things that might
           | as well be redis", like session tokens (but the library we
           | imported came configured for postgres) so if the postgres
           | goes down, as long as it can be restarted it won't be the end
           | of the world even if all the data were wiped.
           | 
            | Probably there is also an 80/20 for most users where it's
            | not awful as long as you can restore from a cold-storage
            | backup, say 12 hours old.
        
       | HL33tibCe7 wrote:
       | Couldn't you just spin up an RDS instance for each project (so,
       | single-tenant RDS instances) to avoid the noisy neighbour
       | problem? Or is that too expensive?
        
         | elitan wrote:
          | We could, yes. But it's way too expensive compared to our
          | current setup.
         | 
         | We're offering free projects (Postgres, GraphQL (Hasura), Auth,
         | Storage, Serverless Functions) so we need to optimize costs
         | internally.
        
       ___________________________________________________________________
       (page generated 2022-09-26 23:00 UTC)