[HN Gopher] Why we moved from AWS RDS to Postgres in Kubernetes
___________________________________________________________________

Why we moved from AWS RDS to Postgres in Kubernetes

Author : elitan
Score  : 101 points
Date   : 2022-09-26 18:15 UTC (4 hours ago)

(HTM) web link (nhost.io)
(TXT) w3m dump (nhost.io)

| techn00 wrote:
| So what solution did you end up using? Crunchy operator?
| nesmanrique wrote:
| We evaluated several operators, but in the end decided it would be best to deploy our own setup for the Postgres workloads instead, using Helm.
| geggam wrote:
| I would love to see the monitoring on this.
|
| Network IOPs and NAT nastiness, or disk IO - which is the bigger issue?
| qeternity wrote:
| These threads are always full of people who have always used an AWS/GCP/Azure service, or have never actually run the service themselves.
|
| Running HA Postgres is not easy... but at any sort of scale where this stuff matters, nothing is easy. It's not as if AWS has 100% uptime, nor is it super cheap/performant. There are tradeoffs for everyone's use case, but every thread is full of people at one end of the cloud / roll-your-own spectrum.
| 988747 wrote:
| I've been successfully running Postgres in Kubernetes with the operator from Crunchy Data. It makes HA setup really easy with a tool called Patroni, which basically takes care of all the hard stuff. Running 1 primary and 2 replicas is really no harder than running single-node Postgres.
| api wrote:
| I wonder how many people use things like CockroachDB, Yugabyte, or TiDB? They're at least in theory far easier to run in HA configurations, at the cost of some additional overhead and in some cases more limited SQL functionality.
|
| They seem like a huge step up from the arcane "1980s Unix" nightmare of Postgres clustering, but I don't hear about them that much. Are they not used much, or are their users just happy and quiet?
|
| (These are all "NewSQL" databases.)
| belmont_sup wrote:
| New user of Cockroach. We'll find out! If this startup ever makes it to any meaningful user size.
| ftufek wrote:
| Honestly, that's what I initially thought trying to run HA Postgres on k8s, but Zalando's postgres operator made things so much easier (maybe even easier than RDS). It's very easy to roll out as many Postgres clusters as you want, at whatever size you want. We've been running our production DB on it for the last 6 months or so, no outage yet. Though I guess if you need a very custom setup, it might be more difficult.
| qubit23 wrote:
| I was hoping to see a bit more of an explanation of how this was implemented.
| elitan wrote:
| We need a follow-up: *How* we're running thousands of Postgres databases in Kubernetes.
| KaiserPro wrote:
| In this instance I can see the point; being able to give customers raw access to their own psql instance is a good feature.
|
| But it sounds bloody expensive to develop and maintain a reliable psql service on k8s.
| jmarbach wrote:
| $0.50 per extra GB seems high, especially for a storage-intensive app. Given the cost of cloud object storage services it doesn't seem to make much sense.
|
| Examples of alternatives for managed Postgres:
|
| * Supabase is $0.125 per GB
|
| * DigitalOcean managed Postgres is ~$0.35 per GB
| makestuff wrote:
| Supabase runs on AWS, so they are either losing a ton of money, have some amazing deal with AWS, or the $0.50 is inaccurate.
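The per-GB gap quoted above compounds quickly. A rough comparison, assuming a hypothetical 200 GB of data and using only the list prices mentioned in this thread (free tiers, included storage, and compute are ignored):

    # Monthly storage cost at a hypothetical 200 GB, using the per-GB prices
    # quoted in this thread. List prices only; real bills will differ.
    gb = 200
    price_per_gb = {
        "Nhost extra storage": 0.50,
        "DigitalOcean managed Postgres": 0.35,
        "Supabase": 0.125,
        "AWS EBS gp3 (raw volume, list price)": 0.08,
    }
    for name, price in price_per_gb.items():
        print(f"{name}: ${gb * price:.2f}/month")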
| kiwicopple wrote:
| (supabase ceo)
|
| EBS pricing is here: https://aws.amazon.com/ebs/pricing/
|
| I'd have to check with the team, but I'm 80% sure we're on gp3 ($0.08/GB-month).
|
| That said, we have a very generous free tier. With AWS we have an enterprise plan + savings plan + reserved instances. Not all of these affect EBS pricing, but we end up paying a lot less than the average AWS user due to our high usage.
| neilv wrote:
| I didn't see "backups" mentioned in that, though I'm sure they have them. Depending on your needs, it's a big thing to keep in mind while weighing options.
|
| For a small startup or operation, a managed service having credible snapshots, PITR backups, failover, etc. is going to save a business a lot of ops cost, compared to DIY designing, implementing, testing, and drilling to the same level of credibility.
|
| At one recent early startup, I looked at the amount of work for me or a contractor/consultant/hire to upgrade our Postgres recovery capability (including testing and drills) with confidence. I soon decided to move from self-hosted Postgres to RDS Postgres.
|
| RDS was a significant chunk of our modest AWS bill (otherwise, almost entirely plain EC2, S3, and traffic), but easy to justify to the founders, just by mentioning the costs it saved us for business-existential protection we needed.
| nunopato wrote:
| Thanks for bringing this up. We do have backups running daily, and we will have "backups on demand" soon as well.
| nunopato wrote:
| (Nhost)
|
| Sorry for not answering everyone individually, but I see some confusion due to the lack of context about what we do as a company.
|
| First things first, Nhost falls into the category of backend-as-a-service. We provision and operate infrastructure at scale, and we also provide and run the necessary services for features such as user authentication and file storage, for users creating applications and businesses. A project/backend is comprised of a Postgres database and the aforementioned services; none of it is shared. You get your own GraphQL engine, your own auth service, etc. We also provide the means to interface with the backend through our official SDKs.
|
| Some points I see mentioned below that are worth exploring:
|
| - One RDS instance per tenant is prohibitive from a cost perspective, obviously. RDS is expensive and we have a very generous free tier.
|
| - We run the infrastructure for thousands of projects/backends, and we have absolutely no control over what they are used for. Users might be building a simple job board, or the next Facebook (please don't). This means we have no idea what the workloads and access patterns will look like.
|
| - RDS is mature and a great product, AWS is a billion-dollar company, etc. - that is all true. But it is also true that we do not control whether a user's project is missing an index, and RDS does not provide any means to limit CPU/memory usage per database/tenant.
|
| - We had a couple of discussions with folks at AWS and, for the reasons already mentioned, there was no obvious solution to our problem. Let me reiterate this: the folks that own the service didn't have a solution to our problem given our constraints.
|
| - Yes, this is a DIY scenario, but this is part of our core business.
|
| I hope this clarifies some of the doubts. And I expect to have a more detailed and technical blog post about our experience soon.
|
| By the way, we are hiring. If you think what we're doing is interesting and you have experience operating Postgres at scale, please write me an email at nuno@nhost.io. And don't forget to star us at https://github.com/nhost/nhost.
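The per-tenant limits point above is the crux of the argument: on Kubernetes, each tenant's Postgres can run in its own pod with its own CPU/memory caps, something RDS cannot express per database. A minimal, hypothetical sketch using the official Kubernetes Python client; the namespace, image, and sizes are placeholders, not Nhost's actual setup:

    # Hypothetical per-tenant Postgres with CPU/memory caps, via the
    # official Kubernetes Python client (pip install kubernetes).
    from kubernetes import client, config

    config.load_kube_config()
    apps = client.AppsV1Api()

    def tenant_postgres(tenant: str) -> client.V1StatefulSet:
        pg = client.V1Container(
            name="postgres",
            image="postgres:14",
            env=[client.V1EnvVar(
                name="POSTGRES_PASSWORD",
                value_from=client.V1EnvVarSource(
                    secret_key_ref=client.V1SecretKeySelector(
                        name=f"{tenant}-pg", key="password")))],
            # The part RDS can't do per database: per-tenant resource limits.
            resources=client.V1ResourceRequirements(
                requests={"cpu": "250m", "memory": "512Mi"},
                limits={"cpu": "1", "memory": "1Gi"},
            ),
            volume_mounts=[client.V1VolumeMount(
                name="data", mount_path="/var/lib/postgresql/data")],
        )
        return client.V1StatefulSet(
            api_version="apps/v1",
            kind="StatefulSet",
            metadata=client.V1ObjectMeta(name=f"{tenant}-pg"),
            spec=client.V1StatefulSetSpec(
                service_name=f"{tenant}-pg",
                replicas=1,
                selector=client.V1LabelSelector(
                    match_labels={"app": f"{tenant}-pg"}),
                template=client.V1PodTemplateSpec(
                    metadata=client.V1ObjectMeta(
                        labels={"app": f"{tenant}-pg"}),
                    spec=client.V1PodSpec(containers=[pg]),
                ),
                # Each tenant gets its own volume via a claim template.
                volume_claim_templates=[client.V1PersistentVolumeClaim(
                    metadata=client.V1ObjectMeta(name="data"),
                    spec=client.V1PersistentVolumeClaimSpec(
                        access_modes=["ReadWriteOnce"],
                        resources=client.V1ResourceRequirements(
                            requests={"storage": "10Gi"}),
                    ),
                )],
            ),
        )

    # "tenants" is a made-up namespace for the sketch.
    apps.create_namespaced_stateful_set(namespace="tenants",
                                        body=tenant_postgres("acme"))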
| cloudbee wrote:
| And what are your cost savings from an RDS perspective? I had a similar problem where we had to provision 5 databases for 5 different teams. RDS is really expensive. And is your solution open source? I would like to try it.
| SOLAR_FIELDS wrote:
| RDS and similar managed databases are over half of our total cloud bill at my place of work. Managed databases in general are _really expensive_.
| nunopato wrote:
| I hope to have a more detailed analysis to share when we have more accurate data. We launched individual instances recently, and although I don't have exact numbers, the price difference will be significant. Just imagine how much it would cost to have 1 RDS instance per tenant (we have thousands).
|
| We haven't open-sourced any of this work yet, but we hope to do it soon. Join us on Discord if you want to follow along (https://nhost.io/discord).
| mp3tricord wrote:
| In a production database, why are people executing long-running queries on the primary? They should be using a DB replica.
| xwowsersx wrote:
| I've recently been spending a fair amount of time trying to improve query performance on RDS. This includes reviewing and optimizing particularly nasty queries and tuning PG configuration (min_wal_size, random_page_cost, work_mem, etc). I am using a db.t3.xlarge with general purpose SSD (gp2) for a web server that sees moderate writes and a lot of reads. I know there's no real way to know other than through testing, but I'm not clear on which instance type best serves our needs - I think it may very well be the case that the t3 family isn't fit for our purposes. I'm also unclear on whether we ought to switch to provisioned IOPS SSD. Does anyone have any general pointers here? I know the question is pretty open-ended, but it would be great if anyone has advice from personal experience.
| notac wrote:
| I'd recommend hopping off of t3 ASAP if you're searching for performance gains - performance can be extremely variable (by design). The M class will even you out.
|
| General-purpose storage IOPS is governed by your provisioned storage size. You can again get much more consistent performance by using provisioned IOPS.
|
| Feel free to email me if you want to chat through things specific to your env - email is in my about.
| xwowsersx wrote:
| Thank you so much, will definitely take you up on the offer.
| Nextgrid wrote:
| It's hard to say without metrics; what does your CPU load look like? In general, unless your CPU is often maxing out, changing the CPU is unlikely to help, so you're left with either memory or IO.
|
| Unused memory on Linux will be automatically used to cache IO operations, and you can also tweak PG itself to use more memory during queries (search for "work_mem", though there are others).
|
| If your workload is read-heavy, just giving it more memory so that the majority of your dataset is always in the kernel IO cache will give you an immediate performance boost, without even having to tweak PG's config (though that might help even further). This won't transfer to writes - those still require an actual, uncached IO operation to complete (unless you want to put your data at risk, in which case there are parameters that can be used to override that).
|
| For write-heavy workloads, you will need to upgrade IO; there's no way around the "provisioned IOPS" disks.
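A quick way to check the read-heavy case described above before resizing: a minimal sketch, assuming psycopg2 and a placeholder connection string. The buffer-cache hit ratio from pg_stat_database plus the current memory settings give a fast signal on whether the working set fits in RAM:

    import psycopg2  # pip install psycopg2-binary

    # Placeholder DSN; point it at the RDS endpoint being tuned.
    conn = psycopg2.connect("postgresql://user:pass@your-rds-host:5432/appdb")
    with conn, conn.cursor() as cur:
        # Buffer-cache hit ratio across all databases. On a read-heavy
        # workload, sustained values well below ~0.99 suggest the working
        # set does not fit in memory and more RAM will help.
        cur.execute("""
            SELECT round(sum(blks_hit)::numeric
                         / nullif(sum(blks_hit) + sum(blks_read), 0), 4)
            FROM pg_stat_database
        """)
        print("cache hit ratio:", cur.fetchone()[0])

        # Memory-related settings worth reviewing before resizing.
        cur.execute("""
            SELECT name, setting, unit
            FROM pg_settings
            WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size')
        """)
        for name, setting, unit in cur.fetchall():
            print(name, "=", setting, unit or "")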
| xwowsersx wrote:
| Thanks very much for the reply. CPU is not often maxing out. Here's a graph of max CPU utilization from the last week: https://ibb.co/tzw5p3L
| Nextgrid wrote:
| You've got some spikes that could signify some large or unoptimized queries, but otherwise yes, the CPU doesn't look _that_ hot.
|
| I suggest upgrading to an instance type which gives you 32GB or more of memory. You'll get a bigger CPU along with it as well, but don't make the CPU your priority; it's not your main bottleneck at the moment.
| xwowsersx wrote:
| Makes sense, thank you. Sounds like the M class is the way to go, as the other commenter suggested. Also, yes, there are many awful queries that I'm aware of and working to correct.
| stunt wrote:
| What's the benefit of running Postgres in Kubernetes vs VMs (with replication, obviously)?
| radimm wrote:
| Having recently heard a lot about PostgreSQL in Kubernetes (CloudNativePG, for example), it always makes me wonder about the actual load and the complexity of the cluster in question.
|
| > This is the reason why we were able to easily cope with 2M+ requests in less than 24h when Midnight Society launched
|
| This gives the answer: while it's probably not evenly distributed, 2M requests over 86,400 seconds works out to about 23 req/sec (a guessed peak of 60-100 might already be stretching it). I always wonder about use cases with 3-5k req/sec as the minimum.
|
| [edit] PS: not really ditching either k8s pg or AWS RDS or similar solutions. Just being curious.
| kccqzy wrote:
| > This is the reason why we were able to easily cope with 2M+ requests in less than 24h
|
| I thought this was referring to 2M+ requests per second over a ramp period of 24h, not 2M requests per 24h?
| xani_ wrote:
| It's essentially just a process running in a cgroup, so performance shouldn't be all that different from bare-metal/VM PostgreSQL.
|
| The main difference would be storage speed and how exactly it is attached to a container.
| brand wrote:
| I've personally deployed O(TBs) and O(10^4 TPS) Postgres clusters on Kubernetes with a CNPG-style, operator-based deployment. There are some subtleties to it, but it's not exceedingly complicated, and a good project like CNPG goes a long way to shaving off those sharp edges. As other commenters have suggested, it's good to really understand Kubernetes if you want to do it, though.
| radimm wrote:
| Thanks for the confirmation. As mentioned, I'm not saying no to it. It is really that "really understand" part which holds me back for now - mainly the observability and dealing with edge cases in a high-throughput environment.
| Nextgrid wrote:
| > 23 req/sec (a guessed peak of 60-100 might already be stretching it)
|
| That kind of load is something a decent developer laptop with an NVMe drive can serve; nothing to write home about.
|
| It is sad that the "cloud" and all these supposedly "modern" DevOps systems managed to redefine the concept of "performance" for a large chunk of the industry.
| rrampage wrote:
| It depends a lot on the backend architecture. The number of DB requests per web request can also be high due to pathological cases in some ORMs, which can result in N+1 query problems or eagerly fetching entire object hierarchies. Such problems in application code can get brushed under the carpet due to "magical" autoscaling (be it RDS or K8s). There can also be fanout to async services/job queues, which will in turn run even more DB queries.
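For anyone who hasn't hit it, the N+1 pattern mentioned above looks roughly like this - a sketch with a made-up posts/authors schema and psycopg2, showing 1 + N round trips versus a single join:

    import psycopg2

    # Placeholder DSN and hypothetical tables, purely illustrative.
    conn = psycopg2.connect("postgresql://user:pass@localhost:5432/appdb")
    cur = conn.cursor()

    # N+1 pattern: one query for the parent rows, then one query per row
    # for the related record -- 1 + 50 round trips to the database.
    cur.execute("SELECT id, author_id, title FROM posts LIMIT 50")
    for post_id, author_id, title in cur.fetchall():
        cur.execute("SELECT name FROM authors WHERE id = %s", (author_id,))
        author = cur.fetchone()

    # The same data in a single round trip with a join.
    cur.execute("""
        SELECT p.id, p.title, a.name
        FROM posts p
        JOIN authors a ON a.id = p.author_id
        LIMIT 50
    """)
    rows = cur.fetchall()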
| AccountAccount1 wrote:
| Hey, this is not a problem for us at Nhost, since most of the interfacing with Postgres is through Hasura (a GraphQL-to-SQL engine). It solves the N+1 issue by compiling a performant SQL statement from the GraphQL query (it's also written in Haskell; you can read more here: https://hasura.io/blog/architecture-of-a-high-performance-gr...)
| robertlagrant wrote:
| I don't think K8s, at least, will autoscale quickly enough to mask something like that.
| singron wrote:
| RDS tops out at about 18,000 IOPS since it uses a single EBS volume. Any decent SSD will do much better. E.g. a 970 Evo will easily do >100K IOPS and can do more like 400K in ideal conditions.
|
| You can get that many IOPS with Aurora, but the cost is exorbitant.
| mcbain wrote:
| I don't think it has been a single EBS volume for a while, but in any case, 256k is more than 18k. https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_...
| mhuffman wrote:
| It does depend on the architecture and framework they are using, IMO. I have a single Hetzner machine with spinning-platter HDs that serves between 1-2 million requests per day, hitting the DB and ML models, and rarely ever gets over 1% CPU usage. I have pressure-tested it to around 3k reqs/sec. On the other hand, I have seen WP and CodeIgniter setups that, even with 5 copies running on the largest AWS instances available, "optimized" to the hilt, caching everywhere possible, etc., absolutely crumble under a load of 3k req per min (not sec... min).
|
| Many frameworks that make early development easy fuck you later during growth with ORM calls, tons of unnecessary text in the DB, etc.
| Nextgrid wrote:
| Keep in mind that your Hetzner instance has locally-attached storage and a real CPU, as opposed to networked storage and a slice of a CPU, so I'm not surprised at all that this beats an AWS setup even on the more expensive instances.
|
| Yes, frameworks can be a problem (although including WP in the list is an insult to other, _actually decent_ frameworks), but I would bet good money that if they moved their setup to Hetzner it would still fly. Non-optimal ORM calls can be optimized manually without necessarily dropping the framework altogether.
| marcosdumay wrote:
| Hum... The Hetzner instance is very likely cheaper than any AWS setup, so while there is a point in that part, it's not a very relevant one. (And that's exactly the issue with the "modern DevOps" tooling.)
| acdha wrote:
| > On the other hand, I have seen WP and CodeIgniter setups that, even with 5 copies running on the largest AWS instances available, "optimized" to the hilt, caching everywhere possible, etc., absolutely crumble under a load of 3k req per min (not sec... min).
|
| This sounds like some other architectural problem - that was single-node performance on EC2 in the 2000s, running nowhere near the largest instances available.
|
| There are concerns switching from local to SAN storage, of course, but that's also shifting the problem if you care about durability.
| derefr wrote:
| Depends on the queries. Point queries that take 1ms each? Sure. Analytical queries that take 1000ms+ each? Not so much.
| jerf wrote:
| I can't blame it on "cloud", though it's not helping that there are an awful lot of cloud services that claim to be "high performance" and are often mediumish at best. But in general I see a lot of ignorance in the developer community as to how fast things should be able to run, even in terms of reading local files and doing local manipulations with no "cloud" in sight.
|
| Honestly, if I had to pin it on just one thing, I'd blame networking everything. Cloud would fit as a subset of that. Networking slows things down at the best of times, and the latency distribution can be a nightmare at the worst. Few developers think about the cost of using the network, and even fewer can think about it holistically (e.g., to avoid making 50 network transactions spread throughout the system when you could do it all in one transaction if you rearranged things).
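The "50 network transactions versus one" point is easy to demonstrate with batching - a sketch using psycopg2 and a hypothetical events table:

    import psycopg2
    from psycopg2.extras import execute_values

    # Placeholder DSN and a made-up "events" table, purely illustrative.
    conn = psycopg2.connect("postgresql://user:pass@db.example.com:5432/appdb")
    rows = [(i, f"event-{i}") for i in range(50)]

    with conn, conn.cursor() as cur:
        # Naive version: 50 separate statements, each one a network round
        # trip whose latency the caller waits out.
        for row in rows:
            cur.execute("INSERT INTO events (id, label) VALUES (%s, %s)", row)

        # Batched version of the same work: all rows in a single statement,
        # i.e. one round trip instead of fifty.
        more_rows = [(i, f"event-{i}") for i in range(50, 100)]
        execute_values(cur, "INSERT INTO events (id, label) VALUES %s", more_rows)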
| geggam wrote:
| Are you talking about the cloud-host-to-cloud-host networking, or the pod networking inside a single host?
|
| The dizzying number of NAT layers has to be killing performance. I haven't had the chance to sit down and unravel a system running a good load. The lack of TCP tuning combined with the required connection tracking is interesting to think about.
| kazen44 wrote:
| I still don't understand why nearly all CNIs are so hell-bent on implementing a dozen layers of NAT to tunnel their overlay networks, instead of implementing a proper control plane to automate it all away with routes.
|
| Calico seems to be doing it semi-OK-ish, and even there the control plane is kind of unfinished?
|
| The only software-based solution which seems to have this properly figured out is VMware NSX-T. (I am not counting all the traditional overlay networks in use by ISPs based on MPLS/BGP.)
| geggam wrote:
| Before you even get to the CNI, I think AWS VM-to-internet is at least 3 NAT layers.
|
| So we have 3 layers from container to pod. The virtual host kernel is tracking those layers. One connection to one container is 3 tracked connections. Then you have whatever else you put on top to go in and out of the internet.
|
| The funny thing to me is that HAProxy recommended getting rid of connection tracking for performance, while everyone is doubling down on that alone and calling it performant.
| kazen44 wrote:
| > Few developers think about the cost of using the network.
|
| Developers do not seem to realise how slow the network is compared to everything else.
|
| Sure, 100Gbit network interfaces do exist, but most servers are attached with 10Gbit interfaces, and most actual implementations will not manage to hit something like 10Gbit/s because of latency and window scaling.
|
| You cannot escape latency (without inventing another universe in which physics do not apply). And latency is detrimental to performance.
|
| Getting anything across a large enough network in under 1 millisecond is hard, and compared to an IOP on a local NVMe disk, it is painfully slow.
| whoisthemachine wrote:
| > You cannot escape latency (without inventing another universe in which physics do not apply). And latency is detrimental to performance.
|
| This. So few people distinguish between bandwidth and latency. One can be increased arbitrarily and fairly easily with new encoding techniques (which generally only improves edge cases), and the other has a floor that is hard-coded into our universe. I've gotten into debates with folks who think a 10Gb connection from the EU to Texas should be as fast as a connection from Texas to the Midwest, or that to speed up the EU-TX connection they just need to spend more on bandwidth.
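That floor is easy to put numbers on. A back-of-the-envelope sketch, assuming roughly 200,000 km/s for light in fiber and rough straight-line distances (real fiber routes are longer, so real round trips are slower):

    # Minimum round-trip time imposed by the speed of light in fiber.
    SPEED_IN_FIBER_KM_S = 200_000

    def min_rtt_ms(distance_km: float) -> float:
        return 2 * distance_km / SPEED_IN_FIBER_KM_S * 1000

    print(min_rtt_ms(8_000))   # EU -> Texas, roughly:      ~80 ms round trip
    print(min_rtt_ms(1_500))   # Texas -> Midwest, roughly: ~15 ms round trip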
| briffle wrote:
| It seems most of the tools for running PostgreSQL in K8s default to creating a new copy of the DB at the drop of a hat. When your DB is in the multi-TB sizes, that can come with a noticeable cost in network fees, plus a very long delay, even on modern fast networks.
| ayende wrote:
| You are off by a couple of orders of magnitude.
|
| I have run 500+ req/sec on a Raspberry Pi using a 4 TB dataset with 2 GB of RAM, with under 100ms for the 99.99 percentile.
|
| A few hundred requests a second is basically nothing.
| c2h5oh wrote:
| That kind of load you can handle on spinning rust without breaking a sweat.
| MBCook wrote:
| So they switched from one giant RDS instance with all tenants per AZ to per-tenant PG in Kubernetes.
|
| So really we don't know how much of the problem was RDS compared to the tenant distribution.
|
| For the purposes of an article like this, it would be nice if the two steps were separate, or if they had synthetic benchmarks of the various options.
|
| But I understand why they just moved forward. They said they consulted experts; it would also be nice to discuss some of what they looked at or asked about.
| 0xbadcafebee wrote:
| Ah, the ol' sunk cost fallacy of infrastructure. We are already investing in supporting K8s, so let's throw the databases in there too. Couldn't possibly be that much work.
|
| Sure, a decade-old dedicated team at a billion-dollar multinational corporation has honed a solution designed to support hundreds of thousands of customers with high availability, and we could pay a little bit of extra money to spin up a new database per tenant that's a little bit less flexible... or we could reinvent everything they do on our own software platform and expect the same results. All it'll cost us is extra expertise, extra staff, extra time, extra money, extra planning, and extra operations. But surely it will improve our product dramatically.
| gw99 wrote:
| I'm not so sure. All you have is another layer of abstraction between you and the problem that you are facing. And that level of abstraction may violate your SLAs unless you pitch $15k for the enterprise support option. And that may not even be fruitful, because it relies on an uncertain network of folk at the other end who may or may not even be able to interpret and/or solve your problem. Also you are at the whim of their changes, which may or may not break your shit.
|
| Source: AWS user on very, very large-scale stuff for about 10 years now. It's not magic or perfection. It's just someone else's pile of problems that are lurking. The only consolation is they appear to try slightly harder than the datacentres that we replaced.
| xani_ wrote:
| [deleted]
| KaiserPro wrote:
| > I bet you also hate on people making their own espresso instead of just going to Starbucks
|
| Hobbies are not the same as bottom-line business.
|
| As with everything, managing state at scale is _very_ hard. Then you have to worry about backing it up.
| [deleted]
| wbl wrote:
| Running a stateful service in K8s is its own ball of wax.
| foobarian wrote:
| Yes, Postgres on K8s... <shudder>
| patrec wrote:
| It is, but then I never understood why on earth you'd use k8s if you don't have stateful services. I mean really, what's the point?
| mijamo wrote:
| Because it's easy? What alternative would you suggest?
| patrec wrote:
| The idea that something of the monstrous complexity of k8s is easy is pretty funny to me. I think if you have fewer than 2 full-time experts on k8s at hand, you're basically nuts if you use it for some non-toy project. In my experience, you can and will experience interesting failure scenarios.
|
| If you don't have state, why not just either use something serverless/fully-managed (Beanstalk, Lambda, Cloudflare Workers, whatever) if you really need to scale up and down (or have very limited devops/sysadmin capacity), or deploy like 2 or 3 bare-metal machines or VMs?
|
| Either sounds like a lot less work to manage and troubleshoot than some freaking k8s cluster.
| janee wrote:
| Bare metal, I'd think, is the first choice for a large RDBMS where you have skilled dedicated personnel that can manage it.
|
| If not, rather use a specialist service like RDS for anything with serious uptime/throughput requirements.
|
| k8s doesn't really make sense to me unless it's for spinning up lots of instances, like for test or dev envs, or like in the article where they host DBs for people.
| deathanatos wrote:
| ... I do it, in my day job. It's really not. StatefulSets are explicitly for this.
|
| We also have managed databases, too.
|
| Self-managed stuff means I can, generally, get shit done with it when oddball things need doing. Managed stuff is fine right up until it isn't (i.e., yet another outage with the status page being green), or until there's a requirement that the managed system inexplicably can't handle (despite the requirement being the sort of obvious thing you would expect of $SYSTEM, but which no PM thought to ask about before purchasing the deal...), and then you're in support-ticket hell.
|
| (E.g., we found out the hard way that there is no way to move a managed PG database from one subnet in a network to another in Azure! _Even if you're willing to restore from a backup._ We had to deal with that ourselves, by taking a pg_dump - essentially, un-managed-solution the backup.
|
| ... the whole reason we needed to move the DB to a different subnet was because of a _different_ flaw, in a _different_ managed service, and Azure's answer on _that_ ticket was "tough luck, the DB needs to move". Tickets spawning tickets. Support tickets for managed services take up an unholy portion of my time.)
| [deleted]
| folkhack wrote:
| I'd posit that it's not as simple. Maybe if you're just cranking out your one-off app or something of the sort...
|
| But getting a good replication setup that's HA, potentially across multiple regions/zones, all abstracted under K8s - yeah, that's not trivial. And it can go _very_ wrong.
|
| > I bet you also hate on people making their own espresso instead of just going to Starbucks
|
| This is just unnecessary.
| sn0wf1re wrote:
| >> I bet you also hate on people making their own espresso instead of just going to Starbucks
|
| > This is just unnecessary.
|
| I agree the ad hominem is not required, although the analogy is itself decent.
| folkhack wrote:
| I mean, I can make up ad hominem analogies about this stuff too - but in practice it makes people feel attacked/defensive, and rarely ever adds nuance or context to the conversation. I feel like in this situation it could have been omitted, as per the HN guidelines:
|
| > In Comments:
|
| > Be kind. Don't be snarky.
| coenhyde wrote:
| You're talking like managing stateful services in an ephemeral environment is as simple as installing and configuring Postgres. Postgres itself is 1% of the consideration here.
| suggala wrote:
| AWS RDS is 10x slower than bare-metal MySQL (both reads and writes). The slowness is mainly because storage is over the network for RDS.
|
| It's not bad to invest some extra time to get better performance.
|
| You are falling for the "appeal to antiquity" fallacy if you think something old is better.
| 0xbadcafebee wrote:
| What you describe is still a fallacy, because it assumes that just because you _can_ get better performance with bare metal, this is somehow a cheaper or better option. In fact it will be either more error-prone, or more expensive, or both, because you are trying to reproduce from scratch what the whole RDS team has been doing for 10 years.
| Nextgrid wrote:
| It's unlikely that running it on K8s (which is itself going to run on underpowered VMs with networked storage) is going to help.
|
| If you're gonna spend effort on running Postgres manually, do it on bare metal and at least get some reward out of it (performance _and_ reduced cost).
| derefr wrote:
| > It's unlikely that running it on K8s (which is itself going to run on underpowered VMs with networked storage) is going to help.
|
| On GCP, at least, you can provision a GKE node pool where the nodes have direct-attached NVMe storage; deploy a privileged container that formats and RAID0s up the drives; and then make use of the resulting scratch filesystem via host-mounts.
| qeternity wrote:
| > It's unlikely that running it on K8s (which is itself going to run on underpowered VMs with networked storage) is going to help.
|
| What?? We run replicated Patroni on local NVMes and it's incredibly fast.
| dijit wrote:
| And when it all goes bottoms-up, it will be much more difficult to resolve.
| baq wrote:
| Fortunately Postgres doesn't do that often by itself. It usually needs some creative developer's assistance.
| dijit wrote:
| I think you're triggering the worst case a lot more often when it comes to running Postgres on k8s: the storage can be removed independently from the workload, and the pod can be evicted much more easily than it would be in traditional database hosting methods.
|
| No need for developers to do anything strange at all.
| throwawaymaths wrote:
| Depends. A lot of Postgres usage is often "things that might as well be Redis", like session tokens (but the library we imported came configured for Postgres), so if the Postgres goes down, as long as it can be restarted it won't be the end of the world, even if all the data were wiped.
|
| Probably there is also an 80/20 for most users where it's not awful if you can restore from a cold-storage backup, say 12h old.
| HL33tibCe7 wrote:
| Couldn't you just spin up an RDS instance for each project (so, single-tenant RDS instances) to avoid the noisy-neighbour problem? Or is that too expensive?
| elitan wrote:
| We could, yes. But it's way too expensive compared to our current setup.
|
| We're offering free projects (Postgres, GraphQL (Hasura), Auth, Storage, Serverless Functions), so we need to optimize costs internally.
___________________________________________________________________
(page generated 2022-09-26 23:00 UTC)