[HN Gopher] Updating a 50 terabyte PostgreSQL database (2018) ___________________________________________________________________ Updating a 50 terabyte PostgreSQL database (2018) Author : Maksadbek Score : 99 points Date : 2022-01-13 17:06 UTC (5 hours ago) (HTM) web link (www.adyen.com) (TXT) w3m dump (www.adyen.com) | jfbaro wrote: | Thanks for sharing this. Really interesting. But a basic | question: why not upgrade to PG 13 instead? I am curious about | the reasons for staying on an older version of PostgreSQL. | jfbaro wrote: | I see now that it's a 2018 article. Makes sense! | hawk_ wrote: | Interesting. Does PG13 let them do this without downtime? | CubsFan1060 wrote: | Not from 9.4, but now there is a path using logical | replication between versions: | https://www.percona.com/blog/2019/04/04/replication- | between-... | simonw wrote: | I didn't understand how they avoid downtime during the upgrade - | they start with "Stop traffic to the database cluster" and then | perform a bunch of steps that each take 5-10 minutes or longer | before starting traffic again - so do they just accept the | downtime and schedule it in advance, or did I miss a detail? | rattray wrote: | Yeah, they architect their application to accept DB downtime - | but I'm sure their services are still degraded to some degree | or another during this, and they aren't clear how much total DB | downtime they need for this (and how that time scales across | various axes). | | Overall my takeaway is basically "if you want to upgrade a | large Postgres db, you'll need like an hour of planned downtime | and a lot of careful work" which... doesn't make me excited to | upgrade big Postgres DBs in production. | znpy wrote: | > Overall my takeaway is basically "if you want to upgrade a | large Postgres db, you'll need like an hour of planned | downtime and a lot of careful work" | | HA is one of those things where MySQL wins hands down, sadly.
| Sadly in the sense that PostgreSQL HA still looks like a | couple of hacks held together with duct tape, at least when | compared to MySQL solutions. | | The Percona MySQL distribution has multi-master HA (Percona | Xtradb Cluster), which is great for HA, and vanilla MySQL 8.x | has group replication, which IIRC is also multi-master. | MariaDB instead has multi-master based on Galera Cluster | (although I'm not very knowledgeable about MariaDB). | | In the PostgreSQL world there are many HA solutions, none of | which is both multi-master and open source (sadly). | rattray wrote: | I did a little googling, and still find that the answers are | "logical replication, which is slow for large DBs" or | "pg_upgrade, which requires nontrivial downtime". | | Gitlab wrote a much more in-depth rundown in 2020 of a | pg_upgrade, which in their case took 2h of downtime and 4h | total time: https://about.gitlab.com/blog/2020/09/11/gitlab- | pg-upgrade/ (submitted to HN here: | https://news.ycombinator.com/item?id=29926316). They included | a video of the whole thing, which is pretty incredible. | gbrown_ wrote: | Whilst others have mentioned their application is architected | to cope with this, I wanted to echo the sentiment. This post | feels very much like a "draw the rest of the owl" meme. | vonnieda wrote: | > One other detail to note is that we built our software | architecture in such a way that we can stop traffic to our | PostgreSQL databases, queue the transactions, and run a | PostgreSQL update without affecting payments acceptance. | foobiekr wrote: | That's downtime by any reasonable measure, depending on the | exact specifics. Is writing to a journal while the DB is down | "downtime"? | | Site-wise no, but DB-wise yes. | throwaway29303 wrote: | (2018) | wiz21c wrote: | I was wondering why they were using old PGs :-) | LoungeFlyZ wrote: | Irony. I think the HN post took down the Slony website.
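The cross-version logical-replication upgrade path mentioned upthread can be sketched roughly as follows. This is a hedged outline, not a runbook: host names, database names, and the publication/subscription names are all hypothetical, it assumes PostgreSQL 10+ on both ends, and logical replication does not carry DDL or sequence values, which have to be handled separately.

```shell
# 1. On the old primary: publish all tables.
psql -h old-primary -d payments \
  -c "CREATE PUBLICATION upgrade_pub FOR ALL TABLES;"

# 2. Copy the schema (not the data) to the new, newer-version cluster.
pg_dump -h old-primary --schema-only payments | \
  psql -h new-primary -d payments

# 3. On the new cluster: subscribe. This kicks off the initial
#    table copy, which is the slow part for a very large database.
psql -h new-primary -d payments -c "
  CREATE SUBSCRIPTION upgrade_sub
  CONNECTION 'host=old-primary dbname=payments'
  PUBLICATION upgrade_pub;"

# 4. Watch the lag; once it is near zero, stop writes on the old
#    primary, let the subscriber catch up, fix up sequences, and
#    point the application at the new cluster.
psql -h new-primary -d payments \
  -c "SELECT * FROM pg_stat_subscription;"
```

The initial copy in step 3 is presumably why this route gets described as slow for large DBs, but unlike pg_upgrade the write-blocked window at cutover can be short.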
| mulmen wrote: | http://howfuckedismydatabase.com/postgres/slony.php | leifg wrote: | One of the things I always wonder about with giant relational | databases is: how much of the "typical relational stuff" are they | actually using? | | Do they have constraints on rows? Are they using views, or do they | just denormalize and duplicate? Do they use joins at all? Are | they even doing more than 1 thing in a transaction? | mulmen wrote: | My only experience with databases of that size is for data | analysis, so yeah, constraints are relaxed. But even at that | point ideas like normalization are critical to extracting | performance out of large datasets. | | Normalization _is_ a performance optimization. Denormalization | is a development shortcut. Neither is right or wrong, but I | would be surprised if a 50TB OLTP database wasn't already | highly normalized. | | If it isn't, then my next guess is that it could be made smaller | or more performant if it was. | | We used to be proud of server uptime, back when we gave them | names. Today if you have a server up for 900 days you're going | to be accused of malpractice. | | Similar for data. We used to be proud of being able to keep | "big" data online, but I'm no longer impressed. You're just | spending money. Did you actually solve the business case as | efficiently as possible given available resources? Do you | _need_ 50TB in the first place? | merb wrote: | > Normalization is a performance optimization | | That is a stupid statement, because it's way too generic and | it depends on the use case. Read-heavy data that needs a lot | of joins is most often denormalized, IF updating a lot is not | a problem. Sometimes you need to create views that pull in | different stuff with different queries, which would make them | not really performant, especially not on Postgres, which is | just super slow when it comes to listing/filtering data.
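The trade-off being debated here can be seen in miniature with any SQL engine. A small sketch using the sqlite3 CLI (all table and column names are invented for illustration): the denormalized table answers reads without a join, but a change to the duplicated value has to touch every copy.

```shell
sqlite3 :memory: <<'SQL'
-- Normalized: the customer name lives in exactly one row.
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     amount INTEGER);
-- Denormalized: the name is copied onto every order.
CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY,
                            customer_name TEXT, amount INTEGER);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders VALUES (1, 1, 10), (2, 1, 10), (3, 1, 10);
INSERT INTO orders_denorm VALUES (1, 'Acme', 10), (2, 'Acme', 10),
                                 (3, 'Acme', 10);
-- Read path: the denormalized table needs no join...
SELECT customer_name, amount FROM orders_denorm;
-- ...while the normalized schema needs one.
SELECT c.name, o.amount FROM orders o
  JOIN customers c ON c.id = o.customer_id;
-- Write path: renaming the customer is 1 row normalized,
-- but N rows denormalized.
UPDATE customers SET name = 'Acme Corp' WHERE id = 1;
UPDATE orders_denorm SET customer_name = 'Acme Corp';
SELECT changes();  -- rows touched by the denormalized rename: 3
SQL
```

Which side of that trade wins depends, as the parent says, on the read/write mix.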
| Kranar wrote: | Normalization is pretty independent of optimization strategy, | and generally that's not presented as one of its advantages. | Normalizing data can improve write performance but will make | reads slower, so if you have a write-intensive database then | you will see some performance gains. | newlisp wrote: | Normalization improves data integrity. | paozac wrote: | 50TB is not so big these days. I read that in 2008 (!) Yahoo had | a 2+ PB PG database. What is the largest you know of, 14 years | later? | Gigachad wrote: | How are people dealing with databases this large? At work we | have a mysql db with a table that has 130M records in it, and a | count(*) on the table takes 100 seconds. Anything but a simple | lookup by id is almost unworkable. I assumed this was normal | because it's too big. But am I missing something here? Are SQL | databases capable of actually working fast at 50TB? | Something1234 wrote: | count(*) is always going to be slow. They don't store the | number of live tuples, just an estimate, so an exact count is | a full table scan. The secret is to use indexes to get down to | the small bit that you care about. If you're filtering on 3 | columns, the goal is to get the index to wipe out at least | half the results you don't care about, and so on and so forth. | | A 130M-record table with no indexes is going to be crazy | slow. Although if all you need are primary key lookups, then | that's the way to go. | ok_dad wrote: | Around 2005 I took a tour of the [a well known government | organization] and they were bragging about several-PB-sized | databases at the time. Interestingly, there were a TON of server | racks there in a bomb-proof building with tons of security, and | they were all IBM servers (a supercomputer maybe?), if I | remember correctly.
Also, there was one small server rack that | was painted differently from the rest (it looked like something | made in-house), and we asked what it was, and the tour guide (a | PhD computer scientist) said that technically it didn't exist | and he couldn't talk about it, even though it was super cool. | Now that I know what they were doing around that time (and | probably still today) I am kinda scared by the implications of | that tour guide's statement and what that one tiny rack was | for. I'm glad I never went to work in their organization, since | that tour was meant to recruit some of us a few years down the | road. | ddorian43 wrote: | Was it a single server? | GordonS wrote: | 50TB is big. Bigger is possible I'm sure, but I'd guess | 99.something% of all PG databases are less than 50TB. | | If someone here commented they had a 2PB database, I | _guarantee_ someone else here would be like "pfft, that's not | big"... | aantix wrote: | With 50TB, if you were doing a full-text search, wouldn't | the entirety of the index have to be held in memory? | Groxx wrote: | it's more than big enough to cause big problems / risk days | of downtime to change, yea. 50GB is not big. 50TB is at least | touching big - you can do it on one physical machine if | needed, but it's the sort of scale that benefits from bigger- | system architecture. 50PB would be world-class big, hitting | exciting new problems every time they do something. | mritun wrote: | The OP could have said instead that 50TB databases are | common these days, when a single bare-metal server or a 24xlarge | i3en or i4* instance on AWS can hold 60TB raw. | lelandbatey wrote: | This article mentions that a key piece of software they use in | all this is Slony[0], software which allows replicating from one | primary/master to many secondaries/slaves, even when those | replicas run different versions.
That's pretty cool, but I'd | like to instead draw your attention (dear reader) to the | _phenomenal_ writing happening in Slony's description of itself. | It describes, in plain but technical language, what Slony is, then | immediately jumps into addressing "why should you use this?" and | does so very directly. It reiterates the problem it solves, | mentions other potential solutions _by name_ to contextualize the | current solution space, then explains in a nice bulleted list | exactly the situations and niches that Slony is tailored for. | | Oh my heavens, I wish every piece of technology described itself | this way! | | [0] - Slony -- https://www.slony.info/ | xblau wrote: | Archived link, since it appears to be down at the moment: | https://web.archive.org/web/20211031124351/https://www.slony... | [deleted] ___________________________________________________________________ (page generated 2022-01-13 23:00 UTC)