[HN Gopher] A PostgreSQL Docker container that automatically upg...
___________________________________________________________________
 A PostgreSQL Docker container that automatically upgrades
 PostgreSQL

 Author : justinclift
 Score  : 149 points
 Date   : 2023-07-16 12:35 UTC (10 hours ago)

| mbreese wrote:
| This is a very confusing title and project at first glance. It
| took me a minute to figure out what you're doing...
|
| But just to be clear -- the point of this is to migrate an
| existing Postgres DB from the original version to a newer
| version? This is about the data volume itself, not automatically
| upgrading the Postgres binary when a new version is released?
|
| So, you'd basically pass a flag that says --migrate-db when
| starting the container to kick-start changing the data on disk?
| So when you start a new Postgres 15 container, you could pass it
| a volume with Postgres 13 data and it would auto-update the
| on-disk DB data.
| justinclift wrote:
| It's pretty much answered here:
| https://news.ycombinator.com/item?id=36747498
|
| The upgrade-of-your-data happens automatically when the
| container starts, prior to starting the database server.
| mbreese wrote:
| Right, but you're not actually automatically migrating a
| Postgres 13 _program_ to Postgres 15 in a running container.
| That was what I first thought when I read the title.
|
| You're using the word "upgrade" differently than how I
| normally think about it when talking about version changes.
| You're talking about data-on-disk, not just the program
| version.
|
| (I realize that when talking about Postgres, the two are
| linked, but that's not the case for most programs)
| justinclift wrote:
| Is there better wording that's concise?
|
| I'm just going with what I thought of, but am happy to
| adjust things as makes sense.
:)
| rcfox wrote:
| "A PostgreSQL Docker container that automatically
| upgrades your database"
|
| Still uses "upgrade", but highlights the fact that you're
| changing the artifact and not just the software.
| [deleted]
| Izkata wrote:
| Postgres docs do use the word upgrade, for example:
|
| > pg_upgrade (formerly called pg_migrator) allows data stored
| in PostgreSQL data files to be upgraded to a later PostgreSQL
| major version without the data dump/restore typically required
| for major version upgrades
|
| https://www.postgresql.org/docs/current/pgupgrade.html
| VWWHFSfQ wrote:
| Just to be clear, this isn't for actually running postgres to
| serve your database, right? This is just more or less a tool to
| upgrade your data files to a new postgres version?
| justinclift wrote:
| It does both. As part of the startup process it first checks if
| the data files are for an old version, and upgrades them first
| (if needed).
|
| Then it starts PostgreSQL 15.3 as per normal.
|
| It's intended as a drop-in replacement for existing
| (alpine-based) PostgreSQL containers.
|
| That being said, it's still pretty new, so don't use it on
| production data until you've tested it first (etc).
|
| --
|
| Btw, it uses the "--link" option when it runs pg_upgrade, so it
| should be reasonably suitable even for databases of a fairly
| large size. That option means it processes the database files
| "in place", to avoid needing to make a 2nd copy.
|
| At the moment it only processes a single data directory, so if
| you have multiple then it's not yet suitable.
| minus10eng wrote:
| [dead]
| WirelessGigabit wrote:
| I pin Postgres to vMajor.
|
| But I was bitten in the rear by Wallabag the other day. Didn't
| work anymore. Ended up having to do a doctrine upgrade. Then it
| all worked. Same concept, I guess. No automatic upgrade does
| allow you to pause and evaluate what you're doing, and take
| backups.
|
| Now I say it allows you. Whether you (or I) do so...
after all,
| that is what automatic backups are for...
|
| Now when did I last validate those?
| user6723 wrote:
| Mutable containers, what could go wrong?
| ndsipa_pomu wrote:
| This looks useful.
|
| I've had issues in the past with PostgreSQL as a backend for
| NextCloud (all running in docker), blindly pulling the latest
| PostgreSQL and then wondering why it didn't work when the major
| version jumped. (It's easy enough to fix once you figure out why
| it's not running - just do a manual export from the previous
| version and import the data into the newer version).
|
| However, does this container automatically back up the data
| before upgrading, in case you discover that the newer version
| isn't compatible with whatever is using it?
| justinclift wrote:
| > However, does this container automatically back up the data
| before upgrading ...
|
| Nope. This container is the official Docker Postgres
| 15.3-alpine3.18 image + the older versions of PostgreSQL
| compiled into it and some pg_upgrade scripting added to the
| docker entrypoint script to run the upgrade before starting
| PostgreSQL.
|
| It goes out of its way to use the "--link" option when running
| pg_upgrade, to upgrade in-place and therefore avoid making an
| additional copy of the data.
|
| That being said, this is a pretty new project (about a week old
| on GitHub), and having some support for making an (optional)
| backup isn't a bad idea.
|
| I'll have to think on a good way to make that work. Probably
| needs to check for some environment variable as a toggle or
| something, for the people who want it... (unsure yet).
| tough wrote:
| Seems def interesting to handle backup; wonder if it could just
| be something simple enough that lets you downgrade if the
| upgrade fails.
| justinclift wrote:
| Yeah. Probably going to need to add a bunch more error
| catching and handling-of-specific-situations. Edge cases
| being a thing and all. :)
| westurner wrote:
| Chapter 26.
Backup and Restore:
| https://www.postgresql.org/docs/current/backup.html
|
| Chapter 26. Backup and Restore > 26.3. Continuous
| Archiving and Point-in-Time Recovery (PITR) > 26.3.4.
| Recovering Using a Continuous Archive Backup:
| https://www.postgresql.org/docs/current/continuous-archiving...
|
| IIRC there are fancier ways than pg_dump to do Postgres
| backups that aren't postgres-native PITR?
|
| gh topic postgresql-backup:
| https://github.com/topics/postgresql-backup
|
| - pgsql-backup.sh:
| https://github.com/fukawi2/pgsql-backup/blob/develop/src/pgs...
|
| - https://github.com/SadeghHayeri/pgkit#backup
| https://github.com/SadeghHayeri/pgkit/blob/main/pgkit/cli/co... :
|     $ sudo pgkit pitr backup <name> <delay>
|
| > Recover: _This command is used to recover a delayed
| replica to a specified point in time between now and the
| database's delay amount. The time can be given in the
| YYYY-mm-ddTHH:MM format. The latest keyword can also be
| used to recover the database up to the latest transaction
| available._ :
|     $ sudo pgkit pitr recover <name> <time>
|     $ sudo pgkit pitr recover <name> latest
|
| > _The database will then start replaying the WAL files.
| Its progress can be tracked through the log files at
| /var/log/postgresql/._
|
| - "PostgreSQL-Disaster-Recovery-With-Barman"
| https://github.com/softwarebrahma/PostgreSQL-Disaster-Recove... :
|
| > _The solution architecture chosen here is a
| 'Traditional backup with WAL streaming' architecture
| implementation (Backup via rsync/SSH + WAL streaming).
| This is chosen as it provides incremental backup/restore
| & a bunch of other features._
|
| Glossary of backup terms:
| https://en.wikipedia.org/wiki/Glossary_of_backup_terms
|
| Continuous Data Protection > Continuous vs near continuous:
| https://en.wikipedia.org/wiki/Continuous_Data_Protection#Con...
| justinclift wrote:
| Thanks. I hadn't come across pgkit before.
:)
| ndsipa_pomu wrote:
| A default option is to warn about the possibility and then
| just leave the admins to do their own backups (which they
| should be doing anyway).
| justinclift wrote:
| What do you reckon the right place(s) to warn people would
| be?
| ndsipa_pomu wrote:
| Probably in the project readme. You could put a warning
| in the logs as well (e.g. on container startup), but that
| wouldn't be as visible.
| justinclift wrote:
| Thanks. I'll make that update in the morning too. :)
| westurner wrote:
| As an interactive prompt before mutating the data, with a
| `-y` to bypass the interactive check (and in the docs for
| `-y`)
| justinclift wrote:
| Nah. I can't see anything that needs interaction as being
| workable, as this is supposed to be an "automatic
| upgrade" thing.
| jacooper wrote:
| I think following MariaDB's lead is a good idea: they have an
| environment variable to enable automatic upgrades, and it
| creates a backup before every upgrade. Works like a charm.
| justinclift wrote:
| Any idea how well it works with multi-TB databases?
|
| Copying TBs of data around seems like it would delay the
| start of the database by a lot.
| develatio wrote:
| Related: https://github.com/tianon/docker-postgres-upgrade
| justinclift wrote:
| This one seems like a good proof of concept, but appears to
| need running as a separate task for upgrading rather than being
| an "automatic upgrade before starting the database server"
| thing?
| mkarrmann wrote:
| Neat :)
|
| I personally feel like upgrading a database should be an
| explicit admin process, and isn't something I want my db
| container entrypoint automagically handling.
| rc_mob wrote:
| Yes. I do have a few developer environments this would be great
| for, though.
| justinclift wrote:
| Yeah, a lot of the time I'd agree with you. :)
|
| This container came about for the Redash project
| (https://github.com/getredash/redash), which had been stuck on
| PostgreSQL 9.5 (!) for years.
|
| Moving to a newer PostgreSQL version is easy enough for new
| installations, but deploying that kind of change to an existing
| userbase isn't so pretty.
|
| For people familiar with the command line, PostgreSQL, and
| Docker, it's no big deal.
|
| But a _large_ number of Redash deployments seem to have been
| done by people not skilled in those things. "We deployed it
| from the Digital Ocean droplet / AWS image / (etc)".
|
| For those situations, something that takes care of the database
| upgrade process automatically is the better approach. :)
| [deleted]
| cosmotic wrote:
| I disagree. It is (or was, last time I tried a year or two
| ago) hard to upgrade postgres because you need both the old
| and new binary. The package manager tries its best to allow
| just one copy. You end up copying from one image to another,
| then run into tiny breaking differences.
|
| I agree it should be explicitly invoked and not automated, but
| for something almost everyone needs to do, it sure is a hard
| task.
| geenat wrote:
| For the uninitiated: Fairly certain this can only handle minor
| version upgrades... which is generally as easy as moving your
| data folder beside the new binary and starting postgres (minor
| version upgrades are already trivial).
|
| Major versions still require a pg_dump (or a scary migration
| using postgres' anemic logical replication) unless some
| advancement has happened on the postgres side I'm unaware of.
| justinclift wrote:
| > Beware: Fairly certain this can only handle minor version
| upgrades ...
|
| Nope. It's entirely for upgrading between major PostgreSQL
| versions. :)
|
| It uses the PostgreSQL "pg_upgrade" utility to do the data
| upgrade behind the scenes:
|
| https://www.postgresql.org/docs/current/pgupgrade.html
| Alifatisk wrote:
| I have always wondered how a database and its persistence are
| handled in a containerized environment.
|
| This covers one of the issues.
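[A sketch from the editor: the auto-upgrade startup flow described
earlier in the thread (check the data directory's version, upgrade it if
needed, then start the server) hinges on the PG_VERSION file PostgreSQL
writes into every data directory. Below is a minimal, runnable
illustration of just the detection step; the target version, scratch
directory, and messages are all illustrative, not the project's actual
entrypoint script.]

```shell
#!/bin/sh
# Illustrative sketch of the version-detection step an auto-upgrading
# entrypoint performs before starting PostgreSQL. PostgreSQL records the
# major version of a data directory in its PG_VERSION file.
set -eu
TARGET=15                       # major version shipped in the container

# Scratch directory standing in for the real data dir, seeded with
# 9.5-era data for demonstration purposes.
PGDATA=$(mktemp -d)
echo "9.5" > "$PGDATA/PG_VERSION"

CURRENT=$(cat "$PGDATA/PG_VERSION")
if [ "$CURRENT" != "$TARGET" ]; then
    # A real entrypoint would run pg_upgrade --link here, then start
    # the server; this sketch only reports the decision.
    echo "data directory is version $CURRENT, needs upgrade to $TARGET"
else
    echo "data directory already at version $TARGET, starting postgres"
fi
```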
|
| johannes1234321 wrote:
| The way you handle persistence is by using storage volumes
| mounted from the outside. Don't put the data in the container,
| only the software, which can then be replaced. Managing it is
| then similar to other environments. For updating, you don't
| replace rpm/deb packages and restart, but replace the container
| using the same volume, which should trigger the dbms-specific
| update routine. Similarly, you do backups much as you would
| outside the container world (depending on the DBMS and your
| exact choice: anything from a filesystem snapshot to a logical
| backup (dump), or something more or less smart in between).
| justinclift wrote:
| With PostgreSQL containers used in docker-compose, the common
| approach is to use a bind mount so the database files are
| persisted on the host.
|
| I've not done stuff with kubernetes yet though, so I have no
| idea how it's done there.
| bheadmaster wrote:
| Both Docker and Kubernetes can use volumes to provide
| persistent storage to containers/pods without a bind mount.
| justinclift wrote:
| Cool, thanks all. :)
| johannes1234321 wrote:
| > I've not done stuff with kubernetes yet though, so I have
| no idea how it's done there.
|
| Essentially the same, except that K8s gives you a wide
| variety of storage backend integrations (storage classes +
| storage providers) which can attach "anything" (local volumes
| on the node, NFS, NAS, cloud volumes, ...) depending on your
| local environment and needs.
| tempest_ wrote:
| Are people putting their databases in K8s now? I thought
| the old rule of thumb was don't do that, but perhaps it has
| changed.
|
| Feels like the database would take a massive performance
| hit using network-backed storage unless the software is
| aware of that fact.
| mschuster91 wrote:
| If you're running in the cloud, say on AWS EKS, it makes
| sense to use in-cluster databases for development
| environments, and only use RDS DBs for
| production/integration, to save on hosting costs.
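[A sketch from the editor: the bind-mount approach mentioned above is
typically expressed in a docker-compose file. The service name, host
path, password, and image tag below are illustrative, not taken from the
project; the script just writes the compose file out so the layout is
visible.]

```shell
#!/bin/sh
# Illustrative docker-compose service: the PostgreSQL data directory is
# bind-mounted from the host, so the container image can be replaced
# (e.g. for an upgrade) without losing the database files.
set -eu
cat > /tmp/compose-sketch.yml <<'EOF'
services:
  db:
    image: postgres:15.3-alpine
    environment:
      POSTGRES_PASSWORD: example   # illustrative only
    volumes:
      # host path : path PostgreSQL uses inside the container
      - /srv/pgdata:/var/lib/postgresql/data
EOF
echo "wrote /tmp/compose-sketch.yml"
```

With a layout like this, pulling a newer image and recreating the
service leaves /srv/pgdata intact, which is exactly the situation where
the data files may still be at the old major version.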
|
| williamdclt wrote:
| You take the risk of not catching RDS (or network) quirks
| in staging, but that's a trade-off.
| johannes1234321 wrote:
| There is a huge push for doing that. Whether it is the right
| thing in the abstract can be questioned, but many IT
| departments decided to standardize around Kubernetes for
| all datacenter management and push that way, and in some
| environments (5G networking) it's part of the specified
| stack.
| hotpotamus wrote:
| Saving it on one host's local filesystem doesn't feel
| particularly production-ready. There is a distributed storage
| system for Kubernetes called "Longhorn" that I've heard good
| things about, but I haven't really looked into it much
| myself. I just run a pair of VMs with a manual
| primary/replica setup and have never needed to fail over to
| the replica yet, but I can imagine some sort of fully
| orchestrated container solution in the future.
| justinclift wrote:
| Heh Heh Heh
|
| I'm just pointing out how it's commonly done. Of course
| people add things like replication, distributed
| filesystems, (etc) to the mix to suit their needs. :)
| hotpotamus wrote:
| Yep, it seems like the most common answer is "pay
| exorbitant prices to your cloud provider for a managed
| SQL database", but we've managed to save a chunk of money
| running it on our own. I've always said that between
| three engineers (me being one of them), we can form one
| competent DBA, but our needs are also pretty modest.
| Kab1r wrote:
| I've had bad luck with Longhorn, but I have heard good
| things about using Rook with Ceph for PVCs.
| wokkel wrote:
| So we come full circle, it seems. Apt update/upgrade already
| does this and has been doing it for a few years now. So if I
| drop docker and just run a container with Debian, I get this
| with a few years of testing behind it.
|
| neilv wrote:
| > _Its whole purpose in life is to automatically detect the
| version of PostgreSQL used in the existing PostgreSQL data
| directory, and automatically upgrade it (if needed) to the
| latest version of PostgreSQL._
|
| In a small startup...
|
| * If the data is mission-critical and constantly changing,
| PostgreSQL is a rare infra thing for which I'd use a managed
| service like AWS RDS, rather than just Debian Stable on EC2 or
| my own containers. The first time I used RDS, the company
| couldn't afford to lose an hour of data (it could destroy
| confidence in an enterprise customer's pilot project), and
| without RDS, I didn't have the time/resources to be nine-9s
| confident that we could do PITR if we ever needed to.
|
| * If the data is less critical or permits easy, sufficient
| backups, and I don't mind a 0-2 year-old stable version of PG,
| I'd probably just use whatever PG version Debian Stable has
| locked in. And just hold my breath during Debian security
| updates to that version.
|
| (Although I think I saw AWS has a cheaper entry level of pricing
| for RDS now, which I'll have to look into next time I have a
| concrete need. AWS pricing varies from no-brainers to lunacy,
| depending on specifics, and specifics can be tweaked with costs
| in mind.)
| [deleted]
| justsomehnguy wrote:
|     mkdir /var/lib/postgresql/data/old
|     mv -v /var/lib/postgresql/data/* /var/lib/postgresql/data/old/
|     mkdir /var/lib/postgresql/data/new
|
| You should create both dirs first, then _check if they really
| exist_ and only then move the files. Shit happens, and it's
| better to be safe than sorry.
|
| Also I would replace ../old and ../new with $OLD and $NEW to
| have a bit less clutter in the next block with the explicit
| upgrade calls, i.e.:
|     /usr/local/bin/pg_upgrade --link -d $OLD -D $NEW \
|         -b /usr/local-pg9.5/bin -B /usr/local/bin
|
| Overall it's a nice idea, it just needs some more safety
| checking before starting the process.
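[A sketch from the editor: a runnable illustration of the ordering
suggested above, using a scratch directory in place of the real data
directory. The paths, fake data files, and the commented pg_upgrade
invocation are illustrative; this is not the project's actual
entrypoint.]

```shell
#!/bin/sh
# Sketch: create old/ and new/ first, verify both exist, and only then
# move the existing files. A scratch dir stands in for the data dir.
set -eu
DATA=$(mktemp -d)
touch "$DATA/PG_VERSION" "$DATA/postgresql.conf"   # fake data files

OLD="$DATA/old"
NEW="$DATA/new"

# 1. Create both directories first.
mkdir -p "$OLD" "$NEW"

# 2. Check that they really exist before touching anything else.
[ -d "$OLD" ] && [ -d "$NEW" ] || { echo "dir creation failed" >&2; exit 1; }

# 3. Only then move the existing files, skipping old/ and new/
#    themselves (a bare "mv $DATA/* $OLD/" would also try to move
#    old/ into itself).
for f in "$DATA"/*; do
    case "$f" in
        "$OLD"|"$NEW") ;;
        *) mv "$f" "$OLD"/ ;;
    esac
done

# 4. The upgrade itself would then use the $OLD/$NEW variables, e.g.:
#    pg_upgrade --link -d "$OLD" -D "$NEW" \
#        -b /usr/local-pg9.5/bin -B /usr/local/bin
```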
| justinclift wrote:
| Cool, good thinking, thanks heaps. I'll implement that in the
| morning. :)
| lmeyerov wrote:
| That's great.
|
| We have been maintaining a manual migration script for our
| docker users for the DB part, while the app part does migrations
| automatically already (Django), so making the db part more
| built-in makes a lot of sense.
|
| Our case doesn't really need live/fully-auto, just an easy mode
| for admins during upgrade cycles, and our scripts were pretty
| generic, so a project like this makes a lot of sense. There are
| a few modes we support - local / same-server, cross-node, etc -
| so I am curious.
___________________________________________________________________
(page generated 2023-07-16 23:00 UTC)