[HN Gopher] A PostgreSQL Docker container that automatically upg...
       ___________________________________________________________________
        
       A PostgreSQL Docker container that automatically upgrades
       PostgreSQL
        
       Author : justinclift
       Score  : 149 points
       Date   : 2023-07-16 12:35 UTC (10 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | mbreese wrote:
       | This is a very confusing title and project at first glance. It
       | took me a minute to figure out what you're doing...
       | 
       | But just to be clear -- the point of this is to migrate an
       | existing Postgres DB from the original version to a newer
       | version? This is about the data volume itself, not automatically
       | upgrading the Postgres binary when a new version is released?
       | 
       | So, you'd basically pass a flag that says --migrate-db when
       | starting the container to kick start changing the data on disk?
       | So when you start a new Postgres 15 container, you could pass it
       | a volume with Postgres 13 data and it would auto update the on
       | disk DB data.
        
         | justinclift wrote:
         | It's pretty much answered here:
         | https://news.ycombinator.com/item?id=36747498
         | 
         | The upgrade-of-your-data happens automatically when the
         | container starts, prior to starting the database server.
        
           | mbreese wrote:
           | Right, but you're not actually automatically migrating a
           | Postgres 13 _program_ to Postgres 15 in a running container.
           | That was what I first thought when I read the title.
           | 
            | You're using the word "upgrade" differently than how I
            | normally think about it when talking about version changes.
           | You're talking about data-on-disk, not just the program
           | version.
           | 
           | (I realize that when talking about Postgres, the two are
           | linked, but that's not the case for most programs)
        
             | justinclift wrote:
             | Is there better wording that's concise?
             | 
             | I'm just going with what I thought of, but am happy to
             | adjust things as makes sense. :)
        
               | rcfox wrote:
               | "A PostgreSQL Docker container that automatically
               | upgrades your database"
               | 
               | Still uses "upgrade", but highlights the fact that you're
               | changing the artifact and not just the software.
        
               | [deleted]
        
         | Izkata wrote:
         | Postgres docs do use the word upgrade, for example:
         | 
         | > pg_upgrade (formerly called pg_migrator) allows data stored
         | in PostgreSQL data files to be upgraded to a later PostgreSQL
         | major version without the data dump/restore typically required
         | for major version upgrades
         | 
         | https://www.postgresql.org/docs/current/pgupgrade.html
        
       | VWWHFSfQ wrote:
       | Just to be clear, this isn't for actually running postgres to
       | serve your database, right? This is just more or less a tool to
       | upgrade your data files to a new postgres version?
        
         | justinclift wrote:
         | It does both. As part of the startup process it first checks if
         | the data files are for an old version, and upgrades them first
         | (if needed).
         | 
         | Then it starts PostgreSQL 15.3 as per normal.
         | 
          | It's intended to be a drop-in replacement for existing
          | (alpine based) PostgreSQL containers.
         | 
         | That being said, it's still pretty new so don't use it on
         | production data until you've tested it first (etc).
         | 
         | --
         | 
         | Btw, it uses the "--link" option when it runs pg_upgrade so
         | should be reasonably suitable even for databases of a fairly
         | large size. That option means it processes the database files
         | "in place" to avoid needing to make a 2nd copy.
         | 
          | At the moment it only processes a single data directory, so if
          | you have multiple then it's not yet suitable.
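For the curious, that startup check can be sketched roughly like so. This is a hedged illustration, not the project's actual entrypoint code: `needs_upgrade`, `TARGET_VERSION`, and the message are invented, but the `PG_VERSION` file is genuinely how PostgreSQL records a data directory's major version.

```shell
# Sketch of the kind of pre-flight check an entrypoint can run before
# starting postgres (illustrative names, not the project's real script).
# PostgreSQL records the cluster's major version in the PG_VERSION file
# at the top of the data directory.
PGDATA="${PGDATA:-/var/lib/postgresql/data}"
TARGET_VERSION=15

needs_upgrade() {
    # A missing PG_VERSION means a fresh data dir: initdb handles that.
    [ -f "$PGDATA/PG_VERSION" ] || return 1
    data_version=$(cat "$PGDATA/PG_VERSION")
    [ "$data_version" != "$TARGET_VERSION" ]
}

if needs_upgrade; then
    echo "data dir is version $data_version; running pg_upgrade --link first"
    # pg_upgrade --link ... would go here, before postgres starts
fi
```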
        
       | minus10eng wrote:
       | [dead]
        
       | WirelessGigabit wrote:
       | I pin Postgres to vMajor.
       | 
       | But I was bitten in the rear by Wallabag the other day. Didn't
       | work anymore. Ended up having to do a doctrine upgrade. Then it
        | all worked. Same concept I guess. Not upgrading automatically does
        | allow you to pause and evaluate what you're doing, and take backups.
       | 
       | Now I say it allows you. Whether you (or I) do so... after all,
       | that is what automatic backups are for...
       | 
       | Now when did I last validate those?
        
       | user6723 wrote:
       | Mutable containers, what could go wrong?
        
       | ndsipa_pomu wrote:
       | This looks useful.
       | 
       | I've had issues in the past with PostgreSQL as a backend for
       | NextCloud (all running in docker) and blindly pulling the latest
       | PostgreSQL and then wondering why it didn't work when the major
       | version jumped. (It's easy enough to fix once you figure out why
       | it's not running - just do a manual export from the previous
       | version and import the data into the newer version).
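That manual export/import dance is roughly the following. A sketch only: the container names, the `postgres` user, and the `DRY_RUN` guard are invented for illustration.

```shell
# Illustrative sketch of a manual export/import between an old-version
# and a new-version container. Container names, the postgres user, and
# the DRY_RUN guard are made up for illustration.
dump_and_restore() {
    old_ctr=$1
    new_ctr=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the plan instead of running it (e.g. no docker available).
        echo "docker exec $old_ctr pg_dumpall -U postgres > dump.sql"
        echo "docker exec -i $new_ctr psql -U postgres < dump.sql"
        return 0
    fi
    docker exec "$old_ctr" pg_dumpall -U postgres > dump.sql
    docker exec -i "$new_ctr" psql -U postgres < dump.sql
}
```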
       | 
       | However, does this container automatically backup the data before
       | upgrading in case you discover that the newer version isn't
       | compatible with whatever is using it?
        
         | justinclift wrote:
         | > However, does this container automatically backup the data
         | before upgrading ...
         | 
         | Nope. This container is the official Docker Postgres
         | 15.3-alpine3.18 image + the older versions of PostgreSQL
         | compiled into it and some pg_upgrade scripting added to the
         | docker entrypoint script to run the upgrade before starting
         | PostgreSQL.
         | 
          | It goes out of its way to use the "--link" option when running
          | pg_upgrade, to upgrade in-place and therefore avoid making an
          | additional copy of the data.
         | 
         | That being said, this is a pretty new project (about a week old
         | on GitHub), and having some support for making an (optional)
         | backup isn't a bad idea.
         | 
         | I'll have to think on a good way to make that work. Probably
         | needs to check for some environment variable as a toggle or
         | something, for the people who want it... (unsure yet).
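One possible shape for that toggle, purely as a sketch: `AUTO_UPGRADE_BACKUP` and the backup path are invented names, not options this project actually has.

```shell
# Hypothetical sketch of an opt-in backup step before pg_upgrade runs.
# AUTO_UPGRADE_BACKUP and the paths are invented for illustration.
PGDATA="${PGDATA:-/var/lib/postgresql/data}"
BACKUP_DIR="${BACKUP_DIR:-/var/lib/postgresql/backups}"

maybe_backup() {
    # Only back up when the user explicitly opted in.
    [ "$AUTO_UPGRADE_BACKUP" = "1" ] || return 0
    mkdir -p "$BACKUP_DIR"
    # A plain tar of the (not yet started) cluster. Cheap to script, but
    # it doubles disk usage, which hurts for multi-TB databases.
    tar -czf "$BACKUP_DIR/pre-upgrade-$(date +%Y%m%d%H%M%S).tar.gz" -C "$PGDATA" .
}
```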
        
           | tough wrote:
           | Seems def interesting to handle backup, wonder if could just
           | be something simple enough that lets you downgrade if upgrade
           | fails
        
             | justinclift wrote:
             | Yeah. Probably going to need to add a bunch more error
             | catching and handling-of-specific-situations. Edge cases
             | being a thing and all. :)
        
               | westurner wrote:
               | Chapter 26. Backup and Restore:
               | https://www.postgresql.org/docs/current/backup.html
               | 
               | Chapter 26. Backup and Restore > 26.3. Continuous
               | Archiving and Point-in-Time Recovery (PITR) > 26.3.4.
               | Recovering Using a Continuous Archive Backup:
                | https://www.postgresql.org/docs/current/continuous-archiving...
               | 
               | IIRC there are fancier ways than pg_dump to do Postgres
               | backups that aren't postgres native PITR?
               | 
               | gh topic postgresql-backup:
               | https://github.com/topics/postgresql-backup
               | 
                | - pgsql-backup.sh:
                | https://github.com/fukawi2/pgsql-backup/blob/develop/src/pgs...
               | 
                | - https://github.com/SadeghHayeri/pgkit#backup
                | https://github.com/SadeghHayeri/pgkit/blob/main/pgkit/cli/co... :
                | 
                |       $ sudo pgkit pitr backup <name> <delay>
                | 
                | > Recover: _This command is used to recover a delayed
                | replica to a specified point in time between now and the
                | database's delay amount. The time can be given in the
                | YYYY-mm-ddTHH:MM format. The latest keyword can also be
                | used to recover the database up to the latest transaction
                | available._:
                | 
                |       $ sudo pgkit pitr recover <name> <time>
                |       $ sudo pgkit pitr recover <name> latest
               | 
                | > _The database will then start replaying the WAL files.
                | Its progress can be tracked through the log files at
                | /var/log/postgresql/._
               | 
               | - "PostgreSQL-Disaster-Recovery-With-Barman"
                | https://github.com/softwarebrahma/PostgreSQL-Disaster-Recove... :
               | 
               | > _The solution architecture chosen here is a
               | 'Traditional backup with WAL streaming' architecture
               | implementation (Backup via rsync/SSH + WAL streaming).
               | This is chosen as it provides incremental backup/restore
               | & a bunch of other features._
               | 
               | Glossary of backup terms:
               | https://en.wikipedia.org/wiki/Glossary_of_backup_terms
               | 
                | Continuous Data Protection > Continuous vs near continuous:
                | https://en.wikipedia.org/wiki/Continuous_Data_Protection#Con...
        
               | justinclift wrote:
               | Thanks. I hadn't come across pgkit before. :)
        
           | ndsipa_pomu wrote:
           | A default option is to warn about the possibility and then
           | just leave the admins to do their own backups (which they
           | should be doing anyway).
        
             | justinclift wrote:
             | What do you reckon the right place(s) to warn people would
             | be?
        
               | ndsipa_pomu wrote:
               | Probably in the project readme. You could put a warning
               | in the logs as well (e.g. on container startup), but that
               | wouldn't be as visible.
        
               | justinclift wrote:
               | Thanks. I'll make that update in the morning too. :)
        
               | westurner wrote:
               | As an interactive prompt before mutating the data, with a
               | `-y` to bypass the interactive check (and in the docs for
               | `-y`)
        
               | justinclift wrote:
               | Nah. I can't see anything that needs interaction as being
               | workable, as this is supposed to be an "automatic
               | upgrade" thing.
        
           | jacooper wrote:
            | I think following MariaDB's lead is a good idea: they have
            | an environment variable to enable automatic upgrades, and it
            | creates a backup before every upgrade. Works like a charm.
        
             | justinclift wrote:
             | Any idea how well it works with multi-TB databases?
             | 
             | Copying TBs of data around seems like it would delay the
             | start of the database by a lot.
        
       | develatio wrote:
       | Related: https://github.com/tianon/docker-postgres-upgrade
        
         | justinclift wrote:
         | This one seems like a good proof of concept, but appears to
         | need running as a separate task for upgrading rather than being
         | an "automatic upgrade before starting the database server"
         | thing?
        
       | mkarrmann wrote:
       | Neat :)
       | 
       | I personally feel like upgrading a database should be an explicit
       | admin process, and isn't something I want my db container
       | entrypoint automagically handling.
        
         | rc_mob wrote:
         | yes. i do have a few developer environments this would be great
         | for though.
        
         | justinclift wrote:
         | Yeah, a lot of the time I'd agree with you. :)
         | 
         | This container came about for the Redash project
         | (https://github.com/getredash/redash), which had been stuck on
         | PostgreSQL 9.5 (!) for years.
         | 
         | Moving to a newer PostgreSQL version is easy enough for new
         | installations, but deploying that kind of change to an existing
         | userbase isn't so pretty.
         | 
          | For people familiar with the command line, PostgreSQL, and
          | Docker, it's no big deal.
         | 
         | But a _large_ number of Redash deployments seem to have been
         | done by people not skilled in those things.  "We deployed it
         | from the Digital Ocean droplet / AWS image / (etc)".
         | 
         | For those situations, something that takes care of the database
         | upgrade process automatically is the better approach. :)
        
           | [deleted]
        
           | cosmotic wrote:
            | I disagree. It is (or was, last time I tried a year or two
            | ago) hard to upgrade postgres because you need both the old
            | and new binaries. Package managers try their best to allow
            | just one copy. You end up copying from one image to another,
            | then run into tiny breaking differences.
            | 
            | I agree it should be explicitly invoked and not automated,
            | but for something almost everyone needs to do, it sure is a
            | hard task.
        
       | geenat wrote:
        | For the uninitiated: Fairly certain this can only handle minor
        | version upgrades... which are generally as easy as moving your
        | data folder beside the new binary and starting postgres (minor
        | version upgrades are already trivial).
       | 
       | Major versions still require a pg_dump (or a scary migration
       | using postgres' anemic logical replication) unless some
       | advancement has happened on the postgres side I'm unaware of.
        
         | justinclift wrote:
         | > Beware: Fairly certain this can only handle minor version
         | upgrades ...
         | 
         | Nope. It's entirely for upgrading between major PostgreSQL
         | versions. :)
         | 
         | It uses the PostgreSQL "pg_upgrade" utility to do the data
         | upgrade behind the scenes:
         | 
         | https://www.postgresql.org/docs/current/pgupgrade.html
        
       | Alifatisk wrote:
       | I have always wondered how a database and its persistence is
       | handled in a containerized environment.
       | 
       | This covers one of the issues.
        
         | johannes1234321 wrote:
          | The way you handle persistence is by using storage volumes
          | mounted from the outside. Don't put the data in the container,
          | only the software, which then can be replaced. Managing it is
          | then similar to other environments. For updating you don't
          | replace rpm/deb packages and restart, but replace the container
          | using the same volume, which should trigger the dbms-specific
          | update routine. Backups likewise work as they do outside the
          | container world (depending on the DBMS and your exact choice:
          | anything from a filesystem snapshot to a logical backup (dump),
          | or something more or less smart in between).
        
         | justinclift wrote:
         | With PostgreSQL containers used in docker-compose, the common
         | approach is to use a bind mount so the database files are
         | persisted on the host.
         | 
         | I've not done stuff with kubernetes yet though, so I have no
         | idea how it's done there.
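A minimal compose fragment showing that bind-mount pattern (service name and host path are illustrative):

```yaml
# Illustrative docker-compose fragment: the data directory lives on the
# host via a bind mount, so the postgres image can be replaced while
# the data persists.
services:
  db:
    image: postgres:15-alpine
    volumes:
      - ./pgdata:/var/lib/postgresql/data
```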
        
           | bheadmaster wrote:
            | Both Docker and Kubernetes can use volumes to provide
            | persistent storage to containers/pods without a bind mount.
        
             | justinclift wrote:
             | Cool, thanks all. :)
        
           | johannes1234321 wrote:
           | > I've not done stuff with kubernetes yet though, so I have
           | no idea how it's done there.
           | 
           | Essentially the same, except that K8s gives you a wide
           | variety of storage backend integrations (Storage classes +
           | storage providers) which can attach "anything" (local volumes
           | on the node, NFS, NAS, Cloud Volumes, ...) depending on your
           | local environment and needs.
        
             | tempest_ wrote:
             | Are people putting their databases in K8s now? I thought
             | the old rule of thumb was don't do that but perhaps it has
             | changed.
             | 
             | Feels like the database would take a massive performance
             | hit using network backed storage unless the software is
             | aware of that fact.
        
               | mschuster91 wrote:
               | If you're running in the cloud, say on AWS EKS, it makes
               | sense to use in-cluster databases for development
               | environments, and only use RDS DB's for
               | production/integration to save on hosting costs.
        
               | williamdclt wrote:
               | You take the risk of not catching RDS (or network) quirks
               | in staging, but that's a trade-off
        
               | johannes1234321 wrote:
               | There is a huge push for doing that. Whether it is
               | abstract the right thing can be questioned, but many IT
               | departments decided to standardize around Kubernetes for
               | all datacenter management and push that way and in some
               | environments (5G networking) it's part of the specified
               | stack.
        
           | hotpotamus wrote:
           | Saving it on one host's local filesystem doesn't feel
           | particularly production-ready. There is a distributed store
           | system for Kubernetes called "Longhorn" that I've heard good
           | things about, but I haven't really looked into it much
           | myself. I just run a pair of VMs with a manual
           | primary/replica setup and have never needed to fail over to
           | the replica yet, but I can imagine some sort of fully
           | orchestrated container solution in the future.
        
             | justinclift wrote:
             | Heh Heh Heh
             | 
             | I'm just pointing out how it's commonly done. Of course
             | people add things like replication, distributed
             | filesystems, (etc) to the mix to suit their needs. :)
        
               | hotpotamus wrote:
               | Yep, it seems like the most common answer is "pay
               | exorbitant prices to your cloud provider for a managed
               | SQL database", but we've managed to save a chunk of money
               | running it on our own. I've always said that between
               | three engineers(me being one of them), we can form one
               | competent DBA, but our needs are also pretty modest.
        
             | Kab1r wrote:
             | I've had bad luck with longhorn, but I have heard good
             | things about using Rook with Ceph for PVCs
        
       | wokkel wrote:
       | So we come full circle it seems. Apt update/upgrade already does
        | this and has been doing this for a few years now. So if I drop
        | Docker and just run a container with Debian I get this with a few
        | years of testing behind it.
        
       | neilv wrote:
        | > _Its whole purpose in life is to automatically detect the
        | version of PostgreSQL used in the existing PostgreSQL data
        | directory, and automatically upgrade it (if needed) to the latest
        | version of PostgreSQL._
       | 
       | In a small startup...
       | 
       | * If the data is mission-critical and constantly changing,
       | PostgreSQL is a rare infra thing for which I'd use a managed
        | service like AWS RDS, rather than just Debian Stable on EC2 or
        | my own containers. The first time I used RDS, the company couldn't
       | afford to lose an hour of data (could destroy confidence in
       | enterprise customer's pilot project), and without RDS, I didn't
       | have time/resources to be nine-9s confident that we could do PITR
       | if we ever needed to.
       | 
       | * If the data is less-critical or permits easy sufficient
       | backups, and I don't mind a 0-2 year-old stable version of PG,
       | I'd probably just use whatever PG version Debian Stable has
       | locked in. And just hold my breath during Debian security updates
       | to that version.
       | 
       | (Although I think I saw AWS has a cheaper entry level of pricing
       | for RDS now, which I'll have to look into next time I have a
       | concrete need. AWS pricing varies from no-brainers to lunacy,
       | depending on specifics, and specifics can be tweaked with costs
       | in mind.)
        
       | [deleted]
        
       | justsomehnguy wrote:
        mkdir /var/lib/postgresql/data/old
        mv -v /var/lib/postgresql/data/* /var/lib/postgresql/data/old/
        mkdir /var/lib/postgresql/data/new
        | 
        | You should create both dirs first, then _check if they really
        | exist_, and only then move the files. Shit happens and it's
        | better to be safe than sorry.
        | 
        | Also I would replace ../old and ../new with $OLD and $NEW to have
        | a bit less clutter in the next block with the explicit upgrade
        | calls, ie:
        | 
        /usr/local/bin/pg_upgrade --link -d $OLD -D $NEW \
            -b /usr/local-pg9.5/bin -B /usr/local/bin
       | 
       | Overall it's nice idea, just need some more safety checking
       | before starting the process.
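Spelling that suggestion out as a sketch: the `prepare_dirs` wrapper and its error message are invented, while the directory layout and the pg_upgrade call mirror the quoted snippet.

```shell
# Sketch of the suggested safety checks: create both directories first,
# verify they really exist, and only then move the files. The function
# name and error message are invented for illustration.
prepare_dirs() {
    DATA=$1
    OLD="$DATA/old"
    NEW="$DATA/new"
    mkdir -p "$OLD" "$NEW"
    # Shit happens: confirm both dirs exist before touching the data.
    [ -d "$OLD" ] && [ -d "$NEW" ] || { echo "refusing to continue" >&2; return 1; }
    # Move everything except the two freshly created directories.
    for f in "$DATA"/*; do
        case "$f" in
            "$OLD"|"$NEW") ;;       # skip the new dirs themselves
            *) mv -v "$f" "$OLD"/ ;;
        esac
    done
}
# With $OLD/$NEW set, the explicit upgrade call stays readable:
#   /usr/local/bin/pg_upgrade --link -d "$OLD" -D "$NEW" \
#       -b /usr/local-pg9.5/bin -B /usr/local/bin
```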
        
         | justinclift wrote:
         | Cool, good thinking, thanks heaps. I'll implement that in the
         | morning. :)
        
       | lmeyerov wrote:
       | That's great.
       | 
       | We have been maintaining a manual migration script for our docker
       | users for the DB part, while the app part does migrations
       | automatically already (Django), so making the db part more built-
       | in makes a lot of sense.
       | 
       | Our case doesn't really need live/fully auto, just an easy mode
       | for admins during upgrade cycles, and our scripts were pretty
        | generic, so a project like this makes a lot of sense. There are a
        | few modes we support - local / same-server, cross-node, etc - so
        | I'm curious.
        
       ___________________________________________________________________
       (page generated 2023-07-16 23:00 UTC)