[HN Gopher] Zero-downtime schema migrations in Postgres using views
       ___________________________________________________________________
        
       Zero-downtime schema migrations in Postgres using views
        
       Author : fabianlindfors
       Score  : 82 points
       Date   : 2021-06-16 18:39 UTC (4 hours ago)
        
 (HTM) web link (fabianlindfors.se)
 (TXT) w3m dump (fabianlindfors.se)
        
       | nartz wrote:
       | Very interesting. I like the idea of "virtualizing" the hard
       | schema with views to abstract away some things.
       | 
        | View performance can become a real concern at larger scale for
        | OLTP workloads. The solution you propose also adds complexity,
        | since you now have two schemas instead of one, and, as you
        | rightly point out, the views themselves bring complexity. The
        | question becomes: when is this added complexity worth it?
        
         | johbjo wrote:
          | Views are equivalent to "query templates", unless they do
          | operations on subquery aggregate columns (for example, GROUP
          | BY over a sum()).
          | 
          | So views don't add complexity relative to writing the query
          | out explicitly. They might even help, since the incoming
          | queries can be smaller and thus need less parsing/processing.
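          | 
          | A tiny sketch of what I mean (hypothetical names):
          | 
          |     -- the view is just a named, reusable query
          |     CREATE VIEW active_users AS
          |         SELECT id, name FROM users WHERE deleted_at IS NULL;
          |     
          |     -- planned the same as writing the query out explicitly:
          |     -- SELECT name FROM users
          |     -- WHERE deleted_at IS NULL AND id = 42;
          |     SELECT name FROM active_users WHERE id = 42;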
        
         | fabianlindfors wrote:
          | I think you hit the nail on the head regarding complexity. I'm
          | experimenting with creating a migration tool which handles all
          | the views for you. I believe some of the process can be
          | automated and the complexity reduced, but it remains to be
          | seen how well it works!
        
       | ThrustVectoring wrote:
       | I looked into this recently, and views cover like 98% of the
       | functionality that the client app needs from Postgres. One issue
       | I ran into was that Postgres forgets that the primary key is a
       | primary key when pulling from a view, which breaks some queries
       | that rely on grouping by the primary key.
       | 
       | https://dba.stackexchange.com/questions/195104/postgres-grou...
       | has some more info on this
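        | 
        | A minimal sketch of the difference (hypothetical posts table):
        | 
        |     -- fine: selecting posts.title is allowed because it is
        |     -- functionally dependent on the grouped primary key
        |     SELECT posts.id, posts.title, COUNT(*)
        |     FROM posts GROUP BY posts.id;
        |     
        |     -- fails through a plain view over the same table, because
        |     -- the primary key information doesn't propagate:
        |     CREATE VIEW posts_v AS SELECT * FROM posts;
        |     SELECT posts_v.id, posts_v.title, COUNT(*)
        |     FROM posts_v GROUP BY posts_v.id;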
        
         | e12e wrote:
          | Doesn't DISTINCT work in this case?
          | 
          |     SELECT DISTINCT ON (vt.id) row_to_json(vt.*)
          |     FROM vt
          |     JOIN vy ON vt.id = vy.tid
          |     WHERE vt.id = 1
          |       AND vy.amount > 1;
        
           | ThrustVectoring wrote:
            | The SQL in question which was problematic for me (tables
            | renamed):
            | 
            |     SELECT posts.*,
            |            MAX(COALESCE(comments.created_at, posts.created_at))
            |              AS latest_timestamp
            |     FROM posts
            |     LEFT OUTER JOIN comments ON posts.id = comments.post_id
            |     GROUP BY posts.id
            |     ORDER BY latest_timestamp DESC;
            | 
            | In short, this is sorting posts by the most recent comment,
            | with a fallback to the post date if the post has no comments
            | on it. It's hard to get rid of the grouping here and still
            | get the same data back.
        
             | e12e wrote:
             | I see, that makes more sense.
        
         | fabianlindfors wrote:
         | Interesting link, thanks! It would be really nice if Postgres
         | were to close that gap and make them fully equivalent (if that
         | is even possible).
        
         | munk-a wrote:
          | I recently transitioned a number of tables over to views as
          | part of a data model rearrangement, and I absolutely loathed
          | my previous self for leveraging that primary key trick. I
          | don't think there would be anything unsafe about allowing any
          | column that is by itself covered by a unique index on the
          | table to serve this role, and it would make things a fair bit
          | more logical.
          | 
          | All that said, until that happens, I'd strongly suggest
          | avoiding this functionality, since it can lay down some real
          | landmines.
        
       | [deleted]
        
       | julian37 wrote:
       | This might work to some extent for renaming things but doesn't
       | for any other kind of migration I can think of:
       | 
       | - Dropping a column doesn't work (assuming the point of dropping
       | it is actual deletion, rather than just hiding it)
       | 
       | - Adding a column doesn't work either
       | 
       | - Changing a column's nullability, default value, or data type
       | doesn't work
       | 
       | - Doesn't help with adding or changing constraints or indexes
        
         | fabianlindfors wrote:
          | With this model, migrations would be two-stepped. First the
          | migration would be applied, creating a new schema and adding
          | the new information to the underlying table. Once the version
          | is old and no longer used by any clients, the migration would
          | be "deleted", removing the schema and the underlying
          | information.
          | 
          | So dropping a column would go like this:
          | 
          | 1. Migration is applied. The column is removed from the new
          | view but remains in the backing table.
          | 
          | 2. Migration is no longer needed. The column is removed from
          | the backing table.
          | 
          | This design should be extendable to cover changing a data type
          | (introduce a new column, move values over and keep them in
          | sync using a trigger, as sketched below), changing a default,
          | or changing the contents of a full column.
          | 
          | Constraints and indices are a bit trickier, but it might work
          | to create a new column duplicated from the old one and add the
          | constraints to that instead.
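          | 
          | A rough sketch of the data type case (hypothetical names, and
          | only one sync direction shown):
          | 
          |     -- apply: add the new column and backfill it
          |     ALTER TABLE users ADD COLUMN age_int integer;
          |     UPDATE users SET age_int = age::integer;
          |     
          |     -- keep the columns in sync while both versions are live
          |     CREATE FUNCTION sync_age() RETURNS trigger AS $$
          |     BEGIN
          |         NEW.age_int := NEW.age::integer;
          |         RETURN NEW;
          |     END;
          |     $$ LANGUAGE plpgsql;
          |     
          |     CREATE TRIGGER sync_age_trigger
          |         BEFORE INSERT OR UPDATE ON users
          |         FOR EACH ROW EXECUTE FUNCTION sync_age();
          |     
          |     -- cleanup (later): drop the trigger and the old column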
        
         | kroolik wrote:
          | Adding a column, changing a column's nullability and
          | adding/changing constraints are already zero-downtime in PG.
          | 
          | Not sure about changing a default value, but I would also say
          | it's zero-downtime, as adding a column with a default is
          | zero-downtime. Haven't checked, though.
          | 
          | Dropping a column and changing the data type are not zero-
          | downtime.
          | 
          | Renaming a column is not zero-downtime, as you might have
          | multiple readers/writers using both names.
        
           | mikeklaas wrote:
            | Adding a NOT NULL constraint isn't zero downtime, since PG
            | scans the whole table to ensure the constraint is not
            | violated.
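            | 
            | One workaround I believe works is a CHECK constraint added
            | as NOT VALID and validated separately (sketch, hypothetical
            | table):
            | 
            |     -- takes only a brief lock; existing rows aren't checked
            |     ALTER TABLE posts ADD CONSTRAINT posts_title_not_null
            |         CHECK (title IS NOT NULL) NOT VALID;
            |     
            |     -- scans the table without blocking reads and writes
            |     ALTER TABLE posts
            |         VALIDATE CONSTRAINT posts_title_not_null;
            |     
            |     -- on PG 12+, SET NOT NULL can then use the validated
            |     -- constraint and skip its own full-table scan
            |     ALTER TABLE posts ALTER COLUMN title SET NOT NULL;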
        
       | dariusj18 wrote:
        | I don't remember which, but there was a DB server I worked with
        | where the tables weren't directly accessible and aliases had to
        | be created, thus enforcing portability.
        
       | fabianlindfors wrote:
        | Author here, thanks for reading. As has been mentioned in some
        | comments, the article only covers half the story of how this can
        | be used for migrations. The greater idea is to have every
        | migration be short-lived and two-phased: apply and cleanup. A
        | migration is first applied and, once it's no longer used by any
        | clients, it's cleaned up.
       | 
       | 1. Apply: the new schema is created and any new data is added to
       | the backing table.
       | 
       | 2. Cleanup: the schema is deleted and any data that is no longer
       | needed is removed from the backing table.
       | 
       | For example, to perform a column deletion:
       | 
       | 1. Apply: create new schema without the column which is being
       | deleted. Don't change anything in the backing table.
       | 
       | 2. Cleanup: delete the schema and the column from the backing
       | table.
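        | 
        | A minimal sketch of what that could look like (hypothetical
        | schema, table and column names):
        | 
        |     -- apply: the new versioned schema hides the dropped column
        |     CREATE SCHEMA migration_2;
        |     CREATE VIEW migration_2.products AS
        |         SELECT id, name FROM public.products;  -- price omitted
        |     
        |     -- clients pin a version by setting their search path
        |     SET search_path TO migration_2;
        |     
        |     -- cleanup: once no client uses the old schema
        |     DROP SCHEMA migration_1 CASCADE;
        |     ALTER TABLE public.products DROP COLUMN price;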
        
       | brycelarkin wrote:
        | I think views are a great way of abstracting out business logic.
        | The one thing keeping me from using them more frequently is that
        | they don't work well with Row Level Security.
        
         | xemoka wrote:
          | I was having issues with this as well; it turned out my
          | problem was that I was creating the view as a superuser. Since
          | a view essentially runs WITH SECURITY DEFINER (to borrow a
          | term from UDFs), any user accessing the view over an
          | underlying RLS-enabled table would see all the results (since
          | the `postgres` user bypasses all RLS).
          | 
          | Creating separate "data owner" and "api owner" roles to own
          | the data and api schemas+tables respectively alleviates this
          | problem. Your API schema owner needs to differ from your table
          | owner unless you `ALTER TABLE ... FORCE ROW LEVEL SECURITY` in
          | addition to just enabling it.
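          | 
          | A sketch of that setup (hypothetical schemas and role names):
          | 
          |     -- make policies apply even to the table's owner
          |     ALTER TABLE data.accounts ENABLE ROW LEVEL SECURITY;
          |     ALTER TABLE data.accounts FORCE ROW LEVEL SECURITY;
          |     
          |     -- or keep the view owner and table owner separate, so the
          |     -- view owner doesn't bypass the table's policies
          |     ALTER TABLE data.accounts OWNER TO data_owner;
          |     ALTER VIEW api.accounts OWNER TO api_owner;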
         | 
         | What issues are you running into with RLS?
        
       | kroolik wrote:
        | Interesting approach, with a pretty dark side effect: the tech
        | debt silently creeping in and staying forever. Apart from that,
        | you can use any abstraction over the base schema, depending on
        | the use case. Renaming a column can be as simple as renaming a
        | field in the DTO structure.
        | 
        | The interesting part is when you actually get to pay the debt
        | back. Yes, this is not a free lunch. You move fast, but you
        | eventually have to pay the cost of the underlying storage
        | containing all the historical schemas combined at once.
       | 
       | This is the difference between carefully planned schema
       | migrations and abstraction-based migrations. The former takes
       | more time planning and executing, but the end state is the
       | reference schema. The latter keeps on extending the underlying
       | schema and building projections over it. Over time, the system
       | accumulates tech debt in the form of ever-expanding base schema
       | that contains all the historical columns.
        
         | johbjo wrote:
          | The point here is to do migrations live while clients randomly
          | connect and stay connected for arbitrary lengths of time. When
          | all old clients have disconnected, the old schema can be
          | removed.
          | 
          | The point is not to let old clients work indefinitely.
          | 
          | In other versions of this idea, all views/procedures are
          | prefixed with a revision number, but that looks messy.
        
           | kroolik wrote:
            | But what happens when schema Xn adds a column N, and schema
            | Xn+1 removes it? The base schema, the source for all the
            | views, will contain the columns from all the migrations
            | Xn... until they are purged.
            | 
            | That purging is what carries a lot of operational
            | complexity. That, and renaming a column. All the rest is
            | zero-downtime in PG already.
        
             | johbjo wrote:
              | It would have to be in steps:
              | 
              | Add Xn+1: create new views without the column. When no Xn
              | clients remain: drop the column from the tables and drop
              | schema Xn.
              | 
              | The point as I see it is to not break live client
              | connections which expect the column to exist.
        
               | kroolik wrote:
                | Yes, this is a very important insight! I think the
                | author proposes this approach as a replacement for
                | regular schema migrations, whilst the two approaches
                | actually complement one another.
                | 
                | You can use views to make migrations that were
                | previously tricky zero-downtime.
                | 
                | If that's not the case, then I must've read the article
                | wrong!
               | 
                | Edit: although, when I think of it - if you want to
                | eventually materialize old migration schemas into the
                | base schema, you need to perform the rename there, too.
                | That is not zero-downtime, because the newer migration
                | schemas do the renaming in their views: it means
                | changing the views' definitions, which means lots of
                | locking.
                | 
                | So you still need maintenance windows to merge all the
                | changes, just not on every change. Otherwise the base
                | schema will eventually be completely out of sync and
                | contain tons of old, unused columns.
        
               | fabianlindfors wrote:
               | You're right that the article doesn't mention how this
               | can actually be practically used for migrations, but that
               | is exactly what I had in mind. Maybe I'll write another
               | one some day about the two-phase migrations combined with
               | views.
        
               | kroolik wrote:
               | Looking forward to your future blog posts! The rename
               | thing got me really interested.
        
               | fabianlindfors wrote:
               | This is precisely the way I was imagining! I'm
               | experimenting with creating a migration tool which will
               | do this fully automatically.
        
             | zozbot234 wrote:
              | Views can perform arbitrary queries. They can achieve
              | _any_ kind of inference starting from the base data that's
              | stored in tables. (Keep in mind that modern SQL is fully
              | Turing-complete.) This makes any view-augmented database a
              | viable source of rich, complex knowledge, not just mere
              | information.
        
               | kroolik wrote:
                | But I doubt unleashing the full expressive power of SQL
                | is the point here. It would easily turn a moderately
                | complicated "remove column" migration into a real
                | maintenance hell.
                | 
                | In the simplest form, when you add a column to some
                | schema, it should be materialized in the base schema and
                | exposed via the migration view.
                | 
                | The problems start when you add and remove 20 columns:
                | even though they are no longer visible in the migration
                | schemas, they take up space in the base schema.
        
               | lallysingh wrote:
               | That sounds like problems upstream of this technique. Are
               | you complaining that it isn't a silver bullet or that
               | it's so powerful that it'll be abused?
        
               | kroolik wrote:
                | @lallysingh, sorry, can't reply directly.
                | 
                | All I'm saying is that I don't think we need a full
                | Turing-complete cannonball to hit the (relatively) small
                | fly of no-downtime migrations.
                | 
                | Is it a silver bullet? It's Turing-complete, so chances
                | are high that it is. But for me it carries a high risk
                | of causing silver poisoning.
                | 
                | Personally, I would stick with simpler solutions.
        
               | squiggleblaz wrote:
               | Butting in to say: Maximum reply depth is relative to the
               | view. Just click on the comment's time and you will get a
               | form with a reply field.
        
               | fabianlindfors wrote:
                | Yes, this approach might actually decrease the power of
                | your database, as some things become much trickier. It
                | might be worth trading off some of the database's
                | expressive power for smoother, less hands-on migrations
                | though, which is what I'm going for.
        
         | fabianlindfors wrote:
          | This is definitely a valid concern. My plan for building
          | migrations on top of this is to have older migrations be
          | deleted automatically as they become unused. Preferably you
          | wouldn't actually keep using the old schemas, but rather keep
          | them around only while you roll out changes to your
          | application servers. Migrations could then be deleted (in
          | reverse order) based on some condition, for example time since
          | the schema was created or time since its last query.
        
           | kroolik wrote:
            | But in order to support the "add column" migration you have
            | to add it to the base schema, right? It has to be
            | materialized somewhere.
            | 
            | Now, when you remove a column, you just create a new view
            | without it being available. But the column in the base
            | schema must stay, because you support the old schema and
            | because you want to avoid the operational complexity of
            | physically removing it, right?
            | 
            | Now, you can obviously split a table into a list of tables,
            | each storing the pk and some column value. But then you have
            | just invented columnar storage in a row-based RDBMS :D
            | 
            | A different approach is to have a scheduled maintenance
            | window, for example quarterly or every six months, when you
            | materialize all the changes at once. The amortized
            | operational cost is way lower :)
        
       | luhn wrote:
       | I thought this wouldn't work with INSERTs and UPDATEs, but it
       | turns out newer versions of Postgres automatically support
       | updating simple views. [1]
       | 
       | [1] https://www.postgresql.org/docs/12/sql-createview.html (See
       | "Updatable Views")
        
         | munk-a wrote:
          | The information on this is, admittedly, hard to dig out of the
          | primary docs, but there also exists an INSTEAD OF trigger mode
          | - so you can CREATE TRIGGER "actually_save_stuff" INSTEAD OF
          | INSERT, which will intercept an attempted insert on the view
          | and write the rows elsewhere.
          | 
          | Generally I've found these useful in the short term for
          | letting application logic continue to work blissfully ignorant
          | of the fact that the "table" it is writing to has been moved,
          | merged or refactored into a set of tables that the trigger
          | will bridge the logic over to. In the long term, I've found
          | that the automatic logic covers the most common case: a view
          | that's filtering out some rows for some reason (permissions,
          | soft deletion) before presenting them to the application
          | logic.
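          | 
          | A quick sketch of the shape this takes (hypothetical view and
          | table names):
          | 
          |     -- the trigger function decides where rows actually go
          |     CREATE FUNCTION route_insert() RETURNS trigger AS $$
          |     BEGIN
          |         INSERT INTO posts_new (title) VALUES (NEW.title);
          |         RETURN NEW;
          |     END;
          |     $$ LANGUAGE plpgsql;
          |     
          |     -- INSTEAD OF triggers are row-level and only valid on
          |     -- views
          |     CREATE TRIGGER "actually_save_stuff"
          |         INSTEAD OF INSERT ON posts_view
          |         FOR EACH ROW EXECUTE FUNCTION route_insert();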
        
       ___________________________________________________________________
       (page generated 2021-06-16 23:00 UTC)