[HN Gopher] How not to blow up the production database
       ___________________________________________________________________
        
       How not to blow up the production database
        
       Author : pyrolistical
       Score  : 43 points
       Date   : 2021-10-15 18:23 UTC (4 hours ago)
        
 (HTM) web link (blog.battlefy.com)
 (TXT) w3m dump (blog.battlefy.com)
        
       | tnorgaard wrote:
       | Answer: Materalized Views.
       | 
       | On a unrelated note: Still hoping for those automatically
       | refreshed materalized views in PostgreSQL, ala what VoltDB has.
        
         | marcosdumay wrote:
         | Given how well they work on any non-specialised DBMS, I prefer
         | Postgres to take their time and do it right (AKA, differently
         | from everybody else).
        
         | aledalgrande wrote:
         | Another thing I'm waiting for in Postgres is lifting and
         | decoupling from the connection limit...
        
         | sojournerc wrote:
         | TimescaleDB (psql extension) has these, specific to time-series
         | however.
         | 
         | https://docs.timescale.com/timescaledb/latest/how-to-guides/...
        
         | BikiniPrince wrote:
         | Yes, views are nice, but there is also a fair concept of not
         | needlessly bogging down a table. Sure, they were making up
         | data, but a flat table with stats, profile data and other
         | easily external data is just bloat. Once you have an id then
         | static fields can be retrieved from other services/data stores.
        
           | tomrod wrote:
           | I'm not sure I am following. Aren't materialized views just
           | formal, cached results of a query? That wouldn't bog down a
           | table.
        
             | lazide wrote:
             | I think their point is more 'don't store all that junk in
             | your primary database and then do all your work on it there
             | too if you can just stuff it somewhere else'. Which has
             | pros and cons and depends a lot on various scaling factors.
        
             | manquer wrote:
             | Perhaps he means it will bog down on refresh.
        
               | tomrod wrote:
               | Maybe? Not sure.
        
         | ericHosick wrote:
         | > Still hoping for those automatically refreshed materialized
         | views in PostgreSQL, ala what VoltDB has.
         | 
         | Not exactly what you're hoping for and you probably already
         | follow this pattern. pg_cron can help (and is now available in
         | AWS RDS).
         | 
         | ```sql CREATE EXTENSION IF NOT EXISTS pg_cron;
         | 
         | CREATE MATERIALIZED VIEW IF NOT EXISTS
         | activeschema.some_thing_cached AS ...;
         | 
         | SELECT cron.schedule('some_thing_cached', ' _/ 5 _ * * *',
         | $CRON$ REFRESH MATERIALIZED VIEW some_thing_cached; $CRON$ );
         | ```
        
           | tibiapejagala wrote:
           | I think that the problem is when you have a materialized view
           | which takes hours to refresh. We are lucky that 99% of our
           | traffic is during 7-19 on weekdays, so we can just refresh at
           | night, but that won't work for others.
           | 
           | I don't know much about how postgresql works internally, so I
           | just probably don't understand the constraints. Anyway as I
           | understand, there are two ways to refresh. You either refresh
           | a view concurrently or not.
           | 
           | If not, then postgres rebuilds the view from its definition
           | on the side and at the end some internal structures are
           | switched from the old to the new query result. Seems
           | reasonable, but for some reason, which I don't understand due
           | to my limited knowledge, an exclusive access lock is held for
           | the entire duration of the refresh and all read queries are
           | blocked, what doesn't work for us.
           | 
           | If you refresh concurrently, postgres rebuilds the view from
           | its definition and compares the old and the new query result
           | with a full outer join to compute a diff. The diff is then
           | applied to the old data (like regular table
           | INSERT/UPDATE/DELETE I assume), so I think you get away with
           | just an exclusive lock and read access still works. There are
           | two downsides to this, first that it requires a UNIQUE
           | constraint for the join, second that the full outer join is a
           | lot of additional work.
           | 
           | I never had the time to test Materialize, but it seems to do
           | what I want with its continuous refresh.
           | 
           | I also thought about splitting the materialized view into
           | two, one for rarely changing data and another one for smaller
           | part of the data which changes daily. Then I would only have
           | to refresh the smaller view and UNION ALL both materialized
           | views in a regular view. Not sure how well will that work
           | with postgres query planner.
        
             | nkozyra wrote:
             | If it's a one shot data compilation, you could use
             | something like postgres' NOTIFY to trigger a listening
             | external app.
        
         | kccqzy wrote:
         | I've been looking at Materielize for a while
         | (https://materialize.com/). It can handle automatically
         | refreshed materialized views. Last time I checked, it didn't
         | support some Postgres SQL constructs that I use often, but I'm
         | really looking forward to it.
        
       ___________________________________________________________________
       (page generated 2021-10-15 23:00 UTC)