[HN Gopher] PostgreSQL views and materialized views and how they...
___________________________________________________________________
PostgreSQL views and materialized views and how they influenced TimescaleDB
Author : od0
Score : 127 points
Date : 2022-07-14 15:52 UTC (7 hours ago)
(HTM) web link (www.timescale.com)
(TXT) w3m dump (www.timescale.com)
| jbylund wrote:
| Did you consider a type of continuous aggregate that works over pre-aggregated partitions for time buckets without modifications, swapping those buckets in the continuous aggregate view with a live view when data is modified? I guess it would mean that performance would sort of continually degrade as inserts/updates/deletes happened in more and more time buckets, but as soon as the aggregation event fired, performance would recover. It seems like one could provide a more strongly consistent view of aggregates with this sort of approach.
| djk447 wrote:
| NB: Post author
|
| I'm not 100% sure I understand what you're asking, but essentially something that would look for data modifications and, at query time, run the query over the older regions as well?
|
| If that's what you're asking, the answer is yes, we did consider it, but basically decided that it was something relatively few people needed, and that the complexity and performance tradeoffs were unlikely to be worth it for most folks.
|
| Essentially, we could do something like this now by looking at our invalidation log and running a join against it to get to a more strongly consistent state (I haven't thought through the full implications and whether it's truly strong consistency; I think it might be, but it'd require a proof / some thinking through of all of our locking logic to really get there). It's interesting to consider, though.
| gfody wrote:
| > If I'd written out the query, I might have seen that I didn't need the JOIN (or never written it in the first place).
Whereas the view hides that complexity. So they can make things easier, but that can lead to performance pitfalls if we're not careful.
|
| You can avoid this particular pitfall by using left joins for views like this (that join stuff in for convenience that you might not select). Postgres will eliminate a left join but not an inner join, since the inner join could filter rows (e.g. symbols in stocks_real_time that don't exist in company). Commercial engines will use the presence of FK constraints to know that's impossible and go ahead and eliminate inner joins as well.
| gopalv wrote:
| > commercial engines will use the presence of fk constraints
|
| What, postgres doesn't do FK join removals?
|
| Like I tried it right now and it didn't remove the hash join:
|
| http://sqlfiddle.com/#!17/073747/2
|
| Should've been just a count off the PK Employees table.
|
| Apache Calcite in Hive had to bake in this specific optimization because there were a ton of deep join views (coming off Teradata) where someone would run something like a "select count(*) + where" or just project 2-3 columns out of a 60+ join view from a couple of tables.
|
| And those just ran forever without the PK-FK removals.
| michael1999 wrote:
| How would that work? Your schema allows an employee to be assigned to multiple departments, and the query must count them.
| doctor_eval wrote:
| The schema only allows one employee per department, so there is no need to look up the employee table.
| jmalicki wrote:
| Because of the foreign key, the query should reduce to "explain select count(*) from Department d;"
| [deleted]
| djk447 wrote:
| NB: post author here!
|
| Thanks, yes! Totally true. I was thinking about including some of that, but it felt like it opened a can of worms about join types and why certain things would be included and others not (i.e. an inner join needs to see that it's there on both sides, whereas the left join doesn't), etc.
and the post was already getting kinda long.
| garyclarke27 wrote:
| Postgres is an amazing database. Its only significant weakness now is in materialized views, with their lack of incremental refresh. It was disappointing to see no progress towards this in v15.
| gunnarmorling wrote:
| That work towards incrementally updated views is happening and progressing. For now, it's a separate extension, though: https://github.com/sraoss/pg_ivm.
| edmundsauto wrote:
| I also wish this were in core postgres, but if you use a build tool like DBT you can enable this type of thing.
| snicker7 wrote:
| Interestingly, DBT does not support creating materialized views.
| mritchie712 wrote:
| yeah, this issue has been open for years. We need them for Clickhouse for our product.
|
| https://github.com/dbt-labs/dbt-core/issues/1162
| WXLCKNO wrote:
| Enjoyed this post! djk447, I'd love to ask a question as well.
|
| We've started working with Timescale to process historical time series data. However, there is so much of it that we chose which parts to process.
|
| It's possible that in the future we may need to go back and reprocess the dataset to add something we decided we want after all.
|
| In your post it seems like this is handled automatically on a smaller time scale, meaning you could insert into the past without doing anything special.
|
| What happens if you need to insert data at numerous points across two years' worth of data instead? Do you have to use backfilling as described in your documentation? Or is it better to maybe rebuild the entire hypertable?
| djk447 wrote:
| NB: Post author
|
| Yes, this is generally handled automatically. There may be times, though, where you want to essentially pause refreshing the view while you do some backfilling and then eventually let it catch up, especially if you're overwriting the same time period multiple times in a row.
If you can insert in time order, then it just breaks up re-calculation into smaller segments, which can be quite useful rather than having to process the whole data set again.
|
| This can be a little bit different if you're doing compression, but with continuous aggregates I think it should work fine. I'm not 100% sure that was what you were looking for; let me know if it's not.
| hodgesrm wrote:
| This is an excellent article. I like the way the author builds up in steps to eventual consistency between source tables and their materialized views. It was fun to guess the next step.
|
| I do have one question: how does the algorithm described in the article work when the source table is spread across multiple servers, say in multiple shards? Can TimescaleDB maintain materialized views on each shard and then run a query that reconciles them?
|
| Edited: clarification
| djk447 wrote:
| NB: Post author
|
| So we thought about doing something like that with multinode, where each of the nodes would maintain their own materialization, but abandoned it for that very reason: it's very, very difficult to maintain any sort of consistency guarantees in that case, or even to reason about it.
|
| Instead we use the access nodes as coordinators to do the materialization. Right now the materialization only exists on the access node, but there's no reason we couldn't send it back out to the data nodes; you just need a coordination point to start a distributed transaction to have some semblance of a guarantee.
| djk447 wrote:
| And glad you liked the article!
| gigatexal wrote:
| I wonder how well these perform compared to the aggregate tables of Clickhouse fame.
___________________________________________________________________
(page generated 2022-07-14 23:00 UTC)
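The invalidation-log mechanism djk447 refers to in the thread (record which time buckets were touched by out-of-order writes, then re-materialize only those buckets on the next refresh) can be sketched in miniature. The sketch below is a hypothetical toy, not TimescaleDB's implementation: the table names, schema, and hourly bucketing are invented, and sqlite3 stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3

# Toy "continuous aggregate": a raw time-series table, a materialized
# per-bucket aggregate, and an invalidation log of stale buckets.
BUCKET = 3600  # one-hour buckets, in seconds

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE readings (ts INTEGER, value REAL);
    CREATE TABLE agg (bucket INTEGER PRIMARY KEY, n INTEGER, total REAL);
    CREATE TABLE invalidation_log (bucket INTEGER PRIMARY KEY);
""")

def insert(ts, value):
    """Insert a reading and log its bucket as stale."""
    bucket = ts - ts % BUCKET
    conn.execute("INSERT INTO readings VALUES (?, ?)", (ts, value))
    conn.execute("INSERT OR IGNORE INTO invalidation_log VALUES (?)", (bucket,))

def refresh():
    """Re-materialize only the buckets named in the invalidation log."""
    stale = [b for (b,) in conn.execute("SELECT bucket FROM invalidation_log")]
    for b in stale:
        conn.execute(
            """INSERT OR REPLACE INTO agg
               SELECT ?, COUNT(*), SUM(value) FROM readings
               WHERE ts >= ? AND ts < ?""",
            (b, b, b + BUCKET),
        )
    conn.execute("DELETE FROM invalidation_log")

insert(10, 1.0)
insert(20, 2.0)
insert(BUCKET + 5, 3.0)   # lands in the second bucket
refresh()
print(list(conn.execute("SELECT bucket, n, total FROM agg ORDER BY bucket")))
# prints [(0, 2, 3.0), (3600, 1, 3.0)]
```

This also illustrates the degradation jbylund raises: the more buckets a batch of backfilled writes touches, the more work the next refresh does, and performance recovers once the refresh fires. A real implementation would additionally make the log append and the refresh transactional, as the article discusses.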