[HN Gopher] Tricking PostgreSQL into using an insane, but faster... ___________________________________________________________________ Tricking PostgreSQL into using an insane, but faster, query plan Author : cube2222 Score : 156 points Date : 2022-01-18 16:42 UTC (6 hours ago) (HTM) web link (spacelift.io) (TXT) w3m dump (spacelift.io) | SigmundA wrote: | PostgreSQL really needs hints. Sometimes I as the developer know | better than the optimizer, and having to trick the optimizer to | get it to do what I want is a ridiculous situation. | | Also, having the optimizer decide to do something bad off hours is | not a good situation; again, hints or plan locking would help here. | shock-value wrote: | It's basically shameful how the maintainers refuse to even | entertain the thought of allowing query hints. These kinds of | articles and anecdotes come up again and again, yet seemingly | no movement whatsoever on the blanket ban on query hints in | Postgres. (See | https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion) | | Postgres generally is great -- so many great features and | optimizations, and more added all the time. But its query | optimizer still messes up, often. It's absolutely not the | engineering marvel some would have you believe. | pmontra wrote: | I can understand that people want to get absolute control over | their queries sometimes. I never used hints so I shouldn't | even write this note, but IMHO that page has a quite balanced | view of pros and cons of existing hint systems. | zmmmmm wrote: | > that page has a quite balanced view of pros and cons of | existing hint systems | | I found the cons section somewhat disappointing given how | much respect I have for PostgreSQL maintainers in general. | | Most of the "problems" are essentially manifestations of | dysfunction within users or PostgreSQL development itself. | People won't report optimizer bugs if they can fix them | themselves, etc.
(far more likely they won't report the | bugs if they have no way to prove their alternative query | plan actually performs better, e.g. by adding a hint). | | Many are just assertions which I highly doubt would hold up | if validated ("most of the time the optimizer is actually | right", "hints in queries require massive refactoring" | ...). | gowld wrote: | > (far more likely they won't report the bugs if they | have no way to prove their alternative query plan | actually performs better, e.g. by adding a hint). | | They don't need to run an alternative plan, they just | need to report that the main plan did far more work than | needed. | | In the real world, when users have a problem in their | mission-critical system, they ask for a solution, not | just give up on the mission. | | > Many are just assertions which I highly doubt would | hold up if validated ("most of the time the optimizer is | actually right", "hints in queries require massive | refactoring" ...). | | Their 20 years of experience beats your idle speculation. | zmmmmm wrote: | > Their 20 years of experience beats your idle | speculation. | | Perhaps. | | But it comes across to me on that page as arrogance and | dismissiveness of user needs. Hence why I find it | disappointing, even if it is mainly in how it is | expressed. | infogulch wrote: | > they just need to report that the main plan did far | more work than needed. | | And how do they know that the chosen plan is not already | optimal? How many reports do you think you'd get if every | time someone encountered a "slow" query they reported it? | And what fraction of those would finally be found to be | caused by a bad optimizer, as opposed to the user's fault | like a missing index, outdated statistics, ... 1%? 0.1%? | See Stack Overflow for a sample of that ratio. Be careful | what you wish for. | | If hints existed, performance reports could be "default | plan of x is slow, I know because when I use hints w,z | then it's fast".
| | What should they do in the meantime while they're waiting | for the pg project to acknowledge the problem, someone to | propose a solution, and someone to implement a fix? Just | sit on their hands and twiddle their thumbs while their | database server spews smoke? Hints are an escape hatch | that empowers users to take over when the optimizer | veers off course. Not having _any_ way to control the | plan is like a Tesla autopilot that doesn't allow the | driver to take control of the car in an emergency. | | Also, distrusting users to report performance issues is a | weird attitude and leaves a bad taste. Maybe it's true, or | maybe you need to make it easier. | williamdclt wrote: | It does. But when you're watching your database, and | therefore your business, crash and burn because it decided | to change the query plan for an important query, all these | "problems with existing hint systems" sound very | irrelevant. | | I'd love to have a way to lock a plan in a temporary | "emergency measure" fashion. But of course it's | hard/impossible to design this without letting people abuse | it and pretend it's just a hint system. | silon42 wrote: | Exactly... the DB deciding to switch query plans in | production is sometimes terrible. | suddendescent wrote: | >I'd love to have a way to lock a plan in a temporary | "emergency measure" fashion. | | Could this be achieved by extending prepared statements? | [1] The dirty option would be to introduce a new keyword | like PREPAREFIXED. | | The first time such a statement is executed, the | execution plan could be stored and then retrieved on | subsequent queries. There would be no hints and the | changes in code should be minimal. | | Once a query runs successfully, the users can be sure | that the execution plan won't change. | | >But of course it's hard/impossible to design this | without letting people abuse it | | Is this more important than having predictable execution | times?
| | [1] https://www.postgresql.org/docs/current/sql-prepare.html | yjftsjthsd-h wrote: | > These kinds of articles and anecdotes come up again and | again | | Don't articles and anecdotes also come up again and again of | developers feeding bad hints to the database and cratering | their performance? | | And that wiki page... Very explicitly isn't a blanket ban? | They straight up say that they're willing to consider the | idea if somebody wants to say how to do it without the | pitfalls of other systems. The only thing they say they're | going to ignore outright is people thinking they should have | a feature because other databases have it (which seems fair). | erichocean wrote: | > _people thinking they should have a feature because other | databases have it (which seems fair)_ | | Literally NO ONE wants query hinting in Postgres to check | some kind of feature box because other databases have it. | | We know we want it because...other databases have it, and | it's INCREDIBLY USEFUL. | | > _They straight up say that they're willing to consider | the idea if somebody wants to say how to do it without the | pitfalls of other systems._ | | Pitfalls my ass. We want exactly the functionality that is | already present in other systems, _pitfalls and all_. That's | just an excuse to do nothing, Apple-style, "because we know | better than our own users", while trying to appear | reasonable. | | It's akin to not adopting SQL until you can do so "while | avoiding the pitfalls of SQL." Just utter bullshit. | adamrt wrote: | > We want exactly the functionality that is already | present in other systems, pitfalls and all. That's just | an excuse to do nothing | | Are you offering to help with the maintenance or | development associated with it? Or are you just demanding | features while calling the people that do help with that | stuff liars? | | Maybe they do know better? Or maybe they know about other | hassles that will come with it that you can't fathom.
| They've been developing and maintaining one of the best | open source projects on the planet for nearly 3 decades. | Maybe with all that experience, it's not their opinion | that is BS? | mst wrote: | Given there's already a pg_hint_plan extension, I really | don't see why people are so angry about the developers | not wanting to bless something they consider a footgun as | a core feature. | zepearl wrote: | "risk", because pg_hint_plan is an external module, written by | somebody as a hobby; therefore if it stops existing or | working after some updates or with newer versions of | Postgres, then it becomes a huge problem for PROD systems | that need it for any reason. | ghusbands wrote: | Sure, maybe by not doing what every other database does, | which really helps users out when they're suffering | unpredictable performance, they're demonstrating their | far superior knowledge. And maybe all the users that are | suffering are somehow incompetent. | | Or maybe it's just typical developer hubris of the kind | that hits everyone at some point. | fragmede wrote: | We're software engineers, not marketing folk who just want | to check a box. | | There's a difference between "have a feature _just because_ | other databases have it" and "this is a very useful | feature that would have its own PostgreSQL-isms, and also we | got this idea because other databases have it". The query | planner isn't infallible, so being able to hint queries to | not accidentally use a temporary table that just can't fit | in RAM isn't just copying a feature "just because everyone | else has it". | ghusbands wrote: | > they're willing to consider the idea if somebody wants to | say how to do it without the pitfalls of other systems | | That is basically a blanket ban. Saying you won't implement | a widely-implemented feature unless someone comes up with a | whole new theory about how to do it better is saying you | simply won't implement it.
Other databases do well enough | with hints, and they do help with some problems. | zorgmonkey wrote: | I've never used it, but there is a Postgres extension called | pg_hint_plan [0] for query hints, and I am guessing it is | pretty decent because it is even mentioned in the AWS Aurora | documentation | | [0]: https://pghintplan.osdn.jp/pg_hint_plan.html | zmmmmm wrote: | Definitely my biggest issue with Postgres. | | I have one shameful query where, unable to convince it to | execute a subquery that is essentially a constant for most of | the rows outside of the hot loop of the main index scan, I | pulled it out into a temp table and had the main query select | from the temp table instead. Even creating a temp table _and | indexing it_ was faster than the plan the Postgres query | planner absolutely insisted on. Things like CTEs etc. made no | difference; it would still come up with the same dumb plan | every way I expressed the query. | | The worst thing is not even being able to debug or understand | what is going on, because you can't influence the query plan | to try alternative hypotheses out easily. | baq wrote: | even plan locking itself would be a welcome feature. we've had | multiple instances of plans going haywire (never traced it | down, but 99% sure it was after autovacuum analyze) impacting | prod and getting back to normal after a manual re-ANALYZE. | SigmundA wrote: | I haven't had much use for plan locking even after MS SQL | Server got it, even though it's a nice feature, with being | able to save/load plans from a file etc. | | I just like a good set of hints, since they are part of source | code and I can put comments and see why they are there in | source history later. | jonatron wrote: | Having been in a situation where Postgres suddenly decides to | use a much slower query plan on a hot query in production, I'd | agree.
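For readers who haven't seen pg_hint_plan: hints go in a special comment at the head of the query, after loading the extension. A minimal sketch, assuming illustrative table and index names (runs, stacks, runs_worker_id_idx) rather than anything from the article:

        -- Load the extension for this session (it can also be added to
        -- shared_preload_libraries to apply server-wide).
        LOAD 'pg_hint_plan';

        -- Ask the planner to use a specific index on runs and a hash
        -- join between runs and stacks, overriding its own estimates.
        /*+ IndexScan(runs runs_worker_id_idx) HashJoin(runs stacks) */
        SELECT count(*)
        FROM runs
        JOIN stacks ON runs.stack_id = stacks.id
        WHERE runs.worker_id IS NULL;

If the hint can't be applied (e.g. the named index doesn't exist), pg_hint_plan silently falls back to the planner's normal choice.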
| quietbritishjim wrote: | Although it's absolute sacrilege to many, I would honestly | support a really explicit syntax to compose particular tables | and indices. Not even on top of a regular SQL query; I mean its | own first-class language. E.g. "Take this index, restrict to | this subrange; for each row, pick a row using this other index | on the foreign key" (but it wouldn't have to be imperative like | I've phrased it). | | There are times when I already know how I want the query to | execute, and I have to iteratively fiddle with the query to | indirectly trick the planner into doing that. There's just no | excuse for that arrangement. Even if I've made a mistake and my | plan couldn't work efficiently - at least let me execute it | inefficiently so I can discover and debug that. | gfody wrote: | something like sql server's "use plan" it really comes in | handy when you need it | (https://forrestmcdaniel.com/2018/07/19/how-to-use-the-use- | pl...) | SigmundA wrote: | Yes, I would too. Over time I have come to appreciate at least | being able to control order of operations; just forcing join | order is many times enough of a hint to keep things working | properly. | | Having used Linq a lot, I would actually prefer more of that | kind of chained statement approach that is more hands-on | without having to explicitly loop. | derefr wrote: | Or, to put that another way: Postgres's query planner should | be emitting an explicit bytecode that is then interpreted by | a query-execution abstract machine. That same layer should | also be exposed directly to clients over the network, | allowing pre-planned query bytecode to be submitted directly | in place of a textual query. | | I've heard the rebuttal to this idea before: that even minor | DB schema/configuration changes that should be "invisible" to | the application (e.g.
an index being created -- or even | refreshed -- on the table) would make the resulting | bytecode different, and so you'd never be able to ship | such bytecode as a static artifact in your client. | | But: who said anything about a static artifact? I'm more | imagining a workflow like this: | | 1. user submits a query with leading command "PLAN" (e.g. | "PLAN SELECT ...") | | 2. Postgres returns planned bytecode (as a regular | single-column bytea rowset) | | 3. if desired, the user _modifies_ the received bytecode | arbitrarily (leaving alone anything the client doesn't | understand -- much like how optimization passes leave alone | generated intrinsics in LLVM IR) | | 4. user later submits a low-level query-execution command | over the binary wire protocol, supplying the (potentially | modified) bytecode | | 5. _if the bytecode's assumptions are still valid_, the | query executes. Otherwise, the query gives a "must re-plan" | error. The client is expected to hold onto the original SQL | and re-supply it in this case. | | In other words, do it similarly to how Redis's EVAL + EVALSHA | works; but instead of opaque handles (SHA hashes), the client | receives a white-box internal representation. | | Or: do it similarly to how compiling a shader on a game | console with unikernel games + unified RAM/VRAM works. The | driver's shader compiler spits you back a buffer full of some | GPU object code -- which, running in kernel-mode, you're free | to modify before running. | tshaddox wrote: | You can't ship SQL queries and then do arbitrary schema | changes either, right? | lazide wrote: | Not arbitrary changes, but nearly so - if you add new | columns (and your existing queries name the columns they | use explicitly), then nothing changes as far as the | client is concerned. If you remove or rename one, only | those clients who refer to it need to change, and then | only if you don't do an interposing view/schema/etc.
| | The server is supposed to take the logical intent | specified by the queries and do the work of mapping that | into concrete retrievals in a stable and performant way, | so a query explicitly telling the server to do something | in a specific way breaks the model. The more specific, | the more broken. | | DDL changes, for instance, ideally would not have to tell | the server how to structure things on disk. The server | should know how to do it efficiently. | | That said, real life intrudes, and sometimes (but not | usually) it's important to do this for stability or | performance reasons. The more magic involved, the more | unpredictable a system can be, and the closer a system | runs to redline, the more chaos/downtime that can cause. | jhgb wrote: | > Or, to put that another way: Postgres's query planner | should be emitting an explicit bytecode that is then | interpreted by a query-execution abstract machine. That | same layer should also be exposed directly to clients over | the network, allowing pre-planned query bytecode to be | submitted directly in place of a textual query. | | And now you've invented Firebird, or even 1980's Interbase. | ;) | mst wrote: | I suspect the extra compatibility/dependency surface would | make this a non-starter in practice, but it's an absolutely | fascinating thought experiment and I'm glad you wrote it | up. | SigmundA wrote: | PostgreSQL doesn't even cache its own bytecode, it replans | every time! | mst wrote: | Depending on the placeholder values, the optimal plan may | be very different. | | You -can- use a server-side prepared statement to force | it to -ahem- plan ahead, but that's -usually- not actually | worth it. | SigmundA wrote: | That's why other databases have parameter sniffing. | | And using prepared statements only works on the same | connection, so they're of limited use.
| | Luckily PostgreSQL's optimizer is very primitive, so | planning doesn't take too much time; as it gets more | advanced, the lack of plan reuse will become more of an | issue. It's already an issue with the LLVM JIT compilation | time. | orthoxerox wrote: | Clickhouse doesn't go that way all the way, but its approach | to SQL support is very similar: it does exactly what you have | written. It doesn't rewrite your query in any way: if you | want to filter table A before you join it with table B, you | write a subquery. | magicalhippo wrote: | There's no technical reason why one couldn't just select or | join from an index as if it were a table or a view, is there? | | We also write queries knowing they will use a specific index, | or we will create an index because of a specific query. And | then we have to have a scheduled task to periodically | recalculate statistics just so the DB server doesn't get | silly ideas. | | Of course it could be abused, but I'm in favor of having ways | of letting programmers tell computers exactly what to do. | Sometimes we really do know best. | pzduniak wrote: | I recall using https://pghintplan.osdn.jp/pg_hint_plan.html for | this feature; it did what it claimed to be able to do. It's a | hack, but it really helps in the worst-case scenarios. | remus wrote: | Not that I disagree with your point, but the opposite of | | > Also having the optimizer decide to do something bad off | hours is not a good situation again hints or plan locking would | help here. | | can also happen when you force a particular query plan, only | for it to turn to treacle when some assumption you made about | the data suddenly breaks. | | As with most things, there's no free lunch. | lazide wrote: | When doing disaster planning, at least something blowing up | because of a data change is preventable or mitigable. | Staging environments. Rollbacks, etc.
| | The system randomly deciding to drive itself off a cliff for | no reason, with no known way to stop it next time, is quite | concerning. | ghusbands wrote: | Since the PostgreSQL folks are clearly against hints, maybe | another random feature suggestion: If a (read-only) query is | taking "too long", maybe you could launch a second copy with a | different plan. Maybe even a third after twice as long, and a | fourth after twice as long again. Kill them all once any one | returns. Trades some extra load now against waiting forever for | a 'bad' query plan. | | The idea obviously needs work and has probably already been | suggested and dismissed somewhere, but I thought I might throw | it out there, especially with modern computers having so many | cores. | lgessler wrote: | I'm not a DB expert, but aren't most queries bottlenecked by | disk bandwidth? If so, additional queries might hurt more than | help. | zepearl wrote: | On one hand this could theoretically work if the optimizer | tracks historical SQLs + the query plans used at that time + | their runtimes, decides to try a different query | plan, then aborts the SQL being executed when its runtime | exceeds the historical runtime. But in practice it's likely | to mess things up, because data volume/distribution/machine | load (CPU, disks, etc.) might differ compared to the past. | | On the other hand, in general there are often too many | potential combinations of query plans to try out (hundreds | even for a relatively simple SQL), and trying them all out | would need hours/days. The "good plan" might be | something that a machine would categorize as "very unlikely | to work", so it might end up being the one tested | automatically at the very end. | | Normal hints would still be a lot easier to handle in the | code and for the user. | electroly wrote: | I really appreciate that in SQL Server we can just use query | hints.
I don't think I will ever see eye-to-eye with the | PostgreSQL team on this. No query planner is perfect. SQL | Server's planner certainly isn't. We have had the same sort of | "query plan suddenly changes in production and everything | breaks" incidents, but they are easily fixed with query hints | and/or forced plans, and you can be sure that future statistics | updates won't break them again. | SigmundA wrote: | I still have nightmares of SQL Server's optimizer taking down | production systems at random times due to picking a bad plan; | judicious use of hints has made my life much easier and | allowed me as a developer to be in control and not have to | ask the DB nicely and pray it listens. | newlisp wrote: | I guess that is why cloud databases like Spanner charge big | bucks. | fennecfoxen wrote: | I worked for a payments firm that submitted transfers in | bulk every day, but had a monthly surge day, and a late | cut-off period to submit or cancel payments. It doesn't | matter if you can do everything else in 15 minutes, if the | query planner decides to take a 30-second query and turn it | into a 4-hour nightmare because different customers are | making different numbers of payments today, and it trips | over some Postgres heuristic. | | The Postgres query planner was, quite frankly, the enemy. | By the time I left we were at the point that we considered | it a business risk and were looking at alternatives. If you | need queries to run in a predictable amount of time -- | forget _fast_, simply _predictable_ -- then Postgres is | quite simply not fit for purpose. | slt2021 wrote: | these query issues are most of the time not a SQL Server | problem; they are rather a symptom of poor data model | architecture and table design. | | A lot of times I see in an operational DB hot data getting | mixed with warm and cold data, and your hot-path query | with a ton of JOINs/subqueries will get rekt.
| | Proper redesign and rearchitecture helps to provide a large | buffer against these problems. | | Also, not everything should be inside SQL Server; if you | run a large query every 5 seconds or something, probably | consider using an in-memory cache or a denormalized model | fennecfoxen wrote: | Excuse me. You are replying with advice that is | fundamentally inappropriate for a batch workload | (describing a "hot path query" and suggesting in-memory | caches), and it shows that you have not read and | comprehended the post I wrote. | | Besides which, if you need to re-architect and | denormalize by moving records in and out of different | tables just so you can run what is essentially a monthly | report in a predictable (not fast! just predictable) | amount of time -- well then, why bother using Postgres in | the first place? You might as well bite the bullet and go | with something NoSQL at that point, because at least when | you read from a given Cassandra partition you know all | the results will be right next to each other, 100%; why | leave it to chance? You've been burned by Postgres before. | slt2021 wrote: | As a developer I never had to blame SQL Server; it was | always a developer mistake: poor architecture, poor | query, some data quality issues. Don't blame the platform | for developer mistakes, that's what I was trying to say | fennecfoxen wrote: | The wise man builds his house upon the rock. The foolish | man builds his house upon the sand. I will 100% blame the | platform when what it provides is deficient, for | selecting a deficient platform is one of the gravest | mistakes a developer can make. | | And Postgres, again, is materially deficient in its query | performance predictability, and its developers are | insistent that they have no desire to allow the | approaches that would mitigate it. If this matters to | your application, then the prudent developer will drop | Postgres and use a real database.
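For context, the SQL Server query hints this subthread is arguing about look roughly like the following sketch (the table and index names here are made up for illustration):

        -- T-SQL: force a specific index on one table and a join
        -- algorithm and parallelism level for the whole query.
        SELECT o.id, c.name
        FROM dbo.orders AS o WITH (INDEX(ix_orders_customer))
        JOIN dbo.customers AS c ON c.id = o.customer_id
        WHERE o.created_at >= '2022-01-01'
        OPTION (HASH JOIN, MAXDOP 4);

The WITH (INDEX(...)) table hint pins the access path, while the OPTION clause constrains the planner's join strategy, which is exactly the kind of override Postgres declines to offer.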
| SigmundA wrote: | Must not have been working with it very long; off the top | of my head, having to disable the "new" optimizer in SQL | 2014 and go back to the 2012 version due to bad plans: | https://sqlservergeeks.com/sql-server-2014-trace-flags-9481/ | | I have been working with it since way back in 6.5, and it | has made many obviously wrong decisions in the optimizer | over the years. It has gotten better but is by no means | perfect; good thing is it has hints, unlike PG! | slt2021 wrote: | So what's the issue with a marginal difference in | cardinality estimators? Both queries seem perfectly fine | speed-wise. | | Just splitting hairs; again, most of the time developers | fail to notice significant data distribution drift, and | don't rearchitect their data models, then blame the | engine for developer omissions and invalid assumptions | SigmundA wrote: | https://www.sqlconsulting.com/archives/big-performance- | probl... | fragmede wrote: | Blaming the table architecture doesn't get rid of the | fact that the PostgreSQL planner can make decisions that | will cause a query to take an unknown amount of time. | cube2222 wrote: | Hey, author here. | | The article describes a fun debugging and optimization session | I've had recently. Happy to answer any questions! | tqi wrote: | Interesting write up! Did you happen to try this with a CTE | instead of a subquery?
ie something like:
 |
 |         WITH tmp AS (
 |           SELECT accounts_other.id, COUNT(*) n
 |           FROM accounts accounts_other
 |           JOIN stacks stacks_other
 |             ON accounts_other.id = stacks_other.account_id
 |           JOIN runs runs_other
 |             ON stacks_other.id = runs_other.stack_id
 |           WHERE (stacks_other.worker_pool_id IS NULL
 |                  OR stacks_other.worker_pool_id = worker_pools.id)
 |             AND runs_other.worker_id IS NOT NULL
 |           GROUP BY 1
 |         )
 |         SELECT COUNT(*) as "count",
 |                COALESCE(MAX(EXTRACT(EPOCH FROM age(now(),
 |                  runs.created_at)))::bigint, 0) AS "max_age"
 |         FROM runs
 |         JOIN stacks ON runs.stack_id = stacks.id
 |         JOIN worker_pools ON worker_pools.id = stacks.worker_pool_id
 |         JOIN accounts ON stacks.account_id = accounts.id
 |         LEFT JOIN tmp ON tmp.id = accounts.id
 |         WHERE worker_pools.is_public = true
 |           AND runs.type IN (1, 4)
 |           AND runs.state = 1
 |           AND runs.worker_id IS NULL
 |           AND accounts.max_public_parallelism / 2 > COALESCE(tmp.n, 0)
 |
 | I don't know this for sure, but in my experience it has seemed | like Postgres is bad at optimizing subqueries, and will execute | once per row. | cube2222 wrote: | Haven't run it, but based on the query plan in the first | picture, PostgreSQL already does that part kind of right. It | precalculates it for all accounts and then does a three-way | hash join. | dragonwriter wrote: | Aren't CTEs still optimization barriers in Postgres, so | wouldn't that make any subquery optimization problem worse? | CBLT wrote: | Wasn't this solved in Postgres 12? [0] | | [0] https://www.depesz.com/2019/02/19/waiting-for- | postgresql-12-... | kristjansson wrote: | Mostly no, since PostgreSQL 12 | [deleted] | briffle wrote: | The fun thing is, they have corrected a bunch of that in | newer versions of PostgreSQL. | | I'm surprised that the problem stemmed from an index where | the 'AND NOT NULL' took forever to scan, and they didn't | include a partial index, or an index in a different order, | since they are sorted. | srcreigh wrote: | Depends if materialization helps.
| | As a silly example, if the CTE materializes 1/100th of a | table via table scan, scanning 1/100th of a table N times | may be faster than scanning the entire table N times. | aeyes wrote: | For Postgres 12+ you'll have to use the following to force the plan: | | > WITH tmp AS MATERIALIZED | mijoharas wrote: | Hey, I think you'll probably get better performance if you swap | your subquery around a little from:
 |
 |         SELECT COUNT(*)
 |         FROM runs runs_other
 |         WHERE (SELECT COUNT(*)
 |                FROM stacks
 |                WHERE stacks.id = runs_other.stack_id
 |                  AND stacks.account_id = accounts.id
 |                  AND stacks.worker_pool_id = worker_pools.id) > 0
 |           AND runs_other.worker_id IS NOT NULL
 |
 | To:
 |
 |         SELECT COUNT(*)
 |         FROM runs runs_other
 |         WHERE EXISTS (SELECT *
 |                       FROM stacks
 |                       WHERE stacks.id = runs_other.stack_id
 |                         AND stacks.account_id = accounts.id
 |                         AND stacks.worker_pool_id = worker_pools.id)
 |           AND runs_other.worker_id IS NOT NULL
 |
 | Your previous option needs to hit every row in the stacks | subquery to generate the count. The modified one uses EXISTS, so | it only needs to check there's at least one row, and then can | short-circuit and exit that subquery. (a smart enough query | planner would be able to do the same thing, but PG generally | doesn't). | cube2222 wrote: | Thanks for the tip! I didn't know EXISTS can provide such | benefits. | | However, I've tried it out and it resulted in a 10x | performance loss. Looking at the query plan, it actually got | PostgreSQL to be too smart and undo most of the optimizations | I've done in this blog post... It's still an order of | magnitude faster than the original query though! | mijoharas wrote: | I stand corrected, and always glad to see the main point | about perf optimisation proven. (always measure! :) ) | gaha wrote: | Have you tried unnesting the subquery? From the images posted | in your blog it's not clear to me how often the subquery is | run.
Maybe unnesting it somehow like this should work well:
 |
 |         SELECT COUNT(*) as "count",
 |                COALESCE(MAX(EXTRACT(EPOCH FROM age(now(),
 |                  runs.created_at)))::bigint, 0) AS "max_age"
 |         FROM runs
 |         JOIN stacks ON runs.stack_id = stacks.id
 |         JOIN worker_pools ON worker_pools.id = stacks.worker_pool_id
 |         JOIN accounts ON stacks.account_id = accounts.id
 |         /* unnesting */
 |         JOIN (SELECT accounts_other.id, COUNT(*) as cnt
 |               FROM accounts accounts_other
 |               JOIN stacks stacks_other
 |                 ON accounts_other.id = stacks_other.account_id
 |               JOIN runs runs_other
 |                 ON stacks_other.id = runs_other.stack_id
 |               WHERE (stacks_other.worker_pool_id IS NULL
 |                      OR stacks_other.worker_pool_id = worker_pools.id)
 |                 AND runs_other.worker_id IS NOT NULL
 |               GROUP BY accounts_other.id) as acc_cnt
 |           ON acc_cnt.id = accounts.id
 |          AND accounts.max_public_parallelism / 2 > acc_cnt.cnt
 |         /* end */
 |         WHERE worker_pools.is_public = true
 |           AND runs.type IN (1, 4)
 |           AND runs.state = 1
 |           AND runs.worker_id IS NULL
 |           /* maybe also copy these filter predicates inside the
 |              unnested query above */
 |
 | Edit: format and typo | soperj wrote: | what are the downsides? | cube2222 wrote: | The downside is that a PostgreSQL version update can change | internal optimizer rules, which may result in the plan being | changed to a less performant one again. | | In that case it's good to have monitoring in place to catch | something like that. We're using Datadog Database Monitoring, | which gives you a good breakdown of what the most pricey | queries are - it uses information from the pg_stat_statements | view, which contains a lot of useful stats about query | execution performance. | | You should regularly check such statistics to see if there | aren't any queries which are unexpectedly consuming a big | chunk of your database resources, as that might mean you're | missing an index, or a non-optimal query plan has been chosen | somewhere.
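The pg_stat_statements check described above can be as simple as the following sketch (the timing column is total_exec_time on PostgreSQL 13+; older versions call it total_time, and the extension must be in shared_preload_libraries):

        -- One-time setup in the target database.
        CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

        -- Top 5 queries by cumulative execution time.
        SELECT query,
               calls,
               total_exec_time          AS total_ms,
               total_exec_time / calls  AS mean_ms
        FROM pg_stat_statements
        ORDER BY total_exec_time DESC
        LIMIT 5;

A query whose total_ms suddenly jumps between deploys or ANALYZE runs is a good candidate for the kind of plan-change investigation the article walks through.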
| topspin wrote:
| > The downside is that a PostgreSQL version update can
| change internal optimizer rules
| 
| This has actually happened relatively recently.
| 
| PostgreSQL Common Table Expressions (WITH x AS (some
| query)) used to be optimization fences by default, meaning
| that you could influence the optimizer using a CTE. This is
| a well-known technique among PostgreSQL users who resort to
| it for lack of optimizer hints.
| 
| In Pg12 they enhanced the optimizer to remove the
| optimization fence _by default_ , so the query plans of
| existing queries automatically changed and sometimes became
| much slower as a result. If you want the old CTE behavior,
| you have to write the WITH clause as WITH x AS MATERIALIZED
| (some query).
| Justsignedup wrote:
| heh i had a similar style problem, i think it was more 1000x
| than 200x, but it's because there were certain query
| structures the optimizer couldn't figure out; i had to try
| like 4 different re-writes of the sql till the optimizer
| could understand it.
| 
| I guess at this point I just see it as part of my job to do
| this insane stuff, good to know that others see this as hard
| :)
| pkrumins wrote:
| Instead of doing this madness with complicated queries, I
| always advise splitting complicated queries into many
| smaller, simpler queries and then merging the results in the
| code. This way you don't need to debug the queries, as all
| of them are very simple SELECTs, and the code that merges
| the results is one or two for loops.
| ccleve wrote:
| This. Plus, you can use temporary tables, which limits the
| amount of application code you have to write. Put your
| intermediate results in temp tables, and build indexes on
| them if you have to. It's a cheap way of forcing the
| optimizer to do things your way.
| cultofmetatron wrote:
| in practice that means more data being sent to the
| application code, which then has to process it.
99% of the time, you'll get more performance by figuring out
the right way to express the data structure you want in sql.
The entire computation happens in the database, so all the
data is immediately available to process on disk, and for
bandwidth consumption you only send what you need. If the
query is slow, you need to think about how you set up your
joins and about being aggressive with your use of predicates
in subqueries to reduce the size of temporary tables.
| chrisjc wrote:
| Not in every case. Some DBs allow you to reference the
| results and use the result reference in subsequent queries.
| As a result, data stays in the DB until you're actually
| ready to pull it.
| 
| Or of course, you can just use temp tables.
| cultofmetatron wrote:
| that's a nice feature. does postgres have it? in that case,
| it's even MORE performant to push your data processing into
| the database via a query
| chrisjc wrote:
| I'm not too sure about Postgres honestly. I was
| specifically talking about Snowflake, although there
| might be others that do it too.
| 
| https://docs.snowflake.com/en/sql-reference/functions/result...
| 
| Sometimes you can follow this pattern and never actually
| pull data into your application, which is especially true
| when you're doing ETLs (table --> table).
| Tostino wrote:
| That can sometimes be a solution... Just keep in mind, you
| are manually writing a join algorithm every time you need to
| do that, and sometimes it will be the right one for the
| task, other times not so much. The query planner will adapt
| your join type for you instead, if you can trust it.
| 
| As the shape of your data changes over time, your hard-coded
| algorithm choice is not guaranteed to be the best one.
| takeda wrote:
| That won't always work. The main reason for bigger queries
| is to get exactly the data you need, so you don't have to
| send unnecessary data over the wire, which is slow.
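[The split-and-merge approach pkrumins describes, and the trade-off Tostino points out, can be sketched in a few lines. This is a toy illustration only, using SQLite via Python's standard library; the schema and numbers are invented, not taken from the article:]

```python
# Toy comparison of "one join in the database" vs. "split into
# simple queries and merge in code". Uses SQLite (stdlib) purely
# for illustration; the thread itself is about PostgreSQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stacks (id INTEGER PRIMARY KEY, account_id INTEGER);
    CREATE TABLE runs   (id INTEGER PRIMARY KEY, stack_id INTEGER, state INTEGER);
    INSERT INTO stacks VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO runs   VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 3, 1);
""")

# One query: the database picks the join strategy for us.
joined = dict(conn.execute("""
    SELECT s.account_id, COUNT(*)
    FROM runs r JOIN stacks s ON r.stack_id = s.id
    WHERE r.state = 1
    GROUP BY s.account_id
""").fetchall())

# Two simple queries merged in application code. Note that this
# hard-codes a hash join (Tostino's caveat): if the data shape
# changes, the planner can no longer adapt on our behalf.
stack_to_account = dict(conn.execute("SELECT id, account_id FROM stacks"))
counts = {}
for (stack_id,) in conn.execute("SELECT stack_id FROM runs WHERE state = 1"):
    acct = stack_to_account[stack_id]
    counts[acct] = counts.get(acct, 0) + 1

assert joined == counts  # both approaches agree on the result
```

[Which version wins depends on data volume: the merge loop pulls every stack id over the wire, which is takeda's objection when the tables are large.]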
| cube2222 wrote:
| That's more custom logic and additional code to maintain
| though.
| 
| In this case we're using the Datadog Agent with its
| custom_queries directive to automatically create metrics
| based on periodically evaluated SQL queries.
| jsumrall wrote:
| Sounds like just using CTEs. This is how I write some
| complicated queries, as a few CTEs with specific logic to
| get an overall query written in a more readable way.
| [deleted]
| igammarays wrote:
| Are you happy with Aurora Serverless so far? Is it really
| the "painless scaling" it's made out to be? I'm really
| hesitant to use anything other than self-hosted Postgres
| with full control, because I'm always afraid of running into
| situations like these and not being able to do what I need
| to do, like have an offline analytical read replica. Also I
| imagine the cost is an order of magnitude more than a
| self-hosted instance.
| cube2222 wrote:
| Not really. But it's mostly because the support for other
| features is basically non-existent. It also seems like all
| development is happening on Aurora Serverless v2 (which I
| think is in preview state?). The scaling also isn't
| seamless: it breaks your connections if you're using
| prepared statements, so we have it pinned to a specific
| number of capacity units.
| 
| We'll probably move to plain Aurora at some point, as that
| supports all the bells and whistles, but for now Aurora
| Serverless is good enough.
| no-s wrote:
| It's impressive how many developer misconceptions I see
| cropping up in these articles. The point of the query
| planner is to replace an army of programmers, not evade
| design faults. Using a server RDBMS it's trivially easy to
| decouple schema and query evolution from application. Yet
| somehow developers always seem to back themselves into
| situations requiring extreme coupling.
Using plan guides in the MSSQL environment helps somewhat as a
patchwork solution, though the deficiencies are usually
related to obsolete statistics... (and that's not even
necessary). In Postgres you should ensure design separation
so you can rewrite the query easily.
| newlisp wrote:
| _Using a server RDBMS it's trivially easy to decouple schema
| and query evolution from application._
| 
| Curious, how do you decouple query evolution from the DB
| layer of your application?
| dragonwriter wrote:
| > Curious, how do you decouple query evolution from the DB
| layer of your application?
| 
| Use application-specific views, and the evolution driven by
| DB optimization, etc., can happen in view definitions
| without the app DB layer being involved. It's a traditional
| best practice anyway, going back decades to the time when it
| was expected an RDBMS would serve multiple apps that you
| needed to keep isolated from each other's changes, but it
| also works to isolate concerns at different levels for a
| single-app stack.
| 
| (Because of practical limits of views in some DBs,
| especially in the 1990s, and developer preferences for
| procedural code, a common enterprise alternative substitutes
| stored procs for views, to similar effect.)
| no-s wrote:
| hah, exactly. "application-specific views...Going back
| decades", or "substitute Stored Procs for views".
| 
| It's not really much extra overhead to use a schema plus
| views. I ask developers to avoid creating views with "SELECT
| *" and always specify the columns desired.
| 
| Further decoupling is possibly beneficial, e.g. CQRS is a
| wonderful perspective for designing data models that may be
| more easily distributed or cacheable (by explicitly choosing
| to separate the query schema from the modification schema).
| newlisp wrote:
| Oh yes, views; they're not effort-free and become numerous
| and specific.
But they're sorely missed with a document store I'm working on
right now ;)
| Mathnerd314 wrote:
| Similar article that instead solved it by recording more
| statistics:
| https://build.affinity.co/how-we-used-postgres-extended-stat...
| AaronFriel wrote:
| Regarding this:
| 
| > Only a minuscule part of the Runs in the database are
| active at any given time. Most Stacks stay dormant most of
| the time. Whenever a user makes a commit, only the affected
| Stacks will execute Runs. There will obviously be Stacks
| that are constantly in use, but most won't. Moreover,
| intensive users will usually use private Worker Pools. The
| public shared Worker Pool is predominantly utilized by
| smaller users, so that's another reason for not seeing many
| Runs here.
| 
| > Right now, we're iterating over all existing Stacks
| attached to the public worker pool and getting relevant Runs
| for each.
| 
| Is there an opportunity to make this query more robust with
| a bit of denormalization, if necessary, to indicate which
| runs and which stacks are active, plus a filtered index (an
| index with a WHERE clause) on each to allow efficiently
| retrieving the list of active entries?
| 
| Best practices for database optimization tell you "don't
| index on low-cardinality columns" (columns with few unique
| values), but the real advice should be "ensure each valid
| key of an index has a low cardinality relative to the size
| of the data" (it's OK to have an index that selects 1% of
| your data!).
| 
| That is, conventional wisdom says that indexing on a boolean
| is bad, because in the best-case scenario 50% of your
| entries are behind each key, and in the worst case almost
| all entries are behind either "true" or "false". There are
| performance impacts from that, and there's the risk the
| query optimizer tries to use the "bad" side of the index,
| which would be worse than a full table scan.
| 
| But with the improved advice and the insight that 0.01% of
| rows have the value "true", you can create an index with a
| `WHERE active = true` clause, et voila: you have an index
| that finds you exactly what you want, without the cost and
| performance issues of maintaining the "false" side of the
| index.
| cube2222 wrote:
| Yep, you could indeed use a partial index with the two
| relevant conditions to solve this query well. We're using
| those all over the place as well.
| deltaonefour wrote:
| This article is everything that's wrong with SQL.
| 
| How about just providing us access to the query plan and
| allowing users to directly modify it? It's like needing
| assembly language to optimize a really tight section of code
| but having that access blocked off.
| 
| Why do we have to rely on leakage to a higher-level
| interface in order to manipulate the lower-level details?
| Just give us manual access when we need it.
| shock-value wrote:
| I think SQL is generally great, but I do agree with you.
| Access to lower-level primitives from which one could
| construct their own query plan would be awesome. Query hints
| (not available in Postgres) would also probably get you a
| lot of the way there.
| hnov wrote:
| In this case, denormalizing the data right into the runs
| table would likely give you better, but more importantly
| more predictable, performance.
| srcreigh wrote:
| The curious part is why postgres can't estimate that the
| size of the active runs<>stacks join will be very small.
| 
| A core part of query planning is estimating the best join
| order using a dynamic programming algorithm. So roughly,
| A(1M) <> B(1M) <> C(10) should use the join order A-(B-C),
| not (A-B)-C.
| 
| I bet it's something like: postgres doesn't know the
| correlation between runs_other.worker_id and
| runs_other.stack_id. It seems like it's seeing a low number
| of runs_other.worker_id values, then _estimating the stacks
| are split evenly among that small number of runs_ in the
| millions.
| 
| (Why it wouldn't know stacks.id correlates to one stack
| each, idk. Is stacks.id not the primary key? Is there a
| foreign key from runs? Very curious.)
| 
| What happens if you follow this guide to hint Postgres as to
| which columns are statistically related? [0]
| 
| Hinting PG's statistics estimates may be a more persistent
| solution. Another core part of query planning is query
| rewriting - if they add a rule to simplify
| COUNT(select...)>0 to normal joins (which are equivalent, I
| think?) then your trick may stop working.
| 
| [0]: https://www.postgresql.org/docs/12/multivariate-statistics-e...
| cube2222 wrote:
| To be fair, we're using Aurora Serverless, so I'll have to
| dig deeper to see whether we can use these kinds of hints.
| But thanks for the tip, this could be really useful!
| mst wrote:
| Given Aurora talks about itself as a replacement for
| pg/mysql that provides compatible skins, I'm honestly
| wondering whether you're dealing with the pg optimiser here
| in any meaningful way.
| 
| (exactly how far along the scale from 'tweaked version' to
| 'basically just a skin on a completely new database' aurora
| is is ... opaque to me ... there's probably something out
| there that answers that but I haven't yet managed to find
| it)
| 
| Edited to add: zorgmonkey downthread points out they have
| support for some extensions like pg_hint_plan -
| https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...
| - which suggests at least a decent amount of pg code, but
| I'm still curious exactly where the changes start.
| cube2222 wrote:
| I obviously don't know its exact internal architecture, but
| based on my digging and trying out different
| PostgreSQL-native features, it looks like it's PostgreSQL
| instances as query executors with a custom
| (high-availability) storage layer, fronted by PgBouncer.
| 
| The query seems like a pretty standard and often-occurring
| problem with nested queries. Every JOIN and nested query
| grows the computation due to Cartesian combination, so
| having your predicates as narrow and specific as possible is
| a must.
| 
| Also, Nested Loop is a kiss of death for your query
| performance.
 ___________________________________________________________________
 (page generated 2022-01-18 23:00 UTC)