[HN Gopher] Tricking PostgreSQL into using an insane, but faster...
       ___________________________________________________________________
        
       Tricking PostgreSQL into using an insane, but faster,  query plan
        
       Author : cube2222
       Score  : 156 points
       Date   : 2022-01-18 16:42 UTC (6 hours ago)
        
 (HTM) web link (spacelift.io)
 (TXT) w3m dump (spacelift.io)
        
       | SigmundA wrote:
        | PostgreSQL really needs hints. Sometimes I, as the developer,
        | know better than the optimizer, and having to trick the optimizer
        | into doing what I want is a ridiculous situation.
        | 
        | Also, having the optimizer decide to do something bad off-hours
        | is not a good situation; again, hints or plan locking would help
        | here.
        
         | shock-value wrote:
         | It's basically shameful how the maintainers refuse to even
         | entertain the thought of allowing query hints. These kinds of
         | articles and anecdotes come up again and again, yet seemingly
         | no movement whatsoever on the blanket ban on query hints in
         | Postgres. (See
         | https://wiki.postgresql.org/wiki/OptimizerHintsDiscussion)
         | 
         | Postgres generally is great -- so many great features and
         | optimizations and more added all the time. But its query
         | optimizer still messes up, often. It's absolutely not the
         | engineering marvel some would have you believe.
        
           | pmontra wrote:
            | I can understand that people sometimes want absolute control
            | over their queries. I've never used hints, so I shouldn't
            | even write this note, but IMHO that page has a quite balanced
            | view of the pros and cons of existing hint systems.
        
             | zmmmmm wrote:
             | > that page has a quite balanced view of pros and cons of
             | existing hint systems
             | 
             | I found the cons section somewhat disappointing given how
             | much respect I have for Postgresql maintainers in general.
             | 
             | Most of the "problems" are essentially manifestations of
             | dysfunction within users or PostgreSQL development itself.
             | People won't report optimizer bugs if they can fix them
             | themselves, etc. (far more likely they won't report the
             | bugs if they have no way to prove their alternative query
             | plan actually performs better, eg: by adding a hint).
             | 
             | Many are just assertions which I highly doubt would hold up
             | if validated ("most of the time the optimizer is actually
             | right", "hints in queries require massive refactoring"
             | ...).
        
               | gowld wrote:
               | > (far more likely they won't report the bugs if they
               | have no way to prove their alternative query plan
               | actually performs better, eg: by adding a hint).
               | 
                | They don't need to run an alternative plan; they just
                | need to report that the main plan did far more work than
                | needed.
               | 
               | In the real world, when users have a problem in their
               | mission-critical system, they ask for a solution, not
               | just give up on the mission.
               | 
               | > Many are just assertions which I highly doubt would
               | hold up if validated ("most of the time the optimizer is
               | actually right", "hints in queries require massive
               | refactoring" ...).
               | 
               | Their 20 years of experience beats your idle speculation.
        
               | zmmmmm wrote:
               | > Their 20 years of experience beats your idle
               | speculation.
               | 
               | Perhaps.
               | 
                | But it comes across to me on that page as arrogance and
                | dismissiveness of user needs. Hence I find it
                | disappointing, even if that is mainly in how it is
                | expressed.
        
               | infogulch wrote:
               | > they just need to report that the main plan did far
               | more work than needed.
               | 
                | And how do they know that the chosen plan is not already
                | optimal? How many reports do you think you'd get if every
                | time someone encountered a "slow" query they reported it?
                | And what fraction of those would ultimately be found to
                | be caused by a bad optimizer, as opposed to the user's
                | fault (a missing index, outdated statistics, ...)? 1%?
                | 0.1%? See Stack Overflow for a sample of that ratio. Be
                | careful what you wish for.
               | 
                | If hints existed, performance reports could be "the
                | default plan for x is slow; I know because when I use
                | hints w,z it's fast".
               | 
                | What should they do in the meantime, while they're
                | waiting for the pg project to acknowledge the problem,
                | someone to propose a solution, and someone to implement a
                | fix? Just sit on their hands and twiddle their thumbs
                | while their database server spews smoke? Hints are an
                | escape hatch that empowers users to take over when the
                | optimizer veers off course. Not having _any_ way to
                | control the plan is like a Tesla autopilot that doesn't
                | allow the driver to take control of the car in an
                | emergency.
                | 
                | Also, distrusting users to report performance issues is a
                | weird attitude and leaves a bad taste. Maybe it's true,
                | or maybe you need to make it easier.
        
             | williamdclt wrote:
             | It does. But when you're watching your database, and
             | therefore your business, crash and burn because it decided
             | to change the query plan for an important query, all these
             | "problems with existing Hint systems" sound very
             | irrelevant.
             | 
             | I'd love to have a way to lock a plan in a temporary
             | "emergency measure" fashion. But of course it's
             | hard/impossible to design this without letting people abuse
             | it and pretend it's just a hint system.
        
               | silon42 wrote:
               | Exactly... DB deciding to switch query plans in
               | production is sometimes terrible.
        
               | suddendescent wrote:
               | >I'd love to have a way to lock a plan in a temporary
               | "emergency measure" fashion.
               | 
               | Could this be achieved by extending prepared statements?
               | [1] The dirty option would be to introduce a new keyword
               | like PREPAREFIXED.
               | 
               | The first time such a statement is executed, the
               | execution plan could be stored and then retrieved on
               | subsequent queries. There would be no hints and the
               | changes in code should be minimal.
               | 
               | Once a query runs successfully, the users can be sure
               | that the execution plan won't change.
               | 
               | >But of course it's hard/impossible to design this
               | without letting people abuse it
               | 
               | Is this more important than having predictable execution
               | times?
               | 
                | [1] https://www.postgresql.org/docs/current/sql-prepare.html
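
As a point of reference, PostgreSQL 12+ can already get close to this for a single session without any new keyword: a server-side prepared statement combined with the plan_cache_mode setting pins execution to the cached generic plan. A sketch (table and column names hypothetical):

```sql
-- Force reuse of the generic (parameter-independent) plan instead of
-- re-planning per parameter set.
SET plan_cache_mode = force_generic_plan;

PREPARE orders_by_customer (bigint) AS
  SELECT id, total
  FROM orders
  WHERE customer_id = $1;

-- Every execution now runs the same cached generic plan.
EXECUTE orders_by_customer(42);
EXPLAIN ANALYZE EXECUTE orders_by_customer(42);
```

The plan only lives as long as the session (and its validity checks), so this is weaker than a true cross-connection plan lock, but it does make execution predictable within that session.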
        
           | yjftsjthsd-h wrote:
           | > These kinds of articles and anecdotes come up again and
           | again
           | 
            | Don't articles and anecdotes also come up again and again of
            | developers feeding bad hints to the database and cratering
            | their performance?
           | 
           | And that wiki page... Very explicitly isn't a blanket ban?
           | They straight up say that they're willing to consider the
           | idea if somebody wants to say how to do it without the
           | pitfalls of other systems. The only thing they say they're
           | going to ignore outright is people thinking they should have
           | a feature because other databases have it (which seems fair).
        
             | erichocean wrote:
             | > _people thinking they should have a feature because other
             | databases have it (which seems fair)_
             | 
             | Literally NO ONE wants query hinting in Postgres to check
             | some kind of feature box because other databases have it.
             | 
             | We know we want it because...other databases have it, and
             | it's INCREDIBLY USEFUL.
             | 
              | > _They straight up say that they're willing to consider
              | the idea if somebody wants to say how to do it without the
              | pitfalls of other systems._
              | 
              | Pitfalls my ass. We want exactly the functionality that is
              | already present in other systems, _pitfalls and all_.
              | That's just an excuse to do nothing, Apple-style, "because
              | we know better than our own users", while trying to appear
              | reasonable.
              | 
              | It's akin to not adopting SQL until you can do so "while
              | avoiding the pitfalls of SQL." Just utter bullshit.
        
               | adamrt wrote:
               | > We want exactly the functionality that is already
               | present in other systems, pitfalls and all. That's just
               | an excuse to do nothing
               | 
               | Are you offering to help with the maintenance or
               | development associated with it? Or are you just demanding
               | features while calling the people that do help with that
               | stuff liars?
               | 
               | Maybe they do know better? Or maybe they know about other
               | hassles that will come with it that you can't fathom.
               | They've been developing and maintaining one of the best
               | open source projects on the planet for nearly 3 decades.
                | Maybe with all that experience, it's not their opinion
                | that is BS?
        
               | mst wrote:
                | Given there's already a pg_hint_plan extension, I really
                | don't see why people are so angry about the developers
                | not wanting to bless something they consider a footgun as
                | a core feature.
        
               | zepearl wrote:
                | The "risk": because pg_hint_plan is an external module,
                | written by somebody as a hobby, if it stops existing or
                | working after some update, or with a newer version of
                | Postgres, it becomes a huge problem for PROD systems
                | that need it for any reason.
        
               | ghusbands wrote:
                | Sure, maybe by not doing the thing every other database
                | does, the thing that really helps users out when they're
                | suffering unpredictable performance, they're
                | demonstrating their far superior knowledge. And maybe all
                | the users that are suffering are somehow incompetent.
                | 
                | Or maybe it's just typical developer hubris, of the kind
                | that hits everyone at some point.
        
             | fragmede wrote:
             | We're software engineers not marketing folk who just want
             | to check a box.
             | 
              | There's a difference between "have a feature _just because_
              | other databases have it" and "this is a very useful
              | feature that would have its own PostgreSQL-isms, and also
              | we got the idea because other databases have it". The query
              | planner isn't infallible, so being able to hint queries to
              | not accidentally use a temporary table that just can't fit
              | in RAM isn't copying a feature "just because everyone else
              | has it".
        
             | ghusbands wrote:
             | > they're willing to consider the idea if somebody wants to
             | say how to do it without the pitfalls of other systems
             | 
             | That is basically a blanket ban. Saying you won't implement
             | a widely-implemented feature unless someone comes up with a
             | whole new theory about how to do it better is saying you
             | simply won't implement it. Other databases do well enough
             | with hints, and they do help with some problems.
        
           | zorgmonkey wrote:
            | I've never used it, but there is a Postgres extension called
            | pg_hint_plan [0] for query hints, and I am guessing it is
            | pretty decent because it is even mentioned in the AWS Aurora
            | documentation.
            | 
            | [0]: https://pghintplan.osdn.jp/pg_hint_plan.html
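
For anyone curious what that looks like in practice: pg_hint_plan reads its hints from a specially formatted comment placed ahead of the query. Roughly (table, alias, and index names hypothetical):

```sql
-- Force an index scan on orders via the hypothetical
-- idx_orders_customer index, a hash join between the two tables,
-- and this join order.
/*+ IndexScan(o idx_orders_customer) HashJoin(o c) Leading(o c) */
SELECT o.id, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU';
```

Because the hints ride along in a comment, the query remains valid SQL on a server without the extension installed.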
        
           | zmmmmm wrote:
           | Definitely my biggest issue with Postgres.
           | 
           | I have one shameful query where, unable to convince it to
           | execute a subquery that is essentially a constant for most of
           | the rows outside of the hot loop of the main index scan, I
           | pulled it out into a temp table and had the main query select
           | from the temp table instead. Even creating a temp table _and
           | indexing it_ was faster than the plan the Postgres query
            | planner absolutely insisted on. Things like CTEs etc. made no
            | difference; it would still come up with the same dumb plan
            | every way I expressed the query.
           | 
           | The worst thing is not even being able to debug or understand
           | what is going on because you can't influence the query plan
           | to try alternative hypotheses out easily.
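
In fairness, PostgreSQL does ship one crude way to test alternative hypotheses: the session-level enable_* planner toggles. They are cost nudges rather than hard hints, but they can at least rule plan shapes in or out while debugging. A sketch (table names hypothetical):

```sql
-- Forbid the strategies the planner keeps insisting on, then see
-- what it falls back to. Session-local; not meant for production.
SET enable_seqscan = off;
SET enable_nestloop = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU';

RESET enable_seqscan;
RESET enable_nestloop;
```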
        
         | baq wrote:
          | Even plan locking itself would be a welcome feature. We've had
          | multiple instances of plans going haywire (never traced it
          | down, but 99% sure it was after an autovacuum ANALYZE),
          | impacting prod and getting back to normal after a manual
          | re-ANALYZE.
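
If an autovacuum ANALYZE is the suspect, the timing can at least be checked against the statistics views, and the manual fix is a single statement (table name hypothetical):

```sql
-- When did (auto)analyze last touch the table, and how stale is it?
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Rebuild the statistics the planner works from.
ANALYZE orders;
```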
        
           | SigmundA wrote:
            | I haven't had much use for plan locking, even after MS SQL
            | Server got it, though it's a nice feature, with being able to
            | save and load plans from a file, etc.
            | 
            | I just like a good set of hints, since they are part of the
            | source code: I can put comments next to them and see why they
            | are there in source history later.
        
         | jonatron wrote:
         | Having been in a situation where Postgres suddenly decides to
         | use a much slower query plan on a hot query in production, I'd
         | agree.
        
         | quietbritishjim wrote:
          | Although absolute sacrilege to many, I would honestly support a
          | really explicit syntax to compose particular tables and
          | indices. Not even on top of a regular SQL query; I mean its own
          | first-class language. E.g. "take this index, restrict it to
          | this subrange; for each row, pick a row using this other index
          | on the foreign key" (but it wouldn't have to be imperative like
          | I've phrased it).
         | 
         | There are times when I already know how I want the query to
         | execute, and I have to iteratively fiddle with the query to
         | indirectly trick the planner into doing that. There's just no
          | excuse for that arrangement. Even if I've made a mistake and my
          | plan couldn't work efficiently, at least let me execute it
          | inefficiently so I can discover and debug that.
        
           | gfody wrote:
            | Something like SQL Server's "USE PLAN"; it really comes in
            | handy when you need it
            | (https://forrestmcdaniel.com/2018/07/19/how-to-use-the-use-pl...)
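
The shape of that feature, roughly: you capture a real plan beforehand (e.g. with SET STATISTICS XML ON) and paste its ShowPlan XML into the hint. The XML literal below is a placeholder and the table names are hypothetical:

```sql
-- SQL Server: pin a previously captured execution plan to this query.
-- The N'...' literal must contain the full ShowPlan XML of the plan
-- you want to force.
SELECT o.id, o.total
FROM dbo.orders AS o
JOIN dbo.customers AS c ON c.id = o.customer_id
OPTION (USE PLAN N'<ShowPlanXML ...> ... </ShowPlanXML>');
```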
        
           | SigmundA wrote:
            | Yes, I would too. Over time I have come to appreciate at
            | least being able to control the order of operations; just
            | forcing the join order is often enough of a hint to keep
            | things working properly.
            | 
            | Having used LINQ a lot, I would actually prefer more of that
            | kind of chained-statement approach that is more hands-on,
            | without having to explicitly loop.
        
           | derefr wrote:
           | Or, to put that another way: Postgres's query planner should
           | be emitting an explicit bytecode that is then interpreted by
           | a query-execution abstract machine. That same layer should
           | also be exposed directly to clients over the network,
           | allowing pre-planned query bytecode to be submitted directly
           | in place of a textual query.
           | 
            | I've heard the rebuttal to this idea before: that even minor
            | DB schema/configuration changes that should be "invisible" to
            | the application (e.g. an index being created -- or even
            | refreshed -- on the table) would result in the generated
            | bytecode being different, and so you'd never be able to ship
            | such bytecode as a static artifact in your client.
           | 
           | But: who said anything about a static artifact? I'm more
           | imagining a workflow like this:
           | 
           | 1. user submits a query with leading command "PLAN" (e.g.
           | "PLAN SELECT ...")
           | 
           | 2. Postgres returns planned bytecode (as a regular single-
           | column bytea rowset)
           | 
            | 3. if desired, the user _modifies_ the received bytecode
            | arbitrarily (leaving alone anything the client doesn't
            | understand -- much like how optimization passes leave alone
            | generated intrinsics in LLVM IR)
           | 
           | 4. user later submits a low-level query-execution command
           | over the binary wire protocol, supplying the (potentially
           | modified) bytecode
           | 
            | 5. _if the bytecode's assumptions are still valid_, the
            | query executes. Otherwise, the query gives a "must re-plan"
            | error. The client is expected to hold onto the original SQL
            | and re-supply it in this case.
           | 
           | In other words, do it similarly to how Redis's EVAL + EVALSHA
           | works; but instead of opaque handles (SHA hashes), the client
           | receives a white-box internal representation.
           | 
           | Or: do it similarly to how compiling a shader on a game-
           | console with unikernel games + unified RAM/VRAM works. The
           | driver's shader compiler spits you back a buffer full of some
           | GPU object code -- which, running in kernel-mode, you're free
           | to modify before running.
        
             | tshaddox wrote:
             | You can't ship SQL queries and then do arbitrary schema
             | changes either, right?
        
               | lazide wrote:
               | Not arbitrary changes, but nearly so - if you add new
               | columns (and your existing queries name the columns they
               | use explicitly), then nothing changes as far as the
                | client is concerned. If you remove or rename one, only
                | those clients that refer to it need to change, and then
                | only if you don't use an interposing view/schema/etc.
               | 
                | The server is supposed to take the logical intent
                | specified by the queries and do the work of mapping that
                | into concrete retrievals in a stable and performant way,
                | so a query explicitly telling the server to do something
                | in a specific way breaks the model. The more specific,
                | the more broken.
                | 
                | DDL changes, for instance, ideally would not have to tell
                | the server how to structure things on disk. The server
                | should know how to do it efficiently.
               | 
               | That said, real life intrudes and sometimes (but not
               | usually) it's important to do this for stability or
               | performance reasons. The more magic involved, the more
               | unpredictable a system can be, and the closer a system
               | runs to redline, the more chaos/downtime that can cause.
        
             | jhgb wrote:
             | > Or, to put that another way: Postgres's query planner
             | should be emitting an explicit bytecode that is then
             | interpreted by a query-execution abstract machine. That
             | same layer should also be exposed directly to clients over
             | the network, allowing pre-planned query bytecode to be
             | submitted directly in place of a textual query.
             | 
             | And now you've invented Firebird, or even 1980's Interbase.
             | ;)
        
             | mst wrote:
              | I suspect the extra compatibility/dependency surface would
              | make this a non-starter in practice, but it's an absolutely
              | fascinating thought experiment and I'm glad you wrote it
              | up.
        
             | SigmundA wrote:
              | PostgreSQL doesn't even cache its own bytecode; it replans
              | every time!
        
               | mst wrote:
               | Depending on the placeholder values the optimal plan may
               | be very different.
               | 
               | You -can- use a server side prepared statement to force
               | it to -ahem- plan ahead but that's -usually- not actually
               | worth it.
        
               | SigmundA wrote:
               | Thats why other databases have parameter sniffing.
               | 
               | And using prepared statements only works on the same
               | connection so of limited use.
               | 
               | Luckily PostgreSQL's optimizer is very primitive and so
               | planning doesn't take too much time, as it gets more
               | advanced the lack of plan reuse will become more of an
               | issue. Its already an issue with the LLVM JIT compilation
               | time.
        
           | orthoxerox wrote:
            | ClickHouse doesn't go all the way there, but its approach to
            | SQL support is very similar: it does exactly what you have
            | written. It doesn't rewrite your query in any way: if you
            | want to filter table A before you join it with table B, you
            | write a subquery.
        
           | magicalhippo wrote:
            | There's no technical reason why one couldn't just select or
            | join from an index as if it were a table or a view, is there?
           | 
           | We also write queries knowing it will use a specific index,
           | or we will create an index because of a specific query. And
           | then we have to have a scheduled task to periodically
           | recalculate statistics just so the DB server doesn't get
           | silly ideas.
           | 
           | Of course it could be abused, but I'm in favor of having ways
           | of letting programmers tell computers exactly what to do.
           | Sometimes we really do know best.
        
         | pzduniak wrote:
          | I recall using https://pghintplan.osdn.jp/pg_hint_plan.html for
          | this; it did what it claimed to be able to do. It's a hack,
          | but it really helps in the worst-case scenarios.
        
         | remus wrote:
         | Not that I disagree with your point, but the opposite of
         | 
         | > Also having the optimizer decide to do something bad off
         | hours is not a good situation again hints or plan locking would
         | help here.
         | 
         | Can also happen when you force a particular query plan, only
         | for it to turn to treacle when some assumption you made about
         | the data suddenly breaks.
         | 
         | As with most things there's no free lunch.
        
           | lazide wrote:
            | When doing disaster planning, at least something blowing up
            | because of a data change is preventable or mitigable:
            | staging environments, rollbacks, etc.
            | 
            | The system randomly deciding to drive itself off a cliff for
            | no reason, with no known way to stop it next time, is quite
            | concerning.
        
         | ghusbands wrote:
         | Since the PostgreSQL folks are clearly against hints, maybe
         | another random feature suggestion: If a (read-only) query is
         | taking "too long", maybe you could launch a second copy with a
         | different plan. Maybe even a third after twice as long and a
         | fourth after twice as long again. Kill them all once any one
         | returns. Trades some extra load now against waiting forever for
         | a 'bad' query plan.
         | 
         | The idea obviously needs work and has probably already been
         | suggested and dismissed, somewhere, but I thought I might throw
         | it out there, especially with modern computers having so many
         | cores.
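
A rough client-side approximation of this idea is possible today with statement_timeout plus a retry under different planner toggles. A sketch (the retry logic lives in the application; table names are hypothetical, and SQLSTATE 57014 is query_canceled):

```sql
-- Attempt 1: let the planner choose, but cap it at 5 seconds.
BEGIN;
SET LOCAL statement_timeout = '5s';
SELECT c.region, sum(o.total)
FROM orders o JOIN customers c ON c.id = o.customer_id
GROUP BY c.region;
COMMIT;

-- If attempt 1 aborts with SQLSTATE 57014 (query_canceled), the
-- application retries the same query with an alternative plan shape
-- coerced through planner toggles:
BEGIN;
SET LOCAL statement_timeout = '30s';
SET LOCAL enable_nestloop = off;   -- nudge toward hash joins
SELECT c.region, sum(o.total)
FROM orders o JOIN customers c ON c.id = o.customer_id
GROUP BY c.region;
COMMIT;
```

This is sequential fallback rather than the parallel racing proposed above, but it bounds the worst case in a similar way without extra concurrent load.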
        
           | lgessler wrote:
            | I'm not a DB expert, but aren't most queries bottlenecked by
            | disk bandwidth? If so, additional concurrent queries might
            | hurt more than they help.
        
           | zepearl wrote:
            | On one hand this could theoretically work, if the optimizer
            | tracked historical SQL statements + the query plans used at
            | the time + their runtimes, decided to try a different query
            | plan, then aborted the SQL being executed when its runtime
            | exceeded the historical runtime. But in practice it's likely
            | to mess things up, because data volume/distribution/machine
            | load (CPU, disks, etc.) might differ compared to the past.
            | 
            | On the other hand, in general there are often too many
            | potential combinations of query plans to try out (hundreds
            | even for a relatively simple SQL statement), and trying them
            | all out would take hours or days. The "good plan" might be
            | something that a machine would categorize as "very unlikely
            | to work", so it might end up being the one tested
            | automatically at the very end.
            | 
            | Normal hints would still be a lot easier to handle in the
            | code and for the user.
        
         | electroly wrote:
         | I really appreciate that in SQL Server we can just use query
         | hints. I don't think I will ever see eye-to-eye with the
         | PostgreSQL team on this. No query planner is perfect. SQL
         | Server's planner certainly isn't. We have had the same sort of
         | "query plan suddenly changes in production and everything
         | breaks" incidents, but they are easily fixed with query hints
         | and/or forced plans, and you can be sure that future statistics
         | updates won't break them again.
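
Concretely, the two mechanisms look roughly like this in T-SQL (table names and IDs hypothetical; the Query Store IDs come from the sys.query_store_query and sys.query_store_plan views):

```sql
-- Inline query hints: force a hash join and serial execution.
SELECT o.id, o.total
FROM dbo.orders AS o
JOIN dbo.customers AS c ON c.id = o.customer_id
OPTION (HASH JOIN, MAXDOP 1);

-- Forced plan via Query Store: pin plan 73 to query 42 so future
-- statistics updates cannot swap it out.
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 73;
```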
        
           | SigmundA wrote:
            | I still have nightmares of SQL Server's optimizer taking down
            | production systems at random times due to picking a bad plan;
            | judicious use of hints has made my life much easier and
            | allowed me as a developer to be in control, rather than
            | having to ask the DB nicely and pray it listens.
        
             | newlisp wrote:
              | I guess that is why cloud databases like Spanner charge big
              | bucks.
        
             | fennecfoxen wrote:
             | I worked for a payments firm that submitted transfers in
             | bulk every day, but had a monthly surge day, and a late
             | cut-off period to submit or cancel payments. It doesn't
             | matter if you can do everything else in 15 minutes, if the
             | query planner decides to take a 30-second query and turn it
             | into a 4-hour nightmare because different customers are
             | making different numbers of payments today, and it trips
             | over some Postgres heuristic.
             | 
             | The Postgres query planner was, quite frankly, the enemy.
             | By the time I left we were at the point that we considered
             | it a business risk and were looking at alternatives. If you
                | need queries to run in a predictable amount of time --
                | forget _fast_, simply _predictable_ -- then Postgres is
                | quite simply not fit for purpose.
        
               | slt2021 wrote:
                | These query issues are most of the time not a SQL Server
                | problem; they are rather a symptom of poor data model
                | architecture and table design.
                | 
                | A lot of times I see hot data in an operational DB get
                | mixed with warm and cold data, and then your hot-path
                | query with a ton of JOINs/subqueries will get rekt.
                | 
                | Proper redesign and rearchitecture helps provide a large
                | buffer against these problems.
                | 
                | Also, not everything should be inside SQL Server; if you
                | run a large query every 5 seconds or so, probably
                | consider using an in-memory cache or a denormalized
                | model.
               | fennecfoxen wrote:
               | Excuse me. You are replying with advice that is
               | fundamentally inappropriate for a batch workload
               | (describing a "hot path query" and suggesting in-memory
               | caches), and it shows that you have not read and
               | comprehended the post I wrote.
               | 
               | Besides which, if you need to re-architect and
               | denormalize by moving records in and out of different
               | tables just so you can run what is essentially a monthly
               | report in a predictable (not fast! just predictable)
                | amount of time -- well then, why bother using Postgres in
                | the first place? You might as well bite the bullet and go
                | with something NoSQL at that point, because at least when
                | you read from a given Cassandra partition you know all
                | the results will be right next to each other, 100%. Why
                | leave it to chance? You've been burned by Postgres
                | before.
        
               | slt2021 wrote:
                | As a developer I never had to blame SQL Server; it was
                | always a developer mistake: poor architecture, a poor
                | query, some data quality issue. Don't blame the platform
                | for developers' mistakes; that's what I was trying to
                | say.
        
               | fennecfoxen wrote:
               | The wise man builds his house upon the rock. The foolish
               | man builds his house upon the sand. I will 100% blame the
               | platform when what it provides is deficient, for
               | selecting a deficient platform is one of the gravest
               | mistakes a developer can make.
               | 
               | And Postgres, again, is materially deficient in its query
               | performance predictability, and its developers are
               | insistent that they have no desire to allow the
               | approaches that would mitigate it. If this matters to
               | your application, then the prudent developer will drop
               | Postgres and use a real database.
        
               | SigmundA wrote:
                | Must not have been working with it very long; off the
                | top of my head, having to disable the "new" optimizer
                | in SQL Server 2014 and go back to the 2012 version due
                | to bad plans:
                | https://sqlservergeeks.com/sql-server-2014-trace-flags-9481/
                | 
                | I have been working with it since way back in 6.5, and
                | it has made many obviously wrong decisions in the
                | optimizer over the years. It has gotten better, but is
                | by no means perfect; the good thing is it has hints,
                | unlike PG!
        
               | slt2021 wrote:
                | So what's the issue with a marginal difference in
                | cardinality estimates? Both queries seem perfectly fine
                | speed-wise.
                | 
                | Just splitting hairs; again, most of the time developers
                | fail to notice significant data distribution drift and
                | don't rearchitect their data models, then blame the
                | engine for their own omissions and invalid assumptions.
        
               | SigmundA wrote:
               | https://www.sqlconsulting.com/archives/big-performance-
               | probl...
        
               | fragmede wrote:
                | Blaming the table architecture doesn't change the fact
                | that the PostgreSQL planner can make decisions that
                | cause a query to take an unknown amount of time.
        
       | cube2222 wrote:
       | Hey, author here.
       | 
       | The article describes a fun debugging and optimization session
       | I've had recently. Happy to answer any questions!
        
         | tqi wrote:
          | Interesting write up! Did you happen to try this with a CTE
          | instead of a subquery? ie something like:
          | 
          |     WITH tmp AS (
          |         SELECT accounts_other.id, COUNT(*) n
          |         FROM accounts accounts_other
          |         JOIN stacks stacks_other ON accounts_other.id = stacks_other.account_id
          |         JOIN runs runs_other ON stacks_other.id = runs_other.stack_id
          |         WHERE (stacks_other.worker_pool_id IS NULL
          |                OR stacks_other.worker_pool_id = worker_pools.id)
          |           AND runs_other.worker_id IS NOT NULL
          |         GROUP BY 1
          |     )
          |     SELECT COUNT(*) as "count",
          |            COALESCE(MAX(EXTRACT(EPOCH FROM age(now(), runs.created_at)))::bigint, 0) AS "max_age"
          |     FROM runs
          |     JOIN stacks ON runs.stack_id = stacks.id
          |     JOIN worker_pools ON worker_pools.id = stacks.worker_pool_id
          |     JOIN accounts ON stacks.account_id = accounts.id
          |     LEFT JOIN tmp ON tmp.id = accounts.id
          |     WHERE worker_pools.is_public = true
          |       AND runs.type IN (1, 4)
          |       AND runs.state = 1
          |       AND runs.worker_id IS NULL
          |       AND accounts.max_public_parallelism / 2 > COALESCE(tmp.n, 0)
          | 
          | I don't know this for sure, but in my experience it has seemed
          | like Postgres is bad at optimizing subqueries, and will execute
          | them once per row.
        
           | cube2222 wrote:
           | Haven't run it, but based on the query plan in the first
           | picture, PostgreSQL already does that part kind of right. It
           | precalculates it for all accounts and then does a three-way
           | hash join.
        
           | dragonwriter wrote:
            | Aren't CTEs still optimization barriers in Postgres, so
            | wouldn't that make any subquery optimization problem worse?
        
             | CBLT wrote:
             | Wasn't this solved in Postgres 12?[0]
             | 
             | [0] https://www.depesz.com/2019/02/19/waiting-for-
             | postgresql-12-...
        
             | kristjansson wrote:
             | Mostly no, since PostgreSQL 12
        
             | [deleted]
        
             | briffle wrote:
             | The fun thing is, they have corrected a bunch of that in
             | newer versions of postgresql.
             | 
             | I'm surprised that the problem stemmed from an index where
              | the 'AND NOT NULL' took forever to scan, and they didn't
             | include a partial index, or an index in a different order
             | since they are sorted.
        
             | srcreigh wrote:
             | Depends if materialization helps.
             | 
             | As a silly example, if the CTE materializes 1/100th of a
             | table via table scan, scanning 1/100th of a table N times
             | may be faster than scanning the entire table N times.
        
           | aeyes wrote:
            | For Postgres 12+ you'll have to use the following to force
            | the plan:
           | 
           | > WITH tmp AS MATERIALIZED
        
         | mijoharas wrote:
          | Hey, I think you'll probably get better performance if you swap
          | your subquery around a little from:
          | 
          |     SELECT COUNT(*)
          |     FROM runs runs_other
          |     WHERE (SELECT COUNT(*)
          |            FROM stacks
          |            WHERE stacks.id = runs_other.stack_id
          |              AND stacks.account_id = accounts.id
          |              AND stacks.worker_pool_id = worker_pools.id) > 0
          |       AND runs_other.worker_id IS NOT NULL
          | 
          | To:
          | 
          |     SELECT COUNT(*)
          |     FROM runs runs_other
          |     WHERE EXISTS (SELECT *
          |                   FROM stacks
          |                   WHERE stacks.id = runs_other.stack_id
          |                     AND stacks.account_id = accounts.id
          |                     AND stacks.worker_pool_id = worker_pools.id)
          |       AND runs_other.worker_id IS NOT NULL
         | 
         | Your previous option needs to hit every row in the stacks
         | subquery to generate the count. The modified one uses EXISTS so
         | it only needs to check there's at least one row, and then can
         | short-circuit and exit that subquery. (a smart enough query
         | planner would be able to do the same thing, but PG generally
         | doesn't).
        
           | cube2222 wrote:
           | Thanks for the tip! I didn't know EXISTS can provide such
           | benefits.
           | 
           | However, I've tried it out and it resulted in a 10x
           | performance loss. Looking at the query plan, it actually got
           | PostgreSQL to be too smart and undo most of the optimizations
           | I've done in this blog post... It's still an order of
           | magnitude faster than the original query though!
        
             | mijoharas wrote:
             | I stand corrected, and always glad to see the main point
             | about perf optimisation proven. (always measure! :) )
        
         | gaha wrote:
          | Have you tried unnesting the subquery? From the images posted
          | in your blog it's not clear to me how often the subquery is
          | run. Maybe unnesting it somehow like this should work well:
          | 
          |     SELECT COUNT(*) as "count",
          |            COALESCE(MAX(EXTRACT(EPOCH FROM age(now(), runs.created_at)))::bigint, 0) AS "max_age"
          |     FROM runs
          |     JOIN stacks ON runs.stack_id = stacks.id
          |     JOIN worker_pools ON worker_pools.id = stacks.worker_pool_id
          |     JOIN accounts ON stacks.account_id = accounts.id
          |     /* unnesting */
          |     JOIN (SELECT accounts_other.id, COUNT(*) as cnt
          |           FROM accounts accounts_other
          |           JOIN stacks stacks_other ON accounts_other.id = stacks_other.account_id
          |           JOIN runs runs_other ON stacks_other.id = runs_other.stack_id
          |           WHERE (stacks_other.worker_pool_id IS NULL
          |                  OR stacks_other.worker_pool_id = worker_pools.id)
          |             AND runs_other.worker_id IS NOT NULL
          |           GROUP BY accounts_other.id) as acc_cnt
          |       ON acc_cnt.id = accounts.id
          |      AND accounts.max_public_parallelism / 2 > acc_cnt.cnt
          |     /* end */
          |     WHERE worker_pools.is_public = true
          |       AND runs.type IN (1, 4)
          |       AND runs.state = 1
          |       AND runs.worker_id IS NULL
          |       /* maybe also copy these filter predicates inside the unnested query above */
          | 
          | Edit: format and typo
        
         | soperj wrote:
         | what are the downsides?
        
           | cube2222 wrote:
           | The downside is that a PostgreSQL version update can change
           | internal optimizer rules, which may result in the plan being
           | changed to a less performant one again.
           | 
           | In that case it's good to have monitoring in place to catch
           | something like that. We're using Datadog Database monitoring
            | which gives you a good breakdown of what the priciest
            | queries are - it uses information from the
            | pg_stat_statements view, which contains a lot of useful
            | stats about query execution performance.
           | 
           | You should regularly check such statistics to see if there
           | aren't any queries which are unexpectedly consuming a big
           | chunk of your database resources, as that might mean you're
           | missing an index, or a non-optimal query plan has been chosen
           | somewhere.
        
             | topspin wrote:
             | > The downside is that a PostgreSQL version update can
             | change internal optimizer rules
             | 
             | This has actually happened relatively recently.
             | 
              | PostgreSQL Common Table Expressions (WITH x AS (some
              | query)) used to be optimization fences by default,
              | meaning you could influence the optimizer using a CTE.
              | This is a well-known technique among PostgreSQL users,
              | who resort to it for lack of optimizer hints.
              | 
              | In Pg12 they enhanced the optimizer to remove the
              | optimization fence _by default_, so the query plans of
              | existing queries automatically changed and sometimes
              | became much slower as a result. If you want the old CTE
              | behavior you have to modify the WITH clause via AS
              | MATERIALIZED.
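              | 
              | For example, to keep the pre-12 fencing behavior (a
              | sketch with a hypothetical table):
              | 
              |     WITH x AS MATERIALIZED (
              |         SELECT * FROM some_table WHERE some_condition
              |     )
              |     SELECT COUNT(*) FROM x;
              | 
              | whereas a plain WITH x AS (...) may now be inlined into
              | the outer query by the planner.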
        
       | Justsignedup wrote:
        | Heh, I had a similar-style problem; I think it was more 1000x
        | than 200x, but it's because there were certain query structures
        | the optimizer couldn't figure out. I had to try like 4
        | different rewrites of the SQL till the optimizer could
        | understand it.
        | 
        | I guess at this point I just see it as part of my job to do
        | this insane stuff; good to know that others see this as hard :)
        
       | pkrumins wrote:
        | Instead of doing this madness with complicated queries, I always
        | advise splitting complicated queries into many smaller, simple
        | queries and then merging the results in the code. This way you
        | don't need to debug the queries, as all of them are very simple
        | SELECTs, and the code that merges the results is one or two for
        | loops.
        
         | ccleve wrote:
         | This. Plus, you can use temporary tables, which limits the
         | amount of application code you have to write. Put your
         | intermediate results in temp tables, and build indexes on them
         | if you have to. It's a cheap way of forcing the optimizer to do
         | things your way.
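          | 
          | A sketch of the approach, assuming the runs/stacks schema
          | from the article:
          | 
          |     -- Materialize the intermediate result once...
          |     CREATE TEMPORARY TABLE busy_accounts AS
          |     SELECT stacks.account_id, COUNT(*) AS n
          |     FROM runs
          |     JOIN stacks ON stacks.id = runs.stack_id
          |     WHERE runs.worker_id IS NOT NULL
          |     GROUP BY stacks.account_id;
          | 
          |     -- ...optionally index it, then join against it.
          |     CREATE INDEX ON busy_accounts (account_id);
          | 
          | The planner then has no choice but to compute that step
          | first.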
        
         | cultofmetatron wrote:
          | In practice that means more data being sent to the
          | application code, which then has to process it. 99% of the
          | time, you'll get more performance by figuring out the right
          | way to express the data structure you want in SQL. The entire
          | computation happens in the database, so all the data is
          | immediately available to process on disk, and you only send
          | what you need, saving bandwidth. If the query is slow, you
          | need to think about how you set up your joins and be
          | aggressive about your use of predicates in subqueries to
          | reduce the size of temporary tables.
        
           | chrisjc wrote:
           | Not in every case. Some DBs allow you to reference the
           | results and use the result reference in subsequent queries.
           | As a result, data stays in the DB until you're actually ready
           | to pull it.
           | 
           | Or of course, you can just use temp tables.
        
             | cultofmetatron wrote:
              | That's a nice feature. Does Postgres have it? In that
              | case, it's even MORE performant to push your data
              | processing into the database via a query.
        
               | chrisjc wrote:
               | I'm not too sure about Postgres honestly. I was
               | specifically talking about Snowflake, although there
               | might be others that do it too.
               | 
               | https://docs.snowflake.com/en/sql-
               | reference/functions/result...
               | 
               | Sometimes you can follow this pattern and never actually
               | pull data into your application, which is especially true
               | when you're doing ETLs (table --> table).
        
         | Tostino wrote:
         | That can sometimes be a solution... Just keep in mind, you are
         | manually writing a join algorithm every time you need to do
         | that, and sometimes it will be the right one for the task,
         | other times not so much. The query planner will adapt your join
         | type for you instead, if you can trust it.
         | 
         | As the shape of your data changes over time, your hard coded
         | algorithm choice is not guaranteed to be the best one.
        
         | takeda wrote:
          | That won't always work. The main reason for bigger queries is
          | to get exactly the data you need, so you don't have to send
          | unnecessary data over the wire, which is slow.
        
         | cube2222 wrote:
         | That's more custom logic and additional code to maintain
         | though.
         | 
         | In this case we're using the Datadog Agent with its
         | custom_queries directive to automatically create metrics based
         | on periodically evaluated SQL queries.
        
         | jsumrall wrote:
         | Sounds like just using CTEs. This is how I write some
         | complicated queries, as a few CTEs with specific logic to get
         | an overall query written in a more readable way.
        
       | [deleted]
        
       | igammarays wrote:
       | Are you happy with Aurora Serverless so far? Is it really the
       | "painless scaling" it's made out to be? I'm really hesitant to
       | use anything other than self-hosted Postgres with full control,
       | because I'm always afraid of running into situations like these
        | and not being able to do what I need to do, like have an offline
       | analytical read replica. Also I imagine the cost is an order of
       | magnitude more than a self-hosted instance.
        
         | cube2222 wrote:
         | Not really. But it's mostly because the support for other
          | features is basically non-existent. It also seems like all
         | development is happening on Aurora Serverless v2 (which I think
         | is in preview state?). The scaling also isn't seamless, it
         | breaks your connections if you're using prepared statements -
         | we have it pinned to a specific number of capacity units.
         | 
         | We'll probably move to plain Aurora at some point, as that
         | supports all bells and whistles, but for now Aurora Serverless
         | is good enough.
        
       | no-s wrote:
       | It's impressive how many developer misconceptions I see cropping
       | up in these articles. The point of the query planner is to
       | replace an army of programmers, not evade design faults. Using a
       | server RDBMS it's trivially easy to decouple schema and query
       | evolution from application. Yet somehow developers always seem to
       | back themselves into situations requiring extreme coupling. Using
       | plan guides in the MSSQL environment helps somewhat in patchwork
        | solutions, though typically the deficiencies are related to
        | obsolete statistics... (and that's not even necessary). In
       | Postgres you should ensure design separation so you can rewrite
       | the query easily.
        
         | newlisp wrote:
         | _Using a server RDBMS it's trivially easy to decouple schema
         | and query evolution from application._
         | 
         | Curious, how do you decouple query evolution from the DB layer
         | of your application?
        
           | dragonwriter wrote:
           | > Curious, how do you decouple query evolution from the DB
           | layer of your application?
           | 
           | Use application-specific views, and the evolution driven by
           | DB optimization, etc., can happen in view definitions without
           | the app DB layer being involved. It's a traditional best
           | practice anyway, going back decades to the time when it was
           | expected an RDBMS would serve multiple apps that you needed
           | to keep isolated from each others changes, but it also works
           | to isolate concerns at different levels for a single-app
           | stack.
           | 
           | (Because of practical limits of views in some DBs, especially
           | in the 1990s, and developer preferences for procedural code,
           | a common enterprise alternative substitutes stored procs for
           | views, to similar effect.)
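            | 
            | A minimal sketch of the pattern (hypothetical names):
            | 
            |     -- The app's DB layer only ever queries the view...
            |     CREATE VIEW app_active_runs AS
            |     SELECT runs.id, runs.created_at, stacks.account_id
            |     FROM runs
            |     JOIN stacks ON stacks.id = runs.stack_id
            |     WHERE runs.worker_id IS NULL;
            | 
            | ...so the underlying query can later be rewritten and tuned
            | with CREATE OR REPLACE VIEW without touching application
            | code, as long as the output columns stay the same.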
        
             | no-s wrote:
             | hah, exactly. "application-specific views...Going back
             | decades", or "substitute Stored Procs for views".
             | 
             | It's not really much extra overhead to use a schema+views.
             | I ask developers to avoid creating views with "Select *",
             | always specify the columns desired.
             | 
              | Further decoupling can be beneficial, e.g. CQRS is a
              | wonderful perspective for designing data models that may
              | be more easily distributed or cacheable (by explicitly
              | choosing to separate the query schema from the
              | modification schema).
        
             | newlisp wrote:
              | Oh yes, views. They aren't effort-free and tend to become
              | numerous and specific. But they are sorely missed with a
              | document store I'm working with right now ;)
        
       | Mathnerd314 wrote:
       | Similar article that instead solved it by recording more
       | statistics: https://build.affinity.co/how-we-used-postgres-
       | extended-stat...
        
       | AaronFriel wrote:
       | Regarding this:
       | 
       | > Only a minuscule part of the Runs in the database are active at
       | any given time. Most Stacks stay dormant most of the time.
       | Whenever a user makes a commit, only the affected Stacks will
       | execute Runs. There will obviously be Stacks that are constantly
       | in use, but most won't. Moreover, intensive users will usually
       | use private Worker Pools. The public shared Worker Pool is
       | predominantly utilized by smaller users, so that's another reason
       | for not seeing many Runs here.
       | 
       | > Right now, we're iterating over all existing Stacks attached to
       | the public worker pool and getting relevant Runs for each.
       | 
       | Is there an opportunity to make this query more robust with a bit
       | of denormalization, if necessary, to indicate which runs and
       | which stacks are active and a filtered index (index with a WHERE
       | clause) on each to allow efficiently retrieving the list of
       | active entries?
       | 
       | Best practices for database optimization tell you "don't index on
       | low cardinality columns" (columns with few unique values), but
       | the real advice should be "ensure each valid key of an index has
       | a low cardinality relative to the size of the data" (it's OK to
       | have an index that selects 1% of your data!).
       | 
       | That is, conventional wisdom says that indexing on a boolean is
       | bad, because in the best case scenario 50% of your entries are
       | behind each key, and in the worst case almost all entries are
       | behind either "true" or "false". There are performance impacts
       | from that and there's the risk the query optimizer tries to use
       | the "bad" side of the index, which would be worse than a full
       | table scan.
       | 
       | But with the improved advice and insight that 0.01% of rows have
       | value "true", you can create an index with a `WHERE active =
       | true` clause, et voila: you have an index that finds you exactly
       | what you want, without the cost and performance issues of
       | maintaining the "false" side of the index.
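        | 
        | A sketch of such a filtered index (assuming a boolean active
        | column on runs):
        | 
        |     -- Only rows with active = true are stored in the index, so
        |     -- the planner can fetch the 0.01% of active rows cheaply.
        |     CREATE INDEX runs_active_idx ON runs (stack_id)
        |     WHERE active = true;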
        
         | cube2222 wrote:
         | Yep, you could indeed use a partial index with the two relevant
         | conditions to solve this query well. We're using those all over
         | the place as well.
        
       | deltaonefour wrote:
       | This article is everything that's wrong with SQL.
       | 
       | How about just providing us access to the query plan and allowing
       | users to directly modify it? It's like needing assembly language
       | to optimize a really tight section of code but having that access
       | blocked off.
       | 
       | Why do we have to rely on leakage to a higher level interface in
       | order to manipulate the lower level details? Just give us manual
       | access when we need it.
        
         | shock-value wrote:
         | I think SQL is generally great but I do agree with you. Access
         | to lower level primitives from which one could construct their
         | own query plan would be awesome. Query hints (not available in
         | Postgres) would also probably get you a lot of the way there.
        
       | hnov wrote:
       | In this case denormalizing the data right into the runs table
       | would likely give you better, but more importantly more
       | predictable performance.
        
       | srcreigh wrote:
       | The curious part is why postgres can't estimate that the size of
       | the active runs<>stacks join will be very small.
       | 
       | A core part of query planning is estimating the best join order
       | using a dynamic programming algorithm. So roughly, A(1M) <> B(1M)
       | <> C(10) should use the join order A-(B-C), not (A-B)-C.
       | 
        | I bet it's something like: postgres doesn't know the
        | correlation between runs_other.worker_id and
        | runs_other.stack_id. It seems like it's seeing a low number of
        | runs_other.worker_id values, then _estimating that the stacks
        | are split evenly among that small number of runs_ in the
        | millions.
       | 
       | (Why it wouldn't know stacks.id correlates to one stack each,
       | idk. Is stacks.id not the primary key? Is there a foreign key
       | from runs? Very curious.)
       | 
       | What happens if you follow this guide to hint Postgres as to
       | which columns are statistically related? [0]
       | 
        | Hinting PG's estimated stats may be a more persistent solution.
        | Another core part of query planning is query rewriting - if
        | they add a rule to simplify COUNT(select...)>0 to normal joins
        | (which are equivalent, I think?) then your trick may stop
        | working.
       | 
       | [0]: https://www.postgresql.org/docs/12/multivariate-
       | statistics-e...
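        | 
        | e.g. something like (a sketch; the exact column pairing is a
        | guess):
        | 
        |     -- Tell the planner these columns are statistically
        |     -- related, then re-analyze so the extended statistics
        |     -- get collected.
        |     CREATE STATISTICS runs_worker_stack_stats (dependencies)
        |         ON worker_id, stack_id FROM runs;
        |     ANALYZE runs;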
        
         | cube2222 wrote:
         | To be fair, we're using Aurora Serverless, so I'll have to dig
         | deeper if we can use these kinds of hints. But thanks for the
         | tip, this could be really useful!
        
           | mst wrote:
           | Given Aurora talks about itself as a replacement for pg/mysql
            | that provides compatible skins, I'm honestly wondering whether
           | you're dealing with the pg optimiser here in any meaningful
           | way.
           | 
           | (exactly how far along the scale from 'tweaked version' to
           | 'basically just a skin on a completely new database' aurora
           | is is ... opaque to me ... there's probably something out
           | there that answers that but I haven't yet managed to find it)
           | 
           | Edited to add: zorgmonkey downthread points out they have
           | support for some extensions like pg_hint_plan - https://docs.
           | aws.amazon.com/AmazonRDS/latest/AuroraUserGuide... - which
           | suggests at least a decent amount of pg code but I'm still
           | curious exactly where the changes start.
        
             | cube2222 wrote:
             | I obviously don't know its exact internal architecture, but
             | based on my digging and trying out different PostgreSQL-
             | native features, it looks like it's PostgreSQL instances as
             | query executors with a custom (high-availability) storage
             | layer, fronted by PgBouncer.
        
       | slt2021 wrote:
        | There is no need to trick the query optimizer, just common
        | sense and an understanding of the RDBMS engine.
        | 
        | The query seems pretty standard, and this is an often-occurring
        | problem with nested queries. Every JOIN and nested query grows
        | the computation due to Cartesian combination, so having your
        | predicates as narrow and specific as possible is a must.
        | 
        | Also, a Nested Loop is a kiss of death for your query
        | performance.
        
       ___________________________________________________________________
       (page generated 2022-01-18 23:00 UTC)