[HN Gopher] PostgreSQL EXPLAIN Output Explained
       ___________________________________________________________________
        
       PostgreSQL EXPLAIN Output Explained
        
       Author : PhilipTrauner
       Score  : 131 points
       Date   : 2021-05-28 16:01 UTC (6 hours ago)
        
 (HTM) web link (www.cybertec-postgresql.com)
 (TXT) w3m dump (www.cybertec-postgresql.com)
        
       | hermanradtke wrote:
       | An alternative to https://explain.dalibo.com/ is
       | https://tatiyants.com/pev
       | 
       | Both have pros and cons about how they visualize things.
        
         | tclancy wrote:
         | Came here to suggest the same. If you hit the gears on the left
          | and choose view: compact and graph: cost, you can get a decent
         | overview of the hot spots in complicated queries quickly.
        
         | MrOxiMoron wrote:
         | PEV is great, helped me figure out and fix specific query
         | issues that only happened on production
        
       | firloop wrote:
       | Great writeup. I use EXPLAIN a lot in development as a gut check
       | -- "does this descending index do what I thought it would? how
        | expensive is that subquery?" Highly recommend looking at it early
        | on; it helps me catch silly mistakes well before production.
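        | 
        | As a concrete (made-up) example of that kind of gut check:
        | 
        |     -- does the planner actually use the descending index?
        |     CREATE INDEX ON events (created_at DESC);
        |     EXPLAIN SELECT * FROM events
        |       ORDER BY created_at DESC LIMIT 10;
        |     -- hoping for an Index Scan on that index rather than a
        |     -- Seq Scan followed by a Sort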
        
       | efxhoy wrote:
       | One thing I learned about EXPLAIN this week is that it doesn't
       | show constraint checks. I was trying to delete about 40k rows
       | from a table and it was taking hours and I couldn't figure out
        | why. EXPLAIN ANALYZE showed nothing about reading any tables
        | other than the FROM table and the USING table.
       | 
       | The table I was deleting from had 20 foreign key constraints
        | referencing it, and a couple of the referencing tables were big
        | (a few million rows) with no index on the referencing column.
        | Adding indexes to all of them took a couple of minutes, and the
        | DELETE then ran in a few seconds.
       | 
       | Sometimes the answer to a performance issue can't be found in
       | EXPLAIN. And always remember to properly index your foreign key
       | constraints.
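        | 
        | For reference, the fix was roughly this for each big referencing
        | table (table and column names here are made up):
        | 
        |     -- without this index, the FK check for every deleted row
        |     -- has to scan the whole referencing table
        |     CREATE INDEX CONCURRENTLY ON orders (customer_id);
        |     -- CONCURRENTLY avoids blocking writes while it builds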
        
         | michristofides wrote:
         | EXPLAIN ANALYZE would have shown you referential integrity (RI)
         | triggers taking most of the time, but it's still a bit of a
         | leap to work out that it's due to missing foreign key indexes
          | if you don't already know.
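          | 
          | Roughly, at the end of the EXPLAIN ANALYZE output for the
          | DELETE you'd see per-trigger totals (names are made up here):
          | 
          |     BEGIN;
          |     EXPLAIN ANALYZE
          |       DELETE FROM customers WHERE signup_date < '2020-01-01';
          |     -- after the plan itself, the output ends with lines like:
          |     --   Trigger for constraint orders_customer_id_fkey:
          |     --     time=... calls=...
          |     ROLLBACK;  -- ANALYZE really executes the DELETE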
        
       | takeda wrote:
       | The site mentioned in the article also has a series that goes
        | more in depth on how to read and understand EXPLAIN output:
       | 
       | https://www.depesz.com/tag/unexplainable/
        
       | ezekg wrote:
       | I use an awesome service called PgMustard [0] for parsing and
       | debugging slow queries. It has saved me a lot of time, and has
       | helped me resolve some pretty big (and complicated) bottlenecks.
       | 
       | [0]: https://pgmustard.com
        
         | michristofides wrote:
         | Thanks for the shout out, I'm half the team behind pgMustard,
         | happy to answer questions here if anyone has any
        
       | NeutralForest wrote:
       | Probably one of the best resources to understand indexes and the
        | output of `EXPLAIN ANALYZE` would be
        | https://use-the-index-luke.com/
        
       | airstrike wrote:
       | I appreciate the first image in TFA is supposed to just be funny
       | but it would actually be useful to have an output like that. Some
       | of those analyses are tougher than others to code but a subset of
       | them are not entirely out of the realm of possibility.
        
         | michristofides wrote:
          | I hope we're not truly a consultant's nightmare, but we've got
          | quite a few of these covered in pgMustard (15+ tip types) and
          | we're working to add more.
        
       | munk-a wrote:
       | The article touched on some caveats but missed what I think is a
       | big one - you really want to capture any detailed explains from
       | environments as close to production as possible. Different table
       | statistics can cause the planner to go in wildly different
        | directions, and while faster is always better, it is very easy
        | to get caught up sinking a lot of effort into making a query
        | more performant when it was only running slow because it was
        | thrashing in RAM on a dev box.
       | 
        | EXPLAIN (with ANALYZE at least - which you should always use) is
        | a lot less theoretical than you might assume, since it actually
        | executes the query. That can make it a bit more onerous to run,
        | but it ends up adding a lot of value to the statistics when you
        | gather them.
       | 
        | Oh also - caching (shared buffers plus the OS page cache) on
        | postgres is a thing, so if you're worried about performance from
        | a cold state don't forget to clear caches before executing. And
        | if anyone has good suggestions for tools to screw up table
        | statistics, I haven't found one that I like yet.
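        | 
        | On the cache point - a cheap way to see whether you measured a
        | warm or cold cache is to ask for buffer counts (the query here
        | is just a stand-in):
        | 
        |     EXPLAIN (ANALYZE, BUFFERS)
        |       SELECT * FROM events WHERE account_id = 42;
        |     -- "Buffers: shared hit=..." are pages found in
        |     -- shared_buffers, "read=..." are pages pulled from the OS
        |     -- cache or disk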
        
         | samokhvalov wrote:
         | > Different table statistics can cause the planner to go in
         | wildly different directions
         | 
         | Exactly. That's why my team and I (Postgres.ai) have developed
         | Database Lab Engine [1] and a chatops tool for SQL
          | optimization, Joe bot [2]; both are open source (AGPLv3).
         | 
          | EXPLAIN (ANALYZE, BUFFERS) has to be executed on a DB of the
          | same size, with a properly adjusted Postgres configuration.
         | 
          | Interestingly, the machine you use for query plan
          | troubleshooting can have less RAM and different hardware in
          | general - it doesn't matter to the planner. Even
          | shared_buffers doesn't matter - you can set
          | effective_cache_size to match production (we use this trick in
          | Database Lab when the hardware is weaker than production's).
         | 
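          | A session-level sketch of that trick (the value and the table
          | name are placeholders):
          | 
          |     -- effective_cache_size only affects planning, so it can
          |     -- be set per session to mirror production
          |     SET effective_cache_size = '96GB';
          |     EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM orders;
          | 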
          | As for the cache states - very good point as well. I advocate
          | a buffers- or rows-centric approach: first, optimize to reduce
          | the number of buffers touched - or rows, if you're working
          | with a "logical" (dump/restore) copy of the database rather
          | than a "physical" one (a PGDATA copy, keeping the same data
          | layout, bloat included) - the fewer the numbers, the better.
          | Only then do you pay attention to timing, keeping in mind what
          | can happen under the worst conditions (everything is read from
          | disk), where that makes sense.
         | 
         | [1] https://postgres.ai/products/how-it-works
         | 
         | [2] https://postgres.ai/products/joe
        
           | samokhvalov wrote:
           | > tools to screw up table statistics
           | 
           | Perhaps you already know these, but just in case:
           | 
           | - https://github.com/ossc-db: pg_dbms_stats, pg_store_plans,
           | pg_hint_plan
           | 
           | - https://github.com/HypoPG/hypopg
        
         | comboy wrote:
          | I would say you need it on the production environment.
         | 
          | The exact same configuration is not enough. You want shared
          | buffers and disk cache to look the same as they do on
          | production, and you also want the same common queries running
          | in the background.
         | 
         | I mean, "need" in case of a busy database and being at a high
         | optimization level where small details matter. You can catch
         | more obvious stuff with much less care.
        
         | ganomi wrote:
          | To get production EXPLAIN plans for problematic queries you
          | can activate auto_explain on a Postgres instance. For my
          | transactional system I have set it up to log plans for all
          | queries that take more than 2000 ms.
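          | 
          | The session-level equivalent looks roughly like this (for the
          | whole instance you would add auto_explain to
          | shared_preload_libraries in postgresql.conf instead):
          | 
          |     LOAD 'auto_explain';  -- typically needs superuser
          |     SET auto_explain.log_min_duration = '2s';
          |     SET auto_explain.log_analyze = on;  -- adds some overhead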
        
           | munk-a wrote:
            | Auto_explain is a pretty great tool to spread knowledge on,
            | yea - I've actually built out a lot of functionality around
            | our DB handle where I work, and one of the features I added
            | was a software configuration to establish a threshold that
            | other runtime variables could also influence. We've used
            | this to track specific classes of queries over time and
            | figure out what's going wrong. It can also be advantageous
            | (if you know a query sometimes does run long) to capture
            | explains of it executing quickly - sometimes you'll get
            | really helpful information, like the query planner changing
            | its mind when it passes a threshold of so many rows, and
            | then you know clearly what you want the planner to decide to
            | do.
           | 
            | If you're a small enough shop to consider it, I highly
            | recommend setting up something to automatically explain
            | queries meeting some criteria on production, or using an
            | analysis stack (like New Relic) to capture all the query
            | executions within certain time windows.
           | 
           | These tools all come with costs and should never just run
           | continuously on production if you're getting no benefit from
           | them, but the value can be quite significant.
        
           | samokhvalov wrote:
           | Great extension, yes. There is overhead when enabling the
            | timing and buffers options, but sometimes it's not big [1].
           | 
           | But auto_explain solves only part of the task - you can see
           | what happened, but cannot see the answers to "what if"
           | questions. ("What if I used this index?")
           | 
           | [1] https://www.pgmustard.com/blog/auto-explain-overhead-
           | with-ti...
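            | 
            | For the index "what if", hypopg (linked upthread) can answer
            | it without actually building anything - a rough sketch with
            | a made-up table:
            | 
            |     -- assuming the extension is installed on the server
            |     CREATE EXTENSION hypopg;
            |     SELECT * FROM hypopg_create_index(
            |       'CREATE INDEX ON orders (customer_id)');
            |     -- plain EXPLAIN (without ANALYZE) now considers the
            |     -- hypothetical index
            |     EXPLAIN SELECT * FROM orders WHERE customer_id = 42;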
        
       ___________________________________________________________________
       (page generated 2021-05-28 23:00 UTC)