[HN Gopher] Understanding Recursive Queries in PostgreSQL
       ___________________________________________________________________
        
       Understanding Recursive Queries in PostgreSQL
        
       Author : fdr
       Score  : 93 points
       Date   : 2020-08-17 04:46 UTC (18 hours ago)
        
 (HTM) web link (www.cybertec-postgresql.com)
 (TXT) w3m dump (www.cybertec-postgresql.com)
        
       | barrkel wrote:
       | Recursive queries are better mentally modelled as iterated
       | queries building up a result set. If you think of recursion as
       | tracing a tree, or in trivial cases, a linked list, recursive
       | queries are a breadth-first traversal.
       | 
       | You better not traverse too deep (iterate too much), or you lose
       | the data locality benefits that relational databases are tuned
       | for, and all you end up saving is the cost of round trips to the
       | database.
       | 
       | Many tree structures are better modelled as prefix pattern
       | matching on string paths. Extracting a whole subtree with "path
       | LIKE 'foo/bar/baz/%'" on an indexed path column is much faster
       | than following a series of parent links with a recursive query,
       | more and more so the deeper your subtree gets. This denormalizes
       | the tree structure into the path column - you can no longer make
       | structural edits with small updates - but it's much faster for
       | queries.
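        | 
        | As a minimal sketch of that layout (table and column names are
        | hypothetical):
        | 
        |     -- a tree denormalized into an indexed path column
        |     CREATE TABLE nodes (
        |         id   bigint PRIMARY KEY,
        |         path text NOT NULL  -- e.g. 'foo/bar/baz'
        |     );
        | 
        |     -- text_pattern_ops lets the btree serve prefix LIKE scans
        |     CREATE INDEX nodes_path_idx
        |         ON nodes (path text_pattern_ops);
        | 
        |     -- the whole subtree in one indexed range scan, no recursion
        |     SELECT * FROM nodes WHERE path LIKE 'foo/bar/baz/%';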
        
       | elchief wrote:
       | How does a Recursive CTE run, line by line?
       | 
       | https://stackoverflow.com/questions/3187850/how-does-a-recur...
        
       | koeng wrote:
       | I use recursive queries in SQL to search persistent suffix arrays
        | (mainly when there is a LOT OF DATA). You can do string search
        | across hundreds of gigabytes of data in ~0.004s with SQLite on a
        | cheap SSD and pretty much no RAM - my goal is to be able to
        | search and align against all the genetic information we have so
        | far (~9 trillion base pairs in total)
       | 
        | If anyone knows how to increase INSERT speeds on Postgres, I
        | would appreciate it. I can get about 100K rows per second with
        | COPY or by putting the inserts in a transaction. I have tried
        | mirrored NVMe drives and more RAM with ZFS, as well as dropping
        | some unimportant indexes, but I'm still a little limited. Any
        | thoughts?
        
         | chrisshroba wrote:
         | I'm new to postgres and having trouble visualizing the
         | recursive query you're describing. Would you be willing to
         | share an example query (or something that illustrates the point
         | on fake data)?
        
           | koeng wrote:
           | The query that I have working is an abomination (it works
           | though, https://github.com/Koeng101/poly/blob/polyStore_v0.1/
           | store/c...)
           | 
           | The basic idea is that I am using SQL to back a suffix array
            | (https://en.wikipedia.org/wiki/Suffix_array). The suffix
            | array holds all suffixes of a string: for "bat", that is
            | ["bat", "at", "t"], sorted in alphabetical order as ["at",
            | "bat", "t"], except storing the indices of the suffixes,
            | [1, 0, 2].
           | 
            | To search for a string, you go halfway into the suffix array
            | and check whether the suffix there is lexicographically
            | equal to, greater than, or less than your input. If it is
            | not equal, halve again in the appropriate direction. If the
            | string is there, you'll eventually hit it.
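            | 
            | As a toy version of that bisection written as a recursive
            | CTE (the schema is invented for illustration, not the one
            | in the repo): assume one row per suffix, where rank is the
            | position in sorted order and start_pos is the offset into
            | the original string.
            | 
            |     WITH RECURSIVE search(lo, hi) AS (
            |         SELECT 0::bigint, (SELECT max(rank) FROM suffixes)
            |       UNION ALL
            |         -- compare the middle suffix to the query and keep
            |         -- the half that can still contain a match
            |         SELECT CASE WHEN s.suffix < 'GATTACA'
            |                     THEN (lo + hi) / 2 + 1 ELSE lo END,
            |                CASE WHEN s.suffix < 'GATTACA'
            |                     THEN hi ELSE (lo + hi) / 2 END
            |         FROM search
            |         JOIN suffixes s ON s.rank = (lo + hi) / 2
            |         WHERE lo < hi
            |     )
            |     -- lo = hi is the lower bound; check that the suffix
            |     -- there really starts with the query
            |     SELECT s.start_pos
            |     FROM search
            |     JOIN suffixes s ON s.rank = lo
            |     WHERE lo = hi AND s.suffix LIKE 'GATTACA%';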
        
         | liminal wrote:
         | Came across this upcoming webinar on the topic:
         | https://www.timescale.com/webinar/5-ways-to-improve-yourpost...
        
         | vii wrote:
         | Assuming you're doing bulk inserts, and can restart the insert
         | on any failure, one tactic is to disable all consistency and
         | durability storage features during the insert. This heavily
         | reduces the IO operations needed and the need to wait for them
         | to complete. That's assuming that iops are the resource that
         | you are constrained on.
         | 
         | Steps:
         | 
          | - use ext2 or even tmpfs as the filesystem; this avoids the
          | journaling and COW features you'd get with ZFS. Mount with
          | options async,nobarrier
         | 
         | - set tables to be UNLOGGED to disable Postgres WAL
         | 
          | This got things fast enough for my last use-case. Some further
          | ideas, since at some point you become CPU/memory bound:
         | 
         | - you could try sharding by partitioning the table
         | 
         | - another trick is to use SQLite as a backing store for inserts
         | (it is quite fast if you turn off all logging) and then query
          | with Postgres via an FDW: https://github.com/pgspider/sqlite_fdw
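          | 
          | A minimal sketch of the UNLOGGED step (table names are
          | hypothetical):
          | 
          |     -- no WAL: contents are lost after a crash, so only for
          |     -- restartable bulk loads
          |     CREATE UNLOGGED TABLE staging
          |         (LIKE target INCLUDING DEFAULTS);
          | 
          |     -- or flip an existing table (each switch rewrites it)
          |     ALTER TABLE staging SET UNLOGGED;
          |     -- ... bulk load ...
          |     ALTER TABLE staging SET LOGGED;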
        
         | jarym wrote:
          | COPY is the fastest method to insert data into Postgres. As
          | another poster mentioned, setting the table to UNLOGGED will
          | also help.
         | 
          | Finally, you can drop ALL indexes and recreate them after the
          | import (this has been hit and miss in my testing, but on
          | occasion it has helped).
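          | 
          | Roughly, with hypothetical table, index, and file names:
          | 
          |     -- drop the index, load via COPY, rebuild in one pass
          |     DROP INDEX IF EXISTS big_table_col_idx;
          | 
          |     -- server-side COPY; from psql, \copy does the same
          |     -- from the client machine
          |     COPY big_table (id, col)
          |         FROM '/tmp/data.csv' WITH (FORMAT csv);
          | 
          |     -- one bulk index build beats maintaining the index
          |     -- row by row during the load
          |     CREATE INDEX big_table_col_idx ON big_table (col);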
        
         | mulmen wrote:
         | Why ZFS? I thought copy-on-write filesystems were discouraged
         | for database workloads. I know it is possible but are you
         | getting other benefits from ZFS?
        
           | noctune wrote:
           | For OLTP, COW is horrible. For OLAP it's usually fine. I've
           | had good results with Postgres and ZFS with LZ4 compression.
        
         | site-packages1 wrote:
         | With inserts you'll see great results from goosing the
         | transaction array multiplexer. Just make sure you're aware of
         | piping the commit hash to the properly isolate the down range
         | effects on the memcpy and you'll avoid corruption issues. This
         | has worked for me anyways.
        
           | fabian2k wrote:
           | This comment is entirely random gibberish, not sure why
           | you're posting that. I initially suspected a GPT-3 bot or
           | something like that, but the comment history didn't support
           | that.
        
         | rarrrrrr wrote:
         | Is there a reason you don't wish to batch inserts into
         | transaction(s)? Otherwise, you're asking the database to commit
          | (fsync to disk) after each insert statement.
         | 
         | Other ideas:
         | 
         | - Have multiple processes inserting in parallel (this interacts
         | with the commit_delay and commit_siblings settings in
         | postgresql.conf)
         | 
          | - Use unlogged tables (but the table is truncated after a
          | crash or unclean shutdown)
         | 
         | - Put fsync=off and synchronous_commit=off in postgresql.conf
         | (but db may be corrupted in a crash)
         | 
          | - switching to BRIN indexes might be an option, depending on
          | the data (see the sketch below)
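          | 
          | A sketch of the last two (index and table names are made up):
          | 
          |     -- per-session alternative to fsync=off: only this
          |     -- session's commits lose durability, and a crash can't
          |     -- corrupt the cluster
          |     SET synchronous_commit = off;
          | 
          |     -- BRIN: tiny index, good for naturally ordered
          |     -- (e.g. append-only, timestamped) data
          |     CREATE INDEX readings_ts_brin
          |         ON readings USING brin (ts);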
        
           | koeng wrote:
            | I do batch the inserts into a large transaction, and usually
            | aim for each transaction to commit in roughly 10s. Not sure
            | if that is optimal, but I like to monitor it that way.
           | 
           | BRIN indexes actually work for some of the data types I'm
           | working with! Thank you for pointing those out!
        
       | fabian2k wrote:
       | Without benchmarking it, my first instinct would still be to
       | design the schema in a way to avoid recursive queries, if
       | possible. I'm thinking of stuff like a simple tree, e.g.
       | categories with subcategories arbitrarily nested. In that case
       | I'd probably just store all the parents as well, not just the
       | immediate category. That is more work in the insert/update case,
       | but makes queries that include all children trivial.
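        | 
        | A minimal sketch of what I mean (names are illustrative):
        | 
        |     -- store the full ancestry, not just the immediate parent
        |     CREATE TABLE categories (
        |         id        bigint PRIMARY KEY,
        |         parent_id bigint REFERENCES categories,
        |         ancestors bigint[] NOT NULL DEFAULT '{}'
        |     );
        | 
        |     -- GIN makes the containment test below indexable
        |     CREATE INDEX categories_ancestors_idx
        |         ON categories USING gin (ancestors);
        | 
        |     -- every descendant of category 42, no recursion needed
        |     SELECT * FROM categories
        |     WHERE ancestors @> ARRAY[42::bigint];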
        
       | doersino wrote:
       | Recursive SQL is fun:
       | https://excessivelyadequate.com/posts/sierpinksy.html
        
       | Dowwie wrote:
       | This post is only scratching the surface of what can be achieved
       | with a.. fully armed, and operational battle station.. powered by
       | recursive SQL in postgres. Simply adding a parent_id column to a
       | table (and a few constraints) can facilitate all sorts of useful
        | network topologies. I doubt that a recursive query against a
        | DAG will outperform the equivalent query in a dedicated graph
        | database, but the relational query may be sufficient for your
        | work, without the cost of investing in a graph db.
       | 
        | I think people need to move beyond simple recursive SQL example
        | blog posts and share the more challenging cases. Recursive
        | queries can get difficult quickly.
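        | 
        | For anyone who hasn't seen the basic shape, a minimal parent_id
        | setup and traversal (names are illustrative):
        | 
        |     -- adjacency list: a parent_id plus a constraint or two
        |     CREATE TABLE nodes (
        |         id        bigint PRIMARY KEY,
        |         parent_id bigint REFERENCES nodes (id),
        |         CHECK (parent_id IS DISTINCT FROM id)  -- no self-loops
        |     );
        | 
        |     -- walk down from node 1, tracking depth
        |     WITH RECURSIVE subtree AS (
        |         SELECT id, parent_id, 0 AS depth
        |         FROM nodes
        |         WHERE id = 1
        |       UNION ALL
        |         SELECT n.id, n.parent_id, s.depth + 1
        |         FROM nodes n
        |         JOIN subtree s ON n.parent_id = s.id
        |     )
        |     SELECT * FROM subtree;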
        
       ___________________________________________________________________
       (page generated 2020-08-17 23:00 UTC)