[HN Gopher] Understanding Recursive Queries in PostgreSQL
___________________________________________________________________
 
  Understanding Recursive Queries in PostgreSQL
 
  Author : fdr
  Score  : 93 points
  Date   : 2020-08-17 04:46 UTC (18 hours ago)
 
  (HTM) web link (www.cybertec-postgresql.com)
  (TXT) w3m dump (www.cybertec-postgresql.com)
 
| barrkel wrote:
| Recursive queries are better mentally modelled as iterated queries building up a result set. If you think of recursion as tracing a tree, or in trivial cases a linked list, recursive queries are a breadth-first traversal.
|
| You'd better not traverse too deep (iterate too much), or you lose the data locality benefits that relational databases are tuned for, and all you end up saving is the cost of round trips to the database.
|
| Many tree structures are better modelled as prefix pattern matching on string paths. Extracting a whole subtree with "path LIKE 'foo/bar/baz/%'" on an indexed path column is much faster than following a series of parent links with a recursive query, and increasingly so the deeper the subtree gets. This denormalizes the tree structure into the path column - you can no longer make structural edits with small updates - but it's much faster for queries.
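  A minimal sketch of the two query shapes barrkel is contrasting,
  against a hypothetical "nodes" table (the table, columns, and root
  id are illustrative, not taken from the linked article):
 
      -- Adjacency list: each row points at its parent; the path
      -- column is the denormalized alternative, e.g. 'foo/bar/baz'.
      CREATE TABLE nodes (
          id     bigint PRIMARY KEY,
          parent bigint REFERENCES nodes (id),
          name   text NOT NULL,
          path   text NOT NULL
      );
      CREATE INDEX nodes_path_idx ON nodes (path text_pattern_ops);
 
      -- Recursive CTE: the recursive term runs once per tree level,
      -- effectively a breadth-first expansion from the root.
      WITH RECURSIVE subtree AS (
          SELECT id, parent, name
          FROM   nodes
          WHERE  id = 42                  -- the subtree root
        UNION ALL
          SELECT n.id, n.parent, n.name
          FROM   nodes n
          JOIN   subtree s ON n.parent = s.id
      )
      SELECT * FROM subtree;
 
      -- Materialized path: a single indexed prefix scan whose cost
      -- does not grow with the depth of the subtree.
      SELECT * FROM nodes WHERE path LIKE 'foo/bar/baz/%';
 
  The text_pattern_ops operator class is what lets the LIKE prefix
  match use the btree index when the database locale is not C.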
| elchief wrote:
| How does a Recursive CTE run, line by line?
|
| https://stackoverflow.com/questions/3187850/how-does-a-recur...
 
| koeng wrote:
| I use recursive queries in SQL to search persistent suffix arrays (mainly when there is a LOT OF DATA). You can string-search hundreds of gigabytes of data in ~0.004s with SQLite on a cheap SSD and pretty much no RAM - my goal is to be able to search and align on all the genetic information we have so far (about ~9 trillion base pairs in total).
|
| If anyone knows how to increase INSERT speeds on Postgres, I would appreciate it. I can get about 100K rows per second with COPY or by putting the inserts in a transaction. I have tried mirrored NVMe drives and more RAM with ZFS, as well as dropping some unimportant indexes, but I'm still a little limited. Any thoughts?
 
| chrisshroba wrote:
| I'm new to Postgres and having trouble visualizing the recursive query you're describing. Would you be willing to share an example query (or something that illustrates the point on fake data)?
 
| koeng wrote:
| The query that I have working is an abomination (it works, though: https://github.com/Koeng101/poly/blob/polyStore_v0.1/store/c...)
|
| The basic idea is that I am using SQL to back a suffix array (https://en.wikipedia.org/wiki/Suffix_array). The suffix array holds all suffixes of a string (for "bat" that is ["bat", "at", "t"]), sorted in alphabetical order (["at", "bat", "t"]), except that it stores the index of each suffix instead ([1, 0, 2]).
|
| To search for a string, you go halfway into the suffix array and check whether that suffix is lexicographically equal to, larger than, or smaller than your input. If it is not equal, halve in the appropriate direction. If the string is there, eventually you'll hit it.
 
| liminal wrote:
| Came across this upcoming webinar on the topic: https://www.timescale.com/webinar/5-ways-to-improve-yourpost...
 
| vii wrote:
| Assuming you're doing bulk inserts, and can restart the insert on any failure, one tactic is to disable all consistency and durability storage features during the insert. This heavily reduces the IO operations needed and the need to wait for them to complete. That's assuming that IOPS are the resource you are constrained on.
|
| Steps:
|
| - use ext2 or even tmpfs as the filesystem, which avoids journaling and the COW behaviour of ZFS; mount with options async,nobarrier
|
| - set tables to UNLOGGED to disable the Postgres WAL
|
| This got things fast enough for my last use case. Some further ideas, since at some point you become CPU/memory bound:
|
| - you could try sharding by partitioning the table
|
| - another trick is to use SQLite as a backing store for inserts (it is quite fast if you turn off all logging) and then query it from Postgres via an FDW: https://github.com/pgspider/sqlite_fdw
 
| jarym wrote:
| COPY is the fastest method to insert data into Postgres. As another poster mentioned, setting the table to UNLOGGED will also help.
|
| Finally, you can drop ALL indexes and recreate them after the import (it's been hit and miss in my testing, but it has helped on occasion).
 
| mulmen wrote:
| Why ZFS? I thought copy-on-write filesystems were discouraged for database workloads. I know it is possible, but are you getting other benefits from ZFS?
 
| noctune wrote:
| For OLTP, COW is horrible. For OLAP it's usually fine. I've had good results with Postgres and ZFS with LZ4 compression.
 
| site-packages1 wrote:
| With inserts you'll see great results from goosing the transaction array multiplexer. Just make sure you're aware of piping the commit hash to the properly isolate the down range effects on the memcpy and you'll avoid corruption issues. This has worked for me anyways.
 
| fabian2k wrote:
| This comment is entirely random gibberish; I'm not sure why you're posting that. I initially suspected a GPT-3 bot or something like that, but the comment history didn't support that.
 
| rarrrrrr wrote:
| Is there a reason you don't wish to batch the inserts into transaction(s)? Otherwise, you're asking the database to commit (fsync to disk) after each insert statement.
|
| Other ideas:
|
| - Have multiple processes inserting in parallel (this interacts with the commit_delay and commit_siblings settings in postgresql.conf)
|
| - Use unlogged tables (but the table is truncated after a crash)
|
| - Put fsync=off and synchronous_commit=off in postgresql.conf (but the db may be corrupted in a crash)
|
| - Switching to BRIN indexes might be an option, but it depends on the data
 
| koeng wrote:
| I do batch the inserts into a large transaction, and I usually aim for the transaction to commit in approximately ~10s. Not sure if that is optimal, but I like to monitor it that way.
|
| BRIN indexes actually work for some of the data types I'm working with! Thank you for pointing those out!
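  A condensed sketch of the bulk-load suggestions above (UNLOGGED
  table, relaxed commits, COPY, indexes built after the load). The
  "suffixes" table and the file path are placeholders, not koeng's
  actual schema:
 
      -- Skip the WAL for this table; its contents are truncated
      -- after a crash, so it only suits restartable loads.
      CREATE UNLOGGED TABLE suffixes (
          pos    bigint NOT NULL,
          suffix text   NOT NULL
      );
 
      -- Don't wait for WAL flushes at commit time in this session.
      SET synchronous_commit = off;
 
      -- Load in bulk; COPY is far cheaper than row-by-row INSERTs.
      -- (Use \copy from psql if the file lives on the client.)
      COPY suffixes (pos, suffix) FROM '/path/to/data.csv' WITH (FORMAT csv);
 
      -- Build indexes once, after the data is in place.
      CREATE INDEX suffixes_suffix_idx ON suffixes (suffix);
      -- A BRIN index is far smaller when pos correlates with the
      -- physical insertion order:
      -- CREATE INDEX suffixes_pos_brin ON suffixes USING brin (pos);
 
      -- Optionally make the table durable again afterwards.
      ALTER TABLE suffixes SET LOGGED;
 
  Both UNLOGGED and synchronous_commit = off trade durability for
  speed, which is only acceptable because the load can be restarted
  from scratch after a crash.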
| fabian2k wrote:
| Without benchmarking it, my first instinct would still be to design the schema in a way that avoids recursive queries, if possible. I'm thinking of things like a simple tree, e.g. categories with arbitrarily nested subcategories. In that case I'd probably just store all the parents as well, not just the immediate parent category. That is more work in the insert/update case, but it makes queries that include all children trivial.
 
| doersino wrote:
| Recursive SQL is fun: https://excessivelyadequate.com/posts/sierpinksy.html
 
| Dowwie wrote:
| This post is only scratching the surface of what can be achieved with a... fully armed and operational battle station... powered by recursive SQL in Postgres. Simply adding a parent_id column to a table (and a few constraints) can facilitate all sorts of useful network topologies. I doubt that a recursive query against a DAG will outperform the same query against a DAG in a graph database, but the relational query may be sufficient for your work without requiring the cost of a graph DB investment.
|
| I think people need to move beyond simple recursive SQL example blog posts and share the more challenging examples. Recursive queries can become challenging quickly.
___________________________________________________________________
(page generated 2020-08-17 23:00 UTC)