[HN Gopher] Postgres as a graph database
       ___________________________________________________________________
        
       Postgres as a graph database
        
       Author : elorant
       Score  : 378 points
       Date   : 2023-03-31 13:35 UTC (9 hours ago)
        
 (HTM) web link (www.dylanpaulus.com)
 (TXT) w3m dump (www.dylanpaulus.com)
        
       | mdavidn wrote:
       | I do love PostgreSQL, and I often reach for this approach when
       | working with hierarchical data, but a word of warning: Recursive
       | queries will not scale to handle _very_ large edge tables. The
       | recursive query will always consider _all_ edges, even when the
       | outer query seeks only a single node's relationships. The
       | solution is to build and index denormalized n-th order
       | relationship tables.
       | 
       | Others have already pointed out the issues of cycles.
        
         | Nezteb wrote:
         | I learned from one of the comments on the post about AGE[1], "a
         | PostgreSQL extension that provides graph database
         | functionality."
         | 
         | [1] https://age.apache.org/
        
         | afandian wrote:
         | I'm sorry could you spell it out? What exactly does "recursive
         | query will always consider _all_ edges" mean? A table scan? I'd
         | be very grateful if you could give some pesudocode or point to
         | a doc page.
        
           | jeremyjh wrote:
           | I think GP means that it has to completely expand the
           | recursive part for every branch before any where condition on
           | the edge nodes can be applied. Graph databases presumably can
           | optimize this.
           | 
           | I've found recursive queries to be difficult to scale in
           | real-world queries past a few million edge nodes. We've
           | denormalized several tree relationships so that the edge is
           | connected both to its parent and to the root of its tree in
           | several cases.
        
             | afandian wrote:
             | Thanks! Sounds like you're saying that brute-force
             | recursive algorithms without backtracking or early
             | termination aren't a good match for path finding. That's
             | not a surprise.
             | 
             | I'm using recursive algorithm on Postgres to find trees in
             | a graph, but only where I'm specificaly interested in all
             | of the nodes.
        
         | cryptonector wrote:
         | Cycles are not a problem: just use `UNION`, not `UNION ALL`.
         | 
         | I myself build transitive closure tables that I update
         | incrementally as needed (sometimes in a deferred way), so that
         | many of the recursive queries I do can be very fast, but I only
         | build transitive closures for the smallest portion I can
         | (basically group nesting).
        
         | liotier wrote:
         | > Recursive queries will not scale to handle _very_ large edge
         | tables
         | 
         | What do you consider large ? We have a 12-year old telco
         | application with a few million objects in Neo4J, doing fault
         | impact calculations... Would PostgreSQL handle that easily
         | nowadays ?
        
           | simonw wrote:
           | My hunch is that a few million objects is pretty tiny these
           | days - you could probably write a script to import all of
           | them into PostgreSQL in a few minutes and try it out yourself
           | to see how it behaves.
        
             | ellisv wrote:
             | It depends a lot on how connected the graph is and whether
             | you can fit it in memory.
        
             | simonw wrote:
             | I tried it against SQLite. I got ChatGPT to write me a
             | script that would insert 1m edges and 1m nodes:
             | 
             | https://gist.github.com/simonw/c16ce01244760e186a3a0aa3fee0
             | 4...
             | 
             | Then I ran that query again and it seems to return results
             | in about 80ms:
             | 
             | https://lite.datasette.io/?sql=https://gist.github.com/simo
             | n...
        
               | eurasiantiger wrote:
               | Not very realistic example, you need to be requesting
               | some actual fields across nodes and doing some filtering
               | on at least strings and dates, maybe geo areas as well.
        
               | simonw wrote:
               | Go ahead and try it! I've documented all of the tools you
               | need to run some detailed experiments against SQLite
               | entirely in your browser here.
        
         | ellisv wrote:
         | It's not even that it always considers all edges, but that you
         | can't exit the search early when a condition is met. In other
         | words, you have to first find all the paths and then, outside
         | the CTE, filter to the shortest. We push the node filter into
         | the CTE by wrapping it in a function.
         | 
         | > The solution is to build and index denormalized n-th order
         | relationship tables.
         | 
         | This sounds much more performant but also more difficult to
         | maintain.
        
           | switchbak wrote:
           | "you can't exit the search early when a condition is met" - I
           | have a DAG traversal written in a recursive CTE, and I can
           | bail early out just fine when my traversal predicates no
           | longer match. Not sure why I'd have to do that outside the
           | (recursive part of the) CTE?
           | 
           | Obviously maintaining a flattened version is going to perform
           | better in queries, but you're trading maintenance (write)
           | time for query time. Or materialization time if you decide to
           | do it lazily.
        
             | ellisv wrote:
             | If you have an edge list like:
             | 
             | A->B
             | 
             | A->D
             | 
             | B->C
             | 
             | C->D
             | 
             | Postgres will walk both paths from A to D in the recursive
             | CTE and then you can filter them afterwards to keep only
             | the shortest.
             | 
             | You can use aggregate functions within the recursive CTE,
             | so you can't GROUP BY your starting node and stop once you
             | find the first path. There isn't a way to compare across
             | multiple paths or iterations of the for-loop.
        
               | FleaFlicker99 wrote:
               | Ok but that's a little different than saying you can't
               | cut the traversal short.
               | 
               | If I'm traversing ancestors (let's say) until I find one
               | that doesn't satisfy a condition, it'll bail out then. I
               | get that this doesn't serve all uses cases, but it isn't
               | a small thing either.
        
           | mdavidn wrote:
           | Yes, it is difficult to maintain. That might be a good moment
           | to consider a proper graph database!
        
             | ellisv wrote:
             | I've considered using a column to store indicate 2nd, 3rd,
             | etc degree relations instead of n-th order tables.
        
       | YouWhy wrote:
       | While the basic technique is sound, SQL recursive queries' no-
       | side-effects character means a huge number of basic graph
       | traversal strategies (e.g.: pruning, shortest path
       | first/Dijkstra) are beyond reasonable engineering.
       | 
       | In addition to that, Postgres is difficult as a computational
       | platform - the execution strategy for recursive queries is not
       | necessarily intuitive, and their performance is hard to assess
       | (for example: how soon will you run out of "stack"?)
        
       | cpa wrote:
       | It's a good tool as long as your queries are simple, and your
       | graph isn't too large (which is probably the case 99% of the
       | time). And if that allows you to have one less tool to deal with,
       | go for it!
       | 
       | But I've tried to push it to its limits (using PostGIS, my nodes
       | being land plots that were connected to each others -- the
       | biggest query used joins, lateral, recCTE, windows... you name
       | it) and ran into many issues: can't branch out early, hard to
       | optimize, no support for complex queries (writing anything beyond
       | BFS and DFS is a pita). Yet, in the end I accepted the struggle
       | (and the long queries) because it was so nice to work with my
       | data directly where it was kept :)
        
         | amyres wrote:
         | Using the built-in PostGIS topology tools? Or something more
         | custom? I'm curious as I just started digging into and using
         | PostGIS for a land parcel use case. I've wondered about the
         | graph structure of parcels adjacent to others, rather than just
         | a big unordered table of geom objects.
        
           | cpa wrote:
           | Just using the geography stuff and doing joins on
           | ST_Intersects. I couldn't guarantee that my data formed a
           | partition of the plane, which is necessary for topology iirc.
           | 
           | Fun was had and headaches too. The biggest speedup I got was
           | computing the graph online (creating a node/vertices tables
           | from geometries) and then doing the recursive CTE on that
           | table.
        
             | amyres wrote:
             | Cool thanks! I'm amazed at how much can be done in PostGIS
             | (if my SQL queries are good enough...)
        
       | Rapzid wrote:
       | I've found this post pretty inspiring for what can be done with
       | graphs in Postgres: https://www.alibabacloud.com/blog/postgresql-
       | graph-search-pr...
       | 
       | There are a LOT of use cases that will fit the capabilities and
       | performance characteristics.
        
       | stevebmark wrote:
       | Is with_recursive nearly as performant as dedicated graph
       | database query languages?
        
       | vrglvrglvrgl wrote:
       | [dead]
        
       | crunchengine wrote:
       | I use EdgeDB for graphs, it works well enough, I just don't know
       | about performances but for sure it is sufficient for most use-
       | cases.
        
         | e1g wrote:
         | EdgeDB is Postgres so the same constraints apply.
        
           | pcthrowaway wrote:
           | I imagine that's why the person you're responding to
           | mentioned it. I'm surprised the author of this article didn't
        
       | claytongulick wrote:
       | It doesn't handle all cases or all queries, but one method I've
       | used to deal with graph data in relational databases it to
       | materialize paths and cache them on the node (this only really
       | works well with DAGs).
       | 
       | So, the path key could look something like: bob.ted.lisa.bill
       | (obviously, use ids instead of names here)
       | 
       | Then you can index this in various ways to make LIKE queries
       | fast.
       | 
       | Want do know anyone that rolls up to ted?
       | select * from blah where path like 'bob.ted%'
       | 
       | Or, how many people report to ted, but aren't under lisa?
       | select count(*) from blah where path like 'bob.ted%' and path not
       | like 'bob.ted.lisa%'
       | 
       | And something a bit more complex that would typically be
       | considered the domain of a graph database, like "What are the
       | average number of direct reports each manager in my organization
       | has?"                   select name, avg((select count(*) from
       | blah as sub where path like  '%' || name || '%' and
       | (array_position(string_to_array(path,'.'), name) =
       | ((array_position(string_to_array(path,'.'), sub.name) - 1))
       | 
       | So, even with that degree of complexity, you're still avoiding
       | recursion, though you might be getting into O(n^2) territory if
       | you don't have an index that will handle '%' || ... || '%' well.
       | You can expand on this with various techniques depending on the
       | application.
       | 
       | It's certainly no replacement for a graph database, but for basic
       | stuff it's very fast, handles huge volumes, and doesn't rely on
       | any sort of recursion at the database layer (unless you
       | materialize in a trigger or something).
       | 
       | The actual materialization code can be tricky, one technique to
       | deal with that is to use the author's technique or something
       | similar to have a recursive CTE in a view. When a graph change is
       | made, you can scope the change based on the path's hierarchy and
       | update the paths from the recursive view, that way you only hit
       | the performance penalty at write time, not read.
        
       | szarnyasg wrote:
       | I designed and maintain several graph benchmarks in the Linked
       | Data Benchmark Council, including workloads aimed for databases
       | [1]. We make no restrictions on implementations, they can any
       | query language like Cypher, SQL, etc.
       | 
       | In our last benchmark aimed at analytical systems [2], we found
       | that SQL queries using WITH RECURSIVE _can_ work for expressing
       | reachability and even weighted shortest path queries. However,
       | formulating an efficient algorithm yields very complex SQL
       | queries [3] and their execution requires a system with a
       | sophisticated optimizer such as Umbra developed at TU Munich [4].
       | Industry SQL systems are not yet at this level but they may
       | attain that sometime in the future.
       | 
       | Another direction to include graph queries in SQL is the upcoming
       | SQL/PGQ (Property Graph Queries) extension. I'm involved in a
       | project at CWI Amsterdam to incorporate this language into DuckDB
       | [5].
       | 
       | [1] https://ldbcouncil.org/benchmarks/snb/
       | 
       | [2] https://www.vldb.org/pvldb/vol16/p877-szarnyas.pdf
       | 
       | [3]
       | https://github.com/ldbc/ldbc_snb_bi/blob/main/umbra/queries/...
       | 
       | [4] https://umbra-db.com/
       | 
       | [5] https://www.cidrdb.org/cidr2023/slides/p66-wolde-slides.pdf
        
         | eternalban wrote:
         | Re [5]'s asssertion under "blunders" of the diminish usecases
         | post sql/pgq, what do you think of [something] like Quine?
         | 
         | https://github.com/thatdot/quine
         | 
         | Their claim to fame is progressive incremental computation -
         | each node is an actor responding to events -- and I'm not sure
         | how a relational db could do that and match the latencies. That
         | usecase is pretty much pattern matching and forensics and stuff
         | like that.
         | 
         | https://docs.quine.io/core-concepts/architecture.html
        
       | hot_gril wrote:
       | Postgres is powerful. I've implemented graph DB, mapreduce, huge
       | sparse matrix math (which was cool), and ofc a KV store with it.
       | With actual performance benefits over the alternatives. It's not
       | very ergonomic for those, but it works.
       | 
       | That said, I've had so many jobs where the team decides to fit
       | our whole data model into a fully recursive graph, and it's been
       | a huge mistake every time regardless of whether they use Postgres
       | or a dedicated graph DB. Just because you can do it (you always
       | can!) doesn't mean you should. Start with just a traditional
       | n-hierarchy schema and only look at modeling things as a graph if
       | you're sure it's necessary. Usually it's not. Sometimes you'll
       | have a mostly non-graph schema with a couple of graph tables for
       | the things that really need that.
        
       | ape4 wrote:
       | Postgres has commands to easily list conventional tables nicely.
       | While I can see this would work, there are no native Postgres
       | commands to list the graph.
        
       | jzelinskie wrote:
       | We used the "WITH RECURSIVE" basically the moment it shipped in
       | Postgres for the original CoreOS Clair[0] data model which was a
       | graph of software versions and vulnerabilities. It scaled well
       | compared to a naive graph abstraction implemented outside the
       | database, but when performance wasn't great, it REALLY wasn't
       | great. After running that version in production for a couple
       | years, we ended up throwing it out and remodeling the domain to
       | be less graph-y to achieve more consistent performance.
       | 
       | I've since worked on SpiceDB[1] which scales much better by
       | taking the traditional design approach for graph databases and
       | only treating Postgres as triple-store. IME, there's no short-
       | cut: if you need a graph, you probably want to use a database
       | optimized for graph access patterns. Most general-purpose graph
       | databases are full of optimizations for common traversals that
       | are uncommon operations in relation databases.
       | 
       | [0]: https://github.com/quay/clair
       | 
       | [1]: https://github.com/authzed/spicedb
        
       | yooloo wrote:
       | Of course relational db can act like a graph db. It's just not as
       | efficient due to how things are stored and queried. Would be
       | great to have a graph db plugin (and I found one
       | https://github.com/apache/age)
        
         | piyh wrote:
         | Having first class pagerank and shortest path functions in
         | tinkerpop gremlin vs having to roll it yourself with recursive
         | CTEs feels like a graph DB win.
        
       | sigstoat wrote:
       | any suggestions for getting nice graph db-style functionality out
       | of an postgres schema, where you've got the nodes and edges of a
       | labelled property graph spread across a bunch of different
       | tables? (AGE seems to want to create a separate graph object that
       | lives alongside your other tables, and toy examples like this
       | article just dump everything into special tables)
       | 
       | in my particular case i don't actually have a lot of data so i'm
       | not looking for performance, just want to save developer time.
        
       | ghotli wrote:
       | I like this idea and ultimately what I want is "is this a DAG or
       | not" and if so "give me the topological sort".
       | 
       | Anyone know of good docs for this sort of pattern with SQL that
       | addresses those two common tasks?
        
       | varunjain99 wrote:
       | This is a neat data modeling exercise, but not sure why you'd
       | prefer this approach over Neo4j.
       | 
       | It's much simpler to query in native graph database land (though
       | it's not exactly SQL, it's pretty close to be honest). And there
       | are performance / reliability implications of forcing postgres
       | into graph land.
        
       | piyh wrote:
       | This isn't just a PG thing. The wiki article isn't great, but the
       | references have links to sqlite, mssql, oracle, ibm, etc.
       | 
       | https://en.wikipedia.org/wiki/Hierarchical_and_recursive_que...
        
       | simonw wrote:
       | This same technique - WITH RECURSIVE - is also supported by
       | SQLite.
       | 
       | I ported the example in this tutorial to SQLite here - now you
       | can play with it in Datasette Lite:
       | https://lite.datasette.io/?sql=https%3A%2F%2Fgist.githubuser...
       | 
       | Amusingly I ported the sample PostgreSQL code using the new
       | ChatGPT "Browsing" alpha, with the following prompt:
       | 
       | > Read this tutorial and then output equivalent create table and
       | insert statements for SQLite
       | https://www.dylanpaulus.com/posts/postgres-is-a-graph-databa...
        
         | punnerud wrote:
         | If we have links to other Sqlite3 databases in the SQLite
         | itself, we could make an static webpage distributed graph
         | database: https://news.ycombinator.com/item?id=27016630
         | 
         | With the static database trick, combined with recursive graphs
        
       | mharig wrote:
       | IIRC, the maybe biggest graph database in the world, TAO (The
       | Associations and Objects ?) of FB, is based on MySQL.
       | 
       | Meta may have had a good reason not to choose a 'native' graph
       | DB.
        
       | its-summertime wrote:
       | Handling cycles seems to be a missing point.
       | 
       | https://www.postgresql.org/docs/current/queries-with.html#QU...
       | the docs themselves do have information about it.
        
         | ellisv wrote:
         | Postgres can detect cycles; see 7.8.2.2 in the docs:
         | https://www.postgresql.org/docs/current/queries-with.html#QU...
        
       | jakewins wrote:
       | Johan wrote the original Neo4j implementation as a graph layer on
       | top of Postgres, random trivia. Then Java released NIO, he got
       | excited and tried replacing the PG engine with one built for
       | graphs from scratch.
        
       | kaba0 wrote:
       | Only remotely relevant, but I have recently come across Gremlin,
       | a sort of "SQL for graphs". Does anyone have some experience with
       | it they can share? I only used it on an in-memory graph database,
       | and I found it quite great so far.
        
         | inguz wrote:
         | Tinkerpop/Gremlin is not really SQL-like at all. My experience
         | with it has not been fun: it's difficult to reason about, has
         | obscenely bad documentation, poor tooling, idiosyncratic
         | library support, and simple data access patterns are hard.
        
       | user00012-ab wrote:
       | whoa.... postgres is moving quickly and taking jobs away from
       | other databases. Maybe we need to hold up for 6 months on any new
       | uses for postgres until we all have some time to figure this out.
        
       | truth_seeker wrote:
       | Few suggestions based on my prior experience on the same task in
       | production system:-
       | 
       | 1. Both the tables (Nodes and Edges) MUST HAVE one more column as
       | "label" and then you must create partitions based on distinct
       | values of "label" column. This greatly helps queries to run
       | faster as search plan is narrowed when you include it in where
       | clause and PG knows partitions to hit.
       | 
       | 2. Instead of one big fat primary key column in each table use,
       | consider having different uniques indexes on each partition.
       | 
       | 3. The EDGES table can afford to have "data" column of data type
       | TEXT or perhaps JSONB. PG saves it in different data disk layout
       | and compression works great here. We used it to cache the result
       | for previous/next nodes data or compute the result for many extra
       | hops which are required on frequent basis.
       | 
       | 4. Use PG procedures to hide the complexity of reusable DFS/BFS
       | queries.
        
       | yayr wrote:
       | Can someone explain or point towards an explanation, what the
       | performance impact of such a query is? How does it scale with
       | number of edges or depth of recursion? Is there a special smart
       | way of indexing this to be efficient?
        
         | henryfjordan wrote:
         | If you store a list of edges like (startId, endId) with an
         | index on startId, then every time you want to follow an edge
         | will cost O(log(n)) where n is the number of edges in your
         | graph.
         | 
         | You can get O(1) performance on similar queries using a
         | different storage strategy (by storing edges as pointers to the
         | other nodes in memory) but SQL is not the right tool for that,
         | you would be better off with a true graph DB.
        
         | winrid wrote:
         | If you have indexes on next_node and previous_node, it'll
         | preform okay.
         | 
         | The size of the table (# of edges) won't actually matter *that*
         | much performance wise with btree indexes. The performance will
         | mostly depend on how deep you want to go with your recursion.
         | The more nodes you check per query, the slower it'll be.
         | Luckily PG will keep the frequently accessed nodes and index
         | pages in the buffer pool, so it'll perform close enough to an
         | in-memory graph database for most use cases that access the
         | same data often - while not requiring your whole dataset to fit
         | into memory all the time.
         | 
         | MongoDB's current graph search mechanism works the same way.
         | 
         | I think actually if you are using incremental ids and you don't
         | delete edges, you could try BRIN indexes for the next/previous
         | node columns. This could still get you pretty good performance
         | with _much_ lower storage space requirements than a
         | "traditional" index, so you could theoretically store a pretty
         | large graph (thinking billions of edges) with minimal disk
         | space.
        
       | ForHackernews wrote:
       | I can't believe this article doesn't mention the (first-party)
       | ltree extension: https://hoverbear.org/blog/postgresql-
       | hierarchical-structure...
        
       | fbn79 wrote:
       | Is not nested sets https://en.wikipedia.org/wiki/Nested_set_model
       | a better solution than the one described in the article?
        
       | elchief wrote:
       | Joe Celko covers graphs in chapters 12-14 of Trees and
       | Hierarchies in SQL for Smarties
       | 
       | as well as chapter 27 of SQL for Smarties, 5th
       | 
       | isn't ISO working on an graph extension to regular SQL? Can't
       | seem to google it correctly
        
       | garyclarke27 wrote:
       | In my experience Postgres works really with a Recursive CTE
       | inside a Materialized View for Graphs. Latest version can detect
       | cycles. I used to add a distance column as well, to show number
       | of hops apart, any 2 nodes are. Hoping Postgres adds Automatic
       | Incremental Mat View Maintenance soon, it will be perfect then as
       | a Graph Database.
        
         | TOMDM wrote:
         | Incremental materialised view maintenance is a killer feature.
         | 
         | If Postgres added it, almost all of my interest in Materialize
         | (the database, not the technique) would vanish immediately.
        
       | dfragnito wrote:
       | We accomplish similar with SchemafreeSQL. Currently MySQL 8 back-
       | end but working on other SQL back-ends. This demo uses
       | PlanetScale as the DB back-end, Fly.io handles the SFSQL
       | endpoint, Netlify function handles the query, and cytoscape.js
       | graph network library for the UI
       | 
       | https://harmonious-mermaid-c4d794.netlify.app/got (sorry not
       | mobile friendly but functional)
        
         | samlambert wrote:
         | This is a cool project.
        
           | dfragnito wrote:
           | Thanks !! We had hoped to use PS for our serverless backend
           | and also offer a dedicated PS option. But we ran into some
           | issues with Vitess. I was just thinking today I wanted to get
           | back into it and see if we can resolve these issues. We had
           | to go with Aurora.
           | 
           | The issues were on deletes. This demo is read only so it
           | works well with a PS backend. My experience with your CS has
           | been amazing despite these issues!!
        
       | _boffin_ wrote:
       | Again, i'll post
       | https://books.google.com/books/about/Joe_Celko_s_Trees_and_H...
       | 
       | pdf:
       | https://github.com/Gumplefik/pdf/blob/master/Joe%20Celko's%2...
        
         | packetslave wrote:
         | we're posting copyrighted PDFs to HN now?
        
       | jghn wrote:
       | Over time I've come to take this a step further. I feel people
       | reach for a graph database way too early. Most use cases that
       | most people have will work fine in a standard relational setup.
       | And that brings with it the benefits of using technologies like
       | Postgres that have stood the test of time. So I treat it as I do
       | any performance consideration: first demonstrate that you
       | actually need to go down the path less traveled, don't just
       | assume you need to do so.
        
         | rajman187 wrote:
         | Completely agree. In fact, even massive graph data can make use
         | of a relational database + caching [1]
         | 
         | [1] https://engineering.fb.com/2013/06/25/core-data/tao-the-
         | powe...
        
       | cryptonector wrote:
       | The reason graph DBs exist is mainly to provide easier syntax for
       | recursive queries (which aren't really recursive because they're
       | tail-recursive, which means iterative rather than what most
       | people think of as recursive).
       | 
       | Any questions?
       | 
       | It's not clear how one might make SQL syntax for recursive
       | queries simpler. Clearly we can have syntactic sugar for
       | "dereferencing" relationships, so one could say `SELECT
       | child-->name FROM parent_child WHERE ...;`, and we could have
       | aggregation `SELECT array_agg(child-->name) FROM parent_child
       | WHERE ...;`, and `SELECT child--->name FROM parent_child WHERE
       | ...;` to say "recurse all the way through the child relation",
       | but if you want more than one column then things get tricky.
        
       | dapearce wrote:
       | Also see Apache Age, a graph database extension for Postgres:
       | https://github.com/apache/age
        
       | philzook wrote:
       | Very interesting. Relatedly, I've been refining simple
       | lightweight ways to use a regular SQL database to execute
       | seminaive datalog queries. Path reachability in a graph is the
       | canonical example. One could use stored functions to implement
       | the outer loop also.
       | 
       | There are some other fun related applications like graph
       | rewriting using SQL, etc.
       | 
       | MiniLitelog: Easy Breezy SQLite Datalog -
       | https://www.philipzucker.com/tiny-sqlite-datalog/
        
       | ChicagoDave wrote:
       | As a C# guy, I've figured out how to build in-memory graphs with
       | generics and dsl/fluid queries. I'm working on a blog entry about
       | it because it's such a powerful skill to learn and leverage. No
       | data store required. I can deserialize/serialize the data
       | structure to/from binary or json.
        
       | ralusek wrote:
       | I've done this on many occasions, but I actually only go with a
       | single nodes table.
       | 
       | Schema is like this                   id         type: string (or
       | enum)         data: jsonb         indexed: jsonb (but this one
       | gets a GIN index)         toId: nullable foreign key to this same
       | table         fromId: nullable foreign key to this same table
       | createdAt: date         updatedAt: date
       | 
       | So if I had a post with a comment on it, the data would look
       | something like this                  {          id: '1111',
       | type: 'users',          data: { name: 'John Doe' },        }
       | {          id: '2222',          type: 'users',          data: {
       | name: 'Jane Doe' },        }             {          id: '3333',
       | type: 'posts',          data: { text: 'Nice day today' },
       | }             {          id: '4444',          type: 'comments',
       | data: { text: 'For you, maybe. Raining here' },        }
       | 
       | And now for the edges:                   {           id: '5555',
       | type: '(posts < users)',           toId: '3333',
       | fromId: '1111',         }              {           id: '6666',
       | type: '(comments < users)',           toId: '4444',
       | fromId: '2222',         }              {           id: '7777',
       | type: '(posts < comments)',           toId: '3333',
       | fromId: '4444',         }
       | 
       | So then, for example, if I have a post, and I want to find the
       | comments on it, I search for toId = <my post id>, type = '(posts
       | < comments)'.
        
         | TOMDM wrote:
         | This works well for nodes with at most 1 parent and child, the
         | post is aimed at supporting many to many relationships though
        
           | ralusek wrote:
           | What I've described here supports many to many...
        
             | fauigerzigerk wrote:
             | Yes, but the downside of your schema is that the
             | distinction between nodes and edges is somewhat muddled.
             | 
             | Presumably, all nodes would have nulls in both fromId and
             | toId, but the schema doesn't enforce that. The schema also
             | allows linking edges to other edges.
             | 
             | Don't you think this is a bit too much flexibility if the
             | intention is to model a graph?
        
       | Dopameaner wrote:
       | Andy Pavlov, mentioned something similar. Where, he advocated
       | Relational database to solve most of the graph database
       | utilities. He also made a bet, if graph databases spawn a legit
       | usecase by 2030, that he would change his CMU profile picture.
        
       | henryfjordan wrote:
       | I implemented pretty much exactly this setup once.
       | 
       | For what it's worth, there's a lot of footguns with this
       | approach. You need to be careful about infinite cycles and things
       | like that. You also just do not get the ergonomics of using a
       | query language that is meant for graph data. Once you get it all
       | setup and experience the issues you will no doubt be able to
       | build whatever you want but there's a learning curve.
       | 
       | In the end we switched to using Neo4j and were able to build new
       | features a lot more quickly than we would've been able to on
       | Postgres.
       | 
       | It's also worth mentioning that there are many ways to store
       | graph data, and using an "adjacency list" like is being done here
       | may or may not be the best way for your use case. Neo4j I think
       | uses a different approach where nodes store info about edges they
       | are connected to directly, so you don't even need to hit an index
       | to follow an edge.
        
         | mistermann wrote:
         | Piggybacking on your Neo4J reference....can anyone suggest any
         | resources that would be good for someone from the SQL world who
         | is very uninformed on graph databases to quickly get their head
         | around the key ideas, but even more importantly, how to choose
         | a platform among the offerings (licensing being I think a key
         | issue, I've heard Neo4J can get expensive)...assume a large,
         | "social media" scale system, to be safe.
        
           | viksit wrote:
           | you can use datomic as a good example.
           | https://hashrocket.com/blog/posts/using-datomic-as-a-
           | graph-d...
           | 
           | another good example using predicates -
           | https://github.com/threatgrid/asami/wiki/2.-Introduction
        
           | pottertheotter wrote:
           | I was looking at graph databases/Neo4j a lot early last year
           | and found Neo4j has a lot of good resources.
           | 
           | You can get this book for free through their website. [1]
           | 
           | This have some good documentation[2], including this page
           | "Transition from relational to graph database".[3]
           | 
           | [1] https://neo4j.com/graph-databases-book/
           | 
           | [2] https://neo4j.com/docs/getting-started/current/
           | 
           | [3] https://neo4j.com/docs/getting-
           | started/current/appendix/grap...
        
           | henryfjordan wrote:
           | Neo4j has a self-hosted community edition that is free,
           | though you need to update to Enterprise to expand your DB
           | beyond one node. It's worth noting though that scaling from a
           | single node to a cluster is going to have some pretty huge
           | performance issues whether you are using SQL or Neo4j to
           | store your graph. The power of graph databases is that it
           | should be very inexpensive to follow an edge in the graph but
           | when that leads to a network hop you lose a lot of
           | performance.
           | 
           | If you are comfortable fitting your data onto one box, then
           | development speed is probably more important than other
           | factors and I would just try a few databases out and
           | especially pay attention to how good the libraries are in
           | your programming language of choice. Neo4j for instance had
           | high quality libraries in Java but the experience in Python
           | was not as good.
           | 
           | If you have a lot of data or need next-level performance, I
           | would start by doing some research on the various ways to
           | store graph data and then pick a DB that supports the
           | approach you want to take.
        
           | joshspankit wrote:
           | As someone who hated being in the SQL world and couldn't
           | figure out why until years later when I found out about graph
           | databases, here's one big shift:
           | 
           | Graph database store true relationships.
           | 
           | The R in RDBMS and the concept of relationships by primary
           | keys are lies (imo). They lead people away from what true
           | relationships can be. Databases like Neo4j are not about
           | doing any sort of PK or join or merge-before-query. Looking
           | up by connection is the _fastest_ way to find information. If
           | your RDBMS had one table for phone numbers, one for email
           | addresses, one for first names, one for last names, one for
           | street, and so-on it would take huge amounts of time to query
           | to find all the detail for a single person. With a graph
           | database it takes a small amount of time to find the first
           | bit of info (let's say phone number since it's easily
           | sorted), and then every other bit of linked data is
           | essentially free.
        
             | eurasiantiger wrote:
             | Effectively, moving from RDB to a graph database just
             | shifts relationship resolution from query time to
             | insert/update time. Sometimes this can be an advantage. I'd
             | even wager usually.
        
           | taubek wrote:
           | For some basics of graph modeling take a look at
           | https://memgraph.com/docs/memgraph/tutorials/graph-modeling.
           | 
           | Some differences between SQL and graph databases are covered
           | at https://memgraph.com/blog/graph-database-vs-relational-
           | datab....
        
         | cryptonector wrote:
         | > You need to be careful about infinite cycles and things like
         | that.
         | 
         | Not really, just never forget this: always use `UNION`, _and
         | never_ `UNION ALL`, in recursive queries!
        
       | revskill wrote:
       | PostgreSQL is my love. Thanks for all developers.
       | 
       | Do not blame the tool if you feel it doesn't fit your niche.
       | Blame yourself first, and try to use it the right way.
        
       | amir734jj wrote:
       | Good example. Now do it with undirected graph. How will the query
       | look like?
       | 
       | So, is this basically running transitive closure on the database
       | during query? it would be expensive.
        
       | pphysch wrote:
       | Small nitpick but I wish the author just made `data` a JSONB
       | field rather than VARCHAR. That really shows the power of
       | Postgres, being able to operate as a "NoSQL graph DB" with little
       | to no hacking.
        
         | ganderzz wrote:
         | Hey, author here! Thanks for the feedback! I went back and
         | forth with having `data` being a JSONB field or VARCHAR, but
         | ended up with VARCHAR to show that `nodes` don't have to be
         | anything crazy. Really `nodes` is just a standard table you'd
         | see in any old SQL database, and what makes it a "graph" is the
         | `edges`/relationship table.
        
       | bottlepalm wrote:
       | I can see how this possibly won't scale compared to dedicated
       | graph dbs.
       | 
       | I'm interested in what the fundamental difference is in those
       | graph dbs because rdbms seem so close I don't see why it isn't
       | just provided as an extra layer on top.
       | 
       | As described in the article, any graph can already be built with
       | an rbms, so why for example does neo4j need to be standalone and
       | can't possibly be a layer on top of postgres to leverage existing
       | relationships? Any good articles on that?
        
         | emodendroket wrote:
         | I don't know off the top of my head but I'd assume that one
         | could adjust the storage strategy and other performance
         | tweaking to optimize for the expected use case.
        
       | samwillis wrote:
       | If you are looking to utilise this with Django, there is the
       | brilliant Django CTE project that enables it and other CTE
       | queries with the Django ORM:
       | 
       | https://dimagi.github.io/django-cte/#recursive-common-table-...
        
       | taubek wrote:
       | I personally prefer native graph databases such as Memgraph[1].
       | Graph databases have it's advantages and disadvantages. Sometimes
       | you, need RDBMS, sometimes you need NoSQL solution. So I pick my
       | stack based on situation and issue that I need to solve.
       | 
       | [1] https://memgraph.com
        
         | rajman187 wrote:
         | "Native" is a misleading or misused term in the context of
         | graph databases because a graph cannot be mapped in a maximally
         | consistent way with the underlying hardware (the von Neumann
         | architecture represents data in a sequential manner and it is
         | much faster to access it this way rather than randomly).
         | 
         | There are generally two families in the graph database world:
         | those which use underlying traditional tables of nodes and
         | many-to-many edges; and index-free adjacency which just means
         | each node in the graph knows the memory address of its
         | connections (other side of the edges).
         | 
         | Distributed graphs necessarily end up using the former because
         | it's difficult if not impossible for a node to know the memory
         | address of its connection when that crosses a physical
         | boundary. So typically index-free adjacency graphs have a
         | master-slave setup with multiple read replicas but a single one
         | to write to.
         | 
         | So with a "native graph" you don't rely on potentially
         | expensive join operations to find neighbors of neighbors and
         | can traverse complex paths easily.
        
           | dtomicevic wrote:
           | This is spot on. Though, in Memgraph's context, there is a
           | combination of lock free techniques and raw pointer
           | representation that can work exceptionally well for a large
           | variety of use cases, and even better so compared to others
           | on streaming / event driven / large volume of writes. Using
           | raw pointers, there are also optimization opportunities when
           | storing or rebalancing the graph to maximize the number of
           | adjacent nodes in a cache line so benefiting from sequential
           | representation too.
        
         | mashroom wrote:
         | Thanks for pointing me toward Memgraph. I didn't hear of it
         | before and it appears to be a great alternative to neo4j.
        
       | nilsbunger wrote:
       | Sure, at its most basic a graph database is just a bunch of
       | many2many relationships.
       | 
       | The recursive query is cool, but how is the performance for graph
       | queries compared to a DB optimized for this use case ?
        
         | ellisv wrote:
         | I haven't compared, but something like Kuzu is probably much
         | more performant for a lot of use cases because it implements
         | Worst-Case Optimal Joins.
         | 
         | But I think Postgres is likely to get these improvements in a
         | few years too.
        
       ___________________________________________________________________
       (page generated 2023-03-31 23:00 UTC)