[HN Gopher] Postgres as a graph database ___________________________________________________________________ Postgres as a graph database Author : elorant Score : 378 points Date : 2023-03-31 13:35 UTC (9 hours ago) (HTM) web link (www.dylanpaulus.com) (TXT) w3m dump (www.dylanpaulus.com) | mdavidn wrote: | I do love PostgreSQL, and I often reach for this approach when | working with hierarchical data, but a word of warning: Recursive | queries will not scale to handle _very_ large edge tables. The | recursive query will always consider _all_ edges, even when the | outer query seeks only a single node's relationships. The | solution is to build and index denormalized n-th order | relationship tables. | | Others have already pointed out the issues of cycles. | Nezteb wrote: | I learned from one of the comments on the post about AGE[1], "a | PostgreSQL extension that provides graph database | functionality." | | [1] https://age.apache.org/ | afandian wrote: | I'm sorry could you spell it out? What exactly does "recursive | query will always consider _all_ edges" mean? A table scan? I'd | be very grateful if you could give some pesudocode or point to | a doc page. | jeremyjh wrote: | I think GP means that it has to completely expand the | recursive part for every branch before any where condition on | the edge nodes can be applied. Graph databases presumably can | optimize this. | | I've found recursive queries to be difficult to scale in | real-world queries past a few million edge nodes. We've | denormalized several tree relationships so that the edge is | connected both to its parent and to the root of its tree in | several cases. | afandian wrote: | Thanks! Sounds like you're saying that brute-force | recursive algorithms without backtracking or early | termination aren't a good match for path finding. That's | not a surprise. | | I'm using recursive algorithm on Postgres to find trees in | a graph, but only where I'm specificaly interested in all | of the nodes. | cryptonector wrote: | Cycles are not a problem: just use `UNION`, not `UNION ALL`. | | I myself build transitive closure tables that I update | incrementally as needed (sometimes in a deferred way), so that | many of the recursive queries I do can be very fast, but I only | build transitive closures for the smallest portion I can | (basically group nesting). | liotier wrote: | > Recursive queries will not scale to handle _very_ large edge | tables | | What do you consider large ? We have a 12-year old telco | application with a few million objects in Neo4J, doing fault | impact calculations... Would PostgreSQL handle that easily | nowadays ? | simonw wrote: | My hunch is that a few million objects is pretty tiny these | days - you could probably write a script to import all of | them into PostgreSQL in a few minutes and try it out yourself | to see how it behaves. | ellisv wrote: | It depends a lot on how connected the graph is and whether | you can fit it in memory. | simonw wrote: | I tried it against SQLite. I got ChatGPT to write me a | script that would insert 1m edges and 1m nodes: | | https://gist.github.com/simonw/c16ce01244760e186a3a0aa3fee0 | 4... | | Then I ran that query again and it seems to return results | in about 80ms: | | https://lite.datasette.io/?sql=https://gist.github.com/simo | n... | eurasiantiger wrote: | Not very realistic example, you need to be requesting | some actual fields across nodes and doing some filtering | on at least strings and dates, maybe geo areas as well. | simonw wrote: | Go ahead and try it! I've documented all of the tools you | need to run some detailed experiments against SQLite | entirely in your browser here. | ellisv wrote: | It's not even that it always considers all edges, but that you | can't exit the search early when a condition is met. In other | words, you have to first find all the paths and then, outside | the CTE, filter to the shortest. We push the node filter into | the CTE by wrapping it in a function. | | > The solution is to build and index denormalized n-th order | relationship tables. | | This sounds much more performant but also more difficult to | maintain. | switchbak wrote: | "you can't exit the search early when a condition is met" - I | have a DAG traversal written in a recursive CTE, and I can | bail early out just fine when my traversal predicates no | longer match. Not sure why I'd have to do that outside the | (recursive part of the) CTE? | | Obviously maintaining a flattened version is going to perform | better in queries, but you're trading maintenance (write) | time for query time. Or materialization time if you decide to | do it lazily. | ellisv wrote: | If you have an edge list like: | | A->B | | A->D | | B->C | | C->D | | Postgres will walk both paths from A to D in the recursive | CTE and then you can filter them afterwards to keep only | the shortest. | | You can use aggregate functions within the recursive CTE, | so you can't GROUP BY your starting node and stop once you | find the first path. There isn't a way to compare across | multiple paths or iterations of the for-loop. | FleaFlicker99 wrote: | Ok but that's a little different than saying you can't | cut the traversal short. | | If I'm traversing ancestors (let's say) until I find one | that doesn't satisfy a condition, it'll bail out then. I | get that this doesn't serve all uses cases, but it isn't | a small thing either. | mdavidn wrote: | Yes, it is difficult to maintain. That might be a good moment | to consider a proper graph database! | ellisv wrote: | I've considered using a column to store indicate 2nd, 3rd, | etc degree relations instead of n-th order tables. | YouWhy wrote: | While the basic technique is sound, SQL recursive queries' no- | side-effects character means a huge number of basic graph | traversal strategies (e.g.: pruning, shortest path | first/Dijkstra) are beyond reasonable engineering. | | In addition to that, Postgres is difficult as a computational | platform - the execution strategy for recursive queries is not | necessarily intuitive, and their performance is hard to assess | (for example: how soon will you run out of "stack"?) | cpa wrote: | It's a good tool as long as your queries are simple, and your | graph isn't too large (which is probably the case 99% of the | time). And if that allows you to have one less tool to deal with, | go for it! | | But I've tried to push it to its limits (using PostGIS, my nodes | being land plots that were connected to each others -- the | biggest query used joins, lateral, recCTE, windows... you name | it) and ran into many issues: can't branch out early, hard to | optimize, no support for complex queries (writing anything beyond | BFS and DFS is a pita). Yet, in the end I accepted the struggle | (and the long queries) because it was so nice to work with my | data directly where it was kept :) | amyres wrote: | Using the built-in PostGIS topology tools? Or something more | custom? I'm curious as I just started digging into and using | PostGIS for a land parcel use case. I've wondered about the | graph structure of parcels adjacent to others, rather than just | a big unordered table of geom objects. | cpa wrote: | Just using the geography stuff and doing joins on | ST_Intersects. I couldn't guarantee that my data formed a | partition of the plane, which is necessary for topology iirc. | | Fun was had and headaches too. The biggest speedup I got was | computing the graph online (creating a node/vertices tables | from geometries) and then doing the recursive CTE on that | table. | amyres wrote: | Cool thanks! I'm amazed at how much can be done in PostGIS | (if my SQL queries are good enough...) | Rapzid wrote: | I've found this post pretty inspiring for what can be done with | graphs in Postgres: https://www.alibabacloud.com/blog/postgresql- | graph-search-pr... | | There are a LOT of use cases that will fit the capabilities and | performance characteristics. | stevebmark wrote: | Is with_recursive nearly as performant as dedicated graph | database query languages? | vrglvrglvrgl wrote: | [dead] | crunchengine wrote: | I use EdgeDB for graphs, it works well enough, I just don't know | about performances but for sure it is sufficient for most use- | cases. | e1g wrote: | EdgeDB is Postgres so the same constraints apply. | pcthrowaway wrote: | I imagine that's why the person you're responding to | mentioned it. I'm surprised the author of this article didn't | claytongulick wrote: | It doesn't handle all cases or all queries, but one method I've | used to deal with graph data in relational databases it to | materialize paths and cache them on the node (this only really | works well with DAGs). | | So, the path key could look something like: bob.ted.lisa.bill | (obviously, use ids instead of names here) | | Then you can index this in various ways to make LIKE queries | fast. | | Want do know anyone that rolls up to ted? | select * from blah where path like 'bob.ted%' | | Or, how many people report to ted, but aren't under lisa? | select count(*) from blah where path like 'bob.ted%' and path not | like 'bob.ted.lisa%' | | And something a bit more complex that would typically be | considered the domain of a graph database, like "What are the | average number of direct reports each manager in my organization | has?" select name, avg((select count(*) from | blah as sub where path like '%' || name || '%' and | (array_position(string_to_array(path,'.'), name) = | ((array_position(string_to_array(path,'.'), sub.name) - 1)) | | So, even with that degree of complexity, you're still avoiding | recursion, though you might be getting into O(n^2) territory if | you don't have an index that will handle '%' || ... || '%' well. | You can expand on this with various techniques depending on the | application. | | It's certainly no replacement for a graph database, but for basic | stuff it's very fast, handles huge volumes, and doesn't rely on | any sort of recursion at the database layer (unless you | materialize in a trigger or something). | | The actual materialization code can be tricky, one technique to | deal with that is to use the author's technique or something | similar to have a recursive CTE in a view. When a graph change is | made, you can scope the change based on the path's hierarchy and | update the paths from the recursive view, that way you only hit | the performance penalty at write time, not read. | szarnyasg wrote: | I designed and maintain several graph benchmarks in the Linked | Data Benchmark Council, including workloads aimed for databases | [1]. We make no restrictions on implementations, they can any | query language like Cypher, SQL, etc. | | In our last benchmark aimed at analytical systems [2], we found | that SQL queries using WITH RECURSIVE _can_ work for expressing | reachability and even weighted shortest path queries. However, | formulating an efficient algorithm yields very complex SQL | queries [3] and their execution requires a system with a | sophisticated optimizer such as Umbra developed at TU Munich [4]. | Industry SQL systems are not yet at this level but they may | attain that sometime in the future. | | Another direction to include graph queries in SQL is the upcoming | SQL/PGQ (Property Graph Queries) extension. I'm involved in a | project at CWI Amsterdam to incorporate this language into DuckDB | [5]. | | [1] https://ldbcouncil.org/benchmarks/snb/ | | [2] https://www.vldb.org/pvldb/vol16/p877-szarnyas.pdf | | [3] | https://github.com/ldbc/ldbc_snb_bi/blob/main/umbra/queries/... | | [4] https://umbra-db.com/ | | [5] https://www.cidrdb.org/cidr2023/slides/p66-wolde-slides.pdf | eternalban wrote: | Re [5]'s asssertion under "blunders" of the diminish usecases | post sql/pgq, what do you think of [something] like Quine? | | https://github.com/thatdot/quine | | Their claim to fame is progressive incremental computation - | each node is an actor responding to events -- and I'm not sure | how a relational db could do that and match the latencies. That | usecase is pretty much pattern matching and forensics and stuff | like that. | | https://docs.quine.io/core-concepts/architecture.html | hot_gril wrote: | Postgres is powerful. I've implemented graph DB, mapreduce, huge | sparse matrix math (which was cool), and ofc a KV store with it. | With actual performance benefits over the alternatives. It's not | very ergonomic for those, but it works. | | That said, I've had so many jobs where the team decides to fit | our whole data model into a fully recursive graph, and it's been | a huge mistake every time regardless of whether they use Postgres | or a dedicated graph DB. Just because you can do it (you always | can!) doesn't mean you should. Start with just a traditional | n-hierarchy schema and only look at modeling things as a graph if | you're sure it's necessary. Usually it's not. Sometimes you'll | have a mostly non-graph schema with a couple of graph tables for | the things that really need that. | ape4 wrote: | Postgres has commands to easily list conventional tables nicely. | While I can see this would work, there are no native Postgres | commands to list the graph. | jzelinskie wrote: | We used the "WITH RECURSIVE" basically the moment it shipped in | Postgres for the original CoreOS Clair[0] data model which was a | graph of software versions and vulnerabilities. It scaled well | compared to a naive graph abstraction implemented outside the | database, but when performance wasn't great, it REALLY wasn't | great. After running that version in production for a couple | years, we ended up throwing it out and remodeling the domain to | be less graph-y to achieve more consistent performance. | | I've since worked on SpiceDB[1] which scales much better by | taking the traditional design approach for graph databases and | only treating Postgres as triple-store. IME, there's no short- | cut: if you need a graph, you probably want to use a database | optimized for graph access patterns. Most general-purpose graph | databases are full of optimizations for common traversals that | are uncommon operations in relation databases. | | [0]: https://github.com/quay/clair | | [1]: https://github.com/authzed/spicedb | yooloo wrote: | Of course relational db can act like a graph db. It's just not as | efficient due to how things are stored and queried. Would be | great to have a graph db plugin (and I found one | https://github.com/apache/age) | piyh wrote: | Having first class pagerank and shortest path functions in | tinkerpop gremlin vs having to roll it yourself with recursive | CTEs feels like a graph DB win. | sigstoat wrote: | any suggestions for getting nice graph db-style functionality out | of an postgres schema, where you've got the nodes and edges of a | labelled property graph spread across a bunch of different | tables? (AGE seems to want to create a separate graph object that | lives alongside your other tables, and toy examples like this | article just dump everything into special tables) | | in my particular case i don't actually have a lot of data so i'm | not looking for performance, just want to save developer time. | ghotli wrote: | I like this idea and ultimately what I want is "is this a DAG or | not" and if so "give me the topological sort". | | Anyone know of good docs for this sort of pattern with SQL that | addresses those two common tasks? | varunjain99 wrote: | This is a neat data modeling exercise, but not sure why you'd | prefer this approach over Neo4j. | | It's much simpler to query in native graph database land (though | it's not exactly SQL, it's pretty close to be honest). And there | are performance / reliability implications of forcing postgres | into graph land. | piyh wrote: | This isn't just a PG thing. The wiki article isn't great, but the | references have links to sqlite, mssql, oracle, ibm, etc. | | https://en.wikipedia.org/wiki/Hierarchical_and_recursive_que... | simonw wrote: | This same technique - WITH RECURSIVE - is also supported by | SQLite. | | I ported the example in this tutorial to SQLite here - now you | can play with it in Datasette Lite: | https://lite.datasette.io/?sql=https%3A%2F%2Fgist.githubuser... | | Amusingly I ported the sample PostgreSQL code using the new | ChatGPT "Browsing" alpha, with the following prompt: | | > Read this tutorial and then output equivalent create table and | insert statements for SQLite | https://www.dylanpaulus.com/posts/postgres-is-a-graph-databa... | punnerud wrote: | If we have links to other Sqlite3 databases in the SQLite | itself, we could make an static webpage distributed graph | database: https://news.ycombinator.com/item?id=27016630 | | With the static database trick, combined with recursive graphs | mharig wrote: | IIRC, the maybe biggest graph database in the world, TAO (The | Associations and Objects ?) of FB, is based on MySQL. | | Meta may have had a good reason not to choose a 'native' graph | DB. | its-summertime wrote: | Handling cycles seems to be a missing point. | | https://www.postgresql.org/docs/current/queries-with.html#QU... | the docs themselves do have information about it. | ellisv wrote: | Postgres can detect cycles; see 7.8.2.2 in the docs: | https://www.postgresql.org/docs/current/queries-with.html#QU... | jakewins wrote: | Johan wrote the original Neo4j implementation as a graph layer on | top of Postgres, random trivia. Then Java released NIO, he got | excited and tried replacing the PG engine with one built for | graphs from scratch. | kaba0 wrote: | Only remotely relevant, but I have recently come across Gremlin, | a sort of "SQL for graphs". Does anyone have some experience with | it they can share? I only used it on an in-memory graph database, | and I found it quite great so far. | inguz wrote: | Tinkerpop/Gremlin is not really SQL-like at all. My experience | with it has not been fun: it's difficult to reason about, has | obscenely bad documentation, poor tooling, idiosyncratic | library support, and simple data access patterns are hard. | user00012-ab wrote: | whoa.... postgres is moving quickly and taking jobs away from | other databases. Maybe we need to hold up for 6 months on any new | uses for postgres until we all have some time to figure this out. | truth_seeker wrote: | Few suggestions based on my prior experience on the same task in | production system:- | | 1. Both the tables (Nodes and Edges) MUST HAVE one more column as | "label" and then you must create partitions based on distinct | values of "label" column. This greatly helps queries to run | faster as search plan is narrowed when you include it in where | clause and PG knows partitions to hit. | | 2. Instead of one big fat primary key column in each table use, | consider having different uniques indexes on each partition. | | 3. The EDGES table can afford to have "data" column of data type | TEXT or perhaps JSONB. PG saves it in different data disk layout | and compression works great here. We used it to cache the result | for previous/next nodes data or compute the result for many extra | hops which are required on frequent basis. | | 4. Use PG procedures to hide the complexity of reusable DFS/BFS | queries. | yayr wrote: | Can someone explain or point towards an explanation, what the | performance impact of such a query is? How does it scale with | number of edges or depth of recursion? Is there a special smart | way of indexing this to be efficient? | henryfjordan wrote: | If you store a list of edges like (startId, endId) with an | index on startId, then every time you want to follow an edge | will cost O(log(n)) where n is the number of edges in your | graph. | | You can get O(1) performance on similar queries using a | different storage strategy (by storing edges as pointers to the | other nodes in memory) but SQL is not the right tool for that, | you would be better off with a true graph DB. | winrid wrote: | If you have indexes on next_node and previous_node, it'll | preform okay. | | The size of the table (# of edges) won't actually matter *that* | much performance wise with btree indexes. The performance will | mostly depend on how deep you want to go with your recursion. | The more nodes you check per query, the slower it'll be. | Luckily PG will keep the frequently accessed nodes and index | pages in the buffer pool, so it'll perform close enough to an | in-memory graph database for most use cases that access the | same data often - while not requiring your whole dataset to fit | into memory all the time. | | MongoDB's current graph search mechanism works the same way. | | I think actually if you are using incremental ids and you don't | delete edges, you could try BRIN indexes for the next/previous | node columns. This could still get you pretty good performance | with _much_ lower storage space requirements than a | "traditional" index, so you could theoretically store a pretty | large graph (thinking billions of edges) with minimal disk | space. | ForHackernews wrote: | I can't believe this article doesn't mention the (first-party) | ltree extension: https://hoverbear.org/blog/postgresql- | hierarchical-structure... | fbn79 wrote: | Is not nested sets https://en.wikipedia.org/wiki/Nested_set_model | a better solution than the one described in the article? | elchief wrote: | Joe Celko covers graphs in chapters 12-14 of Trees and | Hierarchies in SQL for Smarties | | as well as chapter 27 of SQL for Smarties, 5th | | isn't ISO working on an graph extension to regular SQL? Can't | seem to google it correctly | garyclarke27 wrote: | In my experience Postgres works really with a Recursive CTE | inside a Materialized View for Graphs. Latest version can detect | cycles. I used to add a distance column as well, to show number | of hops apart, any 2 nodes are. Hoping Postgres adds Automatic | Incremental Mat View Maintenance soon, it will be perfect then as | a Graph Database. | TOMDM wrote: | Incremental materialised view maintenance is a killer feature. | | If Postgres added it, almost all of my interest in Materialize | (the database, not the technique) would vanish immediately. | dfragnito wrote: | We accomplish similar with SchemafreeSQL. Currently MySQL 8 back- | end but working on other SQL back-ends. This demo uses | PlanetScale as the DB back-end, Fly.io handles the SFSQL | endpoint, Netlify function handles the query, and cytoscape.js | graph network library for the UI | | https://harmonious-mermaid-c4d794.netlify.app/got (sorry not | mobile friendly but functional) | samlambert wrote: | This is a cool project. | dfragnito wrote: | Thanks !! We had hoped to use PS for our serverless backend | and also offer a dedicated PS option. But we ran into some | issues with Vitess. I was just thinking today I wanted to get | back into it and see if we can resolve these issues. We had | to go with Aurora. | | The issues were on deletes. This demo is read only so it | works well with a PS backend. My experience with your CS has | been amazing despite these issues!! | _boffin_ wrote: | Again, i'll post | https://books.google.com/books/about/Joe_Celko_s_Trees_and_H... | | pdf: | https://github.com/Gumplefik/pdf/blob/master/Joe%20Celko's%2... | packetslave wrote: | we're posting copyrighted PDFs to HN now? | jghn wrote: | Over time I've come to take this a step further. I feel people | reach for a graph database way too early. Most use cases that | most people have will work fine in a standard relational setup. | And that brings with it the benefits of using technologies like | Postgres that have stood the test of time. So I treat it as I do | any performance consideration: first demonstrate that you | actually need to go down the path less traveled, don't just | assume you need to do so. | rajman187 wrote: | Completely agree. In fact, even massive graph data can make use | of a relational database + caching [1] | | [1] https://engineering.fb.com/2013/06/25/core-data/tao-the- | powe... | cryptonector wrote: | The reason graph DBs exist is mainly to provide easier syntax for | recursive queries (which aren't really recursive because they're | tail-recursive, which means iterative rather than what most | people think of as recursive). | | Any questions? | | It's not clear how one might make SQL syntax for recursive | queries simpler. Clearly we can have syntactic sugar for | "dereferencing" relationships, so one could say `SELECT | child-->name FROM parent_child WHERE ...;`, and we could have | aggregation `SELECT array_agg(child-->name) FROM parent_child | WHERE ...;`, and `SELECT child--->name FROM parent_child WHERE | ...;` to say "recurse all the way through the child relation", | but if you want more than one column then things get tricky. | dapearce wrote: | Also see Apache Age, a graph database extension for Postgres: | https://github.com/apache/age | philzook wrote: | Very interesting. Relatedly, I've been refining simple | lightweight ways to use a regular SQL database to execute | seminaive datalog queries. Path reachability in a graph is the | canonical example. One could use stored functions to implement | the outer loop also. | | There are some other fun related applications like graph | rewriting using SQL, etc. | | MiniLitelog: Easy Breezy SQLite Datalog - | https://www.philipzucker.com/tiny-sqlite-datalog/ | ChicagoDave wrote: | As a C# guy, I've figured out how to build in-memory graphs with | generics and dsl/fluid queries. I'm working on a blog entry about | it because it's such a powerful skill to learn and leverage. No | data store required. I can deserialize/serialize the data | structure to/from binary or json. | ralusek wrote: | I've done this on many occasions, but I actually only go with a | single nodes table. | | Schema is like this id type: string (or | enum) data: jsonb indexed: jsonb (but this one | gets a GIN index) toId: nullable foreign key to this same | table fromId: nullable foreign key to this same table | createdAt: date updatedAt: date | | So if I had a post with a comment on it, the data would look | something like this { id: '1111', | type: 'users', data: { name: 'John Doe' }, } | { id: '2222', type: 'users', data: { | name: 'Jane Doe' }, } { id: '3333', | type: 'posts', data: { text: 'Nice day today' }, | } { id: '4444', type: 'comments', | data: { text: 'For you, maybe. Raining here' }, } | | And now for the edges: { id: '5555', | type: '(posts < users)', toId: '3333', | fromId: '1111', } { id: '6666', | type: '(comments < users)', toId: '4444', | fromId: '2222', } { id: '7777', | type: '(posts < comments)', toId: '3333', | fromId: '4444', } | | So then, for example, if I have a post, and I want to find the | comments on it, I search for toId = <my post id>, type = '(posts | < comments)'. | TOMDM wrote: | This works well for nodes with at most 1 parent and child, the | post is aimed at supporting many to many relationships though | ralusek wrote: | What I've described here supports many to many... | fauigerzigerk wrote: | Yes, but the downside of your schema is that the | distinction between nodes and edges is somewhat muddled. | | Presumably, all nodes would have nulls in both fromId and | toId, but the schema doesn't enforce that. The schema also | allows linking edges to other edges. | | Don't you think this is a bit too much flexibility if the | intention is to model a graph? | Dopameaner wrote: | Andy Pavlov, mentioned something similar. Where, he advocated | Relational database to solve most of the graph database | utilities. He also made a bet, if graph databases spawn a legit | usecase by 2030, that he would change his CMU profile picture. | henryfjordan wrote: | I implemented pretty much exactly this setup once. | | For what it's worth, there's a lot of footguns with this | approach. You need to be careful about infinite cycles and things | like that. You also just do not get the ergonomics of using a | query language that is meant for graph data. Once you get it all | setup and experience the issues you will no doubt be able to | build whatever you want but there's a learning curve. | | In the end we switched to using Neo4j and were able to build new | features a lot more quickly than we would've been able to on | Postgres. | | It's also worth mentioning that there are many ways to store | graph data, and using an "adjacency list" like is being done here | may or may not be the best way for your use case. Neo4j I think | uses a different approach where nodes store info about edges they | are connected to directly, so you don't even need to hit an index | to follow an edge. | mistermann wrote: | Piggybacking on your Neo4J reference....can anyone suggest any | resources that would be good for someone from the SQL world who | is very uninformed on graph databases to quickly get their head | around the key ideas, but even more importantly, how to choose | a platform among the offerings (licensing being I think a key | issue, I've heard Neo4J can get expensive)...assume a large, | "social media" scale system, to be safe. | viksit wrote: | you can use datomic as a good example. | https://hashrocket.com/blog/posts/using-datomic-as-a- | graph-d... | | another good example using predicates - | https://github.com/threatgrid/asami/wiki/2.-Introduction | pottertheotter wrote: | I was looking at graph databases/Neo4j a lot early last year | and found Neo4j has a lot of good resources. | | You can get this book for free through their website. [1] | | This have some good documentation[2], including this page | "Transition from relational to graph database".[3] | | [1] https://neo4j.com/graph-databases-book/ | | [2] https://neo4j.com/docs/getting-started/current/ | | [3] https://neo4j.com/docs/getting- | started/current/appendix/grap... | henryfjordan wrote: | Neo4j has a self-hosted community edition that is free, | though you need to update to Enterprise to expand your DB | beyond one node. It's worth noting though that scaling from a | single node to a cluster is going to have some pretty huge | performance issues whether you are using SQL or Neo4j to | store your graph. The power of graph databases is that it | should be very inexpensive to follow an edge in the graph but | when that leads to a network hop you lose a lot of | performance. | | If you are comfortable fitting your data onto one box, then | development speed is probably more important than other | factors and I would just try a few databases out and | especially pay attention to how good the libraries are in | your programming language of choice. Neo4j for instance had | high quality libraries in Java but the experience in Python | was not as good. | | If you have a lot of data or need next-level performance, I | would start by doing some research on the various ways to | store graph data and then pick a DB that supports the | approach you want to take. | joshspankit wrote: | As someone who hated being in the SQL world and couldn't | figure out why until years later when I found out about graph | databases, here's one big shift: | | Graph database store true relationships. | | The R in RDBMS and the concept of relationships by primary | keys are lies (imo). They lead people away from what true | relationships can be. Databases like Neo4j are not about | doing any sort of PK or join or merge-before-query. Looking | up by connection is the _fastest_ way to find information. If | your RDBMS had one table for phone numbers, one for email | addresses, one for first names, one for last names, one for | street, and so-on it would take huge amounts of time to query | to find all the detail for a single person. With a graph | database it takes a small amount of time to find the first | bit of info (let's say phone number since it's easily | sorted), and then every other bit of linked data is | essentially free. | eurasiantiger wrote: | Effectively, moving from RDB to a graph database just | shifts relationship resolution from query time to | insert/update time. Sometimes this can be an advantage. I'd | even wager usually. | taubek wrote: | For some basics of graph modeling take a look at | https://memgraph.com/docs/memgraph/tutorials/graph-modeling. | | Some differences between SQL and graph databases are covered | at https://memgraph.com/blog/graph-database-vs-relational- | datab.... | cryptonector wrote: | > You need to be careful about infinite cycles and things like | that. | | Not really, just never forget this: always use `UNION`, _and | never_ `UNION ALL`, in recursive queries! | revskill wrote: | PostgreSQL is my love. Thanks for all developers. | | Do not blame the tool if you feel it doesn't fit your niche. | Blame yourself first, and try to use it the right way. | amir734jj wrote: | Good example. Now do it with undirected graph. How will the query | look like? | | So, is this basically running transitive closure on the database | during query? it would be expensive. | pphysch wrote: | Small nitpick but I wish the author just made `data` a JSONB | field rather than VARCHAR. That really shows the power of | Postgres, being able to operate as a "NoSQL graph DB" with little | to no hacking. | ganderzz wrote: | Hey, author here! Thanks for the feedback! I went back and | forth with having `data` being a JSONB field or VARCHAR, but | ended up with VARCHAR to show that `nodes` don't have to be | anything crazy. Really `nodes` is just a standard table you'd | see in any old SQL database, and what makes it a "graph" is the | `edges`/relationship table. | bottlepalm wrote: | I can see how this possibly won't scale compared to dedicated | graph dbs. | | I'm interested in what the fundamental difference is in those | graph dbs because rdbms seem so close I don't see why it isn't | just provided as an extra layer on top. | | As described in the article, any graph can already be built with | an rbms, so why for example does neo4j need to be standalone and | can't possibly be a layer on top of postgres to leverage existing | relationships? Any good articles on that? | emodendroket wrote: | I don't know off the top of my head but I'd assume that one | could adjust the storage strategy and other performance | tweaking to optimize for the expected use case. | samwillis wrote: | If you are looking to utilise this with Django, there is the | brilliant Django CTE project that enables it and other CTE | queries with the Django ORM: | | https://dimagi.github.io/django-cte/#recursive-common-table-... | taubek wrote: | I personally prefer native graph databases such as Memgraph[1]. | Graph databases have it's advantages and disadvantages. Sometimes | you, need RDBMS, sometimes you need NoSQL solution. So I pick my | stack based on situation and issue that I need to solve. | | [1] https://memgraph.com | rajman187 wrote: | "Native" is a misleading or misused term in the context of | graph databases because a graph cannot be mapped in a maximally | consistent way with the underlying hardware (the von Neumann | architecture represents data in a sequential manner and it is | much faster to access it this way rather than randomly). | | There are generally two families in the graph database world: | those which use underlying traditional tables of nodes and | many-to-many edges; and index-free adjacency which just means | each node in the graph knows the memory address of its | connections (other side of the edges). | | Distributed graphs necessarily end up using the former because | it's difficult if not impossible for a node to know the memory | address of its connection when that crosses a physical | boundary. So typically index-free adjacency graphs have a | master-slave setup with multiple read replicas but a single one | to write to. | | So with a "native graph" you don't rely on potentially | expensive join operations to find neighbors of neighbors and | can traverse complex paths easily. | dtomicevic wrote: | This is spot on. Though, in Memgraph's context, there is a | combination of lock free techniques and raw pointer | representation that can work exceptionally well for a large | variety of use cases, and even better so compared to others | on streaming / event driven / large volume of writes. Using | raw pointers, there are also optimization opportunities when | storing or rebalancing the graph to maximize the number of | adjacent nodes in a cache line so benefiting from sequential | representation too. | mashroom wrote: | Thanks for pointing me toward Memgraph. I didn't hear of it | before and it appears to be a great alternative to neo4j. | nilsbunger wrote: | Sure, at its most basic a graph database is just a bunch of | many2many relationships. | | The recursive query is cool, but how is the performance for graph | queries compared to a DB optimized for this use case ? | ellisv wrote: | I haven't compared, but something like Kuzu is probably much | more performant for a lot of use cases because it implements | Worst-Case Optimal Joins. | | But I think Postgres is likely to get these improvements in a | few years too. ___________________________________________________________________ (page generated 2023-03-31 23:00 UTC)