[HN Gopher] Show HN: Simple-graph - a graph database in SQLite ___________________________________________________________________ Show HN: Simple-graph - a graph database in SQLite Author : dpapathanasiou Score : 128 points Date : 2020-12-26 16:31 UTC (6 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | kevas wrote: | Why not add that functionality directly to SQLite via stored | procs* | | https://www.amazon.com/Hierarchies-Smarties-Kaufmann-Managem... | | *https://github.com/wolfch/sqlite-3.7.3.p1 | [deleted] | pstuart wrote: | A more recent effort: https://cgsql.dev/ | kevas wrote: | Thanks | lolive wrote: | I wonder if there are ways, in SQLite, to build indices for | s,p,o/s,p/p,o/ and maybe more subtle ones... That would be uber | nice, given the fact that most graph databases have their own | indexing strategies, and you cannot craft your own. | JimmyRuska wrote: | I saw this lecture some time back on the topic of | implementation and tradeoffs | https://www.youtube.com/watch?v=Dxwo9DYWV_c | bjornsing wrote: | How does this perform compared to a "native" graph database like | Neo4J? | lsb wrote: | Neo4j has failed queries I have written, with "out of memory" | errors. I have never, ever, ever gotten that from SQLite. | lolive wrote: | Performance issues are a very valid discussion. But to me, | the availability of a graph-oriented query language on top of | this graph variant of SQLite is, imho, the very first step to | investigate. (RDF import/CSV import being next) | szarnyasg wrote: | There has been a lot of progress on creating standardized | query languages for graphs. The two most notable ones are | [2]: | | - SQL/PGQ, a property graph query extension to SQL is | planned to be released next year as part of SQL:2021. | | - GQL, a standalone graph query language will follow later. | | While it is a lot of work to design these languages, both | graph database vendors (e.g. Neo4j, TigerGraph) and | traditional RDBMS companies (e.g. Oracle [2], | PostgreSQL/2ndQuadrant [3]) seem serious about them. And | with a well-defined query language, it should be possible | to build a SQL/PGQ engine in (or on top of) SQLite as well. | | [1] https://www.linkedin.com/pulse/sql-now-gql-alastair- | green/ | | [2] http://wiki.ldbcouncil.org/pages/viewpage.action?pageId | =1062... | | [3] https://www.linkedin.com/pulse/postgresql-oracle-graph- | query... | linkdd wrote: | Isn't it like asking "how does sqlite perform compared to | databases like PostgreSQL" ? | | SQLite is used a lot on edge (mobile apps, ...), sounds like | this project provide a graph database for the very same use | case (I probably won't run Neo4J on mobile). | afavour wrote: | IMO it's a different question. SQLite and Postgres are both | relational databases, it stands to reason that they're at | least doing things in _similar_ ways. They're two | implementations of the same idea(ish). A graph database is | something else altogether. Grafting that capability onto a | relational database has the potential to perform horribly. | | It's a bad analogy, but SQLite to Postgres is like AMD vs | Intel x86 CPUs, whereas a graph database is ARM. Can it be | emulated? Yes. Is there a far greater potential for slowdown? | Yes. | szundi wrote: | Then maybe no such questions are necessary. | joelschw wrote: | I think the big difference here is that when comparing SQLite | you are at least using the same query language. | | In the graph space you have Gremlin, Cryper, GQL and many | other proprietary query engines (which also looks to be the | the case here). | | Without that accessibility this feels a bit like pickling a | NetworkX object. | tasogare wrote: | It depends on how the graph is stored in the database. In this | project the nodes ids are TEXT so it will likely not scale very | well. I know because I use a similar implementation with GUID | as string in Sqlite in a project since a couple of years and | while it works fine for the graph I have (<1 million nodes, few | edges per nodes) it won't perform too well past that. | mosselman wrote: | Thanks for the info. Do you happen to have some stats to | share? | batmansmk wrote: | It really depends on what you want to do with it. | | I would benchmark the tasks "traversal", "aggregation" and | "shortest past" for a 10k to 10M node graph. Anything under 10k | would be good enough with most techs and over 10M need to | consider more tasks (writes, backup, the precise fields queried | can become their particular problems at larger scale). | | The Github link implements "traversal "in Python instead of | pure SQLite. I suspect it will be around x10 slower than it | could be with the same tech stack, because it queries once per | node from Python to SQLite. Shortest path is not implemented | and would be too slow to be useful in an interactive | environment. "Aggregation" is also not implemented, but it | would perform admirably, because SQL is good at that. | | Traditional relational OLTP databases such as Postgres are | already faster than dedicated graph databases for certain graph | related tasks, according to this benchmark: | https://www.arangodb.com/2018/02/nosql-performance-benchmark... | szarnyasg wrote: | > Traditional relational OLTP databases such as Postgres are | already faster than dedicated graph databases for certain | graph related tasks | | It is indeed quite common that relational databases | outperform graph databases on certain graph processing | problems such as subgraph queries (a.k.a. graph pattern | matching). There are two key reasons for this: (1) most graph | pattern matching operations can be formulated using | relational operations such as natural joins, antijoins, and | outer joins; and (2) relational databases have been around | longer and have well-optimized operators. | | A lot of the value that graph databases provide lies in their | query languages which (for most systems) allow formulating | path queries using a nice syntax (unlike SQL's WITH RECURSIVE | which many people find difficult to read and write). Their | property graph data model supports a schema-optional | approach, which makes them better suited for storing semi- | structured data. They also "provide efficient programmatic | access to the graph, allowing one to write arbitrary | algorithms against them if needed" [1]. | | With all these said, graph databases could be much faster on | subgraph queries than relational databases and there are | recent research results on the topic (worst-case optimal | joins, A+ indexes, etc.). But these are not available in any | production system yet. | | [1] http://wp.sigmod.org/?p=1497 | chrisweekly wrote: | > "shortest past" | | shortest path typo, right? | _jal wrote: | The Open Shortest Past First protocol is used to resolve | temporal paradoxes. | brian_herman wrote: | Awesome! And you can write SQL queries on the data amazing SQLite | is the best database. | rekwah wrote: | Tangentially related to graph dbs, but if you're looking for more | hierarchical support, SQLite does has a transitive closure | extension[0] that might be of some assistance. I leveraged this | back in 2014 to write our framework-agnostic result storage on | AWS Device Farm. | | [0] - | https://www.sqlite.org/cgi/src/artifact/636024302cde41b2bf0c... | | [1] - https://charlesleifer.com/blog/querying-tree-structures- | in-s... | abetlen wrote: | I've actually been working on an extension to perform breadth | first search queries in SQLite on general graphs [0]. The | extension is actually based off of the transitive closure | extension. You can use it on any existing SQLite database as | long as you can wrangle your edges into a single table (real or | virtual) and the node ids are integers (I'm planning on | removing this constraint in the future). | | [0]: https://github.com/abetlen/sqlite3-bfsvtab-ext | rklaehn wrote: | Interesting. SQLite is awesome. | | I did something similar recently, a block store for a rust | implementation of ipfs, which models a directed acyclic graph of | content-addressed nodes. | | https://github.com/actyx/ipfs-sqlite-block-store | | I found that performance is pretty decent if you do almost | everything inside SQLite using WITH RECURSIVE. | | The documentation has some really great examples for WITH | RECURSIVE. https://sqlite.org/lang_with.html | abetlen wrote: | The issue I found with WITH RECURSIVE queries is that they're | incredibly inefficient for anything but trees. I've looked | around and there doesn't seem to be any way to store a global | list of visited nodes. This means that when performing a | traversal of the graph the recursive query will follow __all | __paths between two nodes. ___________________________________________________________________ (page generated 2020-12-26 23:00 UTC)