[HN Gopher] Show HN: Simple-graph - a graph database in SQLite
       ___________________________________________________________________
        
       Show HN: Simple-graph - a graph database in SQLite
        
       Author : dpapathanasiou
       Score  : 128 points
       Date   : 2020-12-26 16:31 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | kevas wrote:
       | Why not add that functionality directly to SQLite via stored
       | procs*
       | 
       | https://www.amazon.com/Hierarchies-Smarties-Kaufmann-Managem...
       | 
       | *https://github.com/wolfch/sqlite-3.7.3.p1
        
         | [deleted]
        
         | pstuart wrote:
         | A more recent effort: https://cgsql.dev/
        
           | kevas wrote:
           | Thanks
        
       | lolive wrote:
       | I wonder if there are ways, in SQLite, to build indices for
       | s,p,o/s,p/p,o/ and maybe more subtle ones... That would be uber
       | nice, given the fact that most graph databases have their own
       | indexing strategies, and you cannot craft your own.
        
         | JimmyRuska wrote:
         | I saw this lecture some time back on the topic of
         | implementation and tradeoffs
         | https://www.youtube.com/watch?v=Dxwo9DYWV_c
        
       | bjornsing wrote:
       | How does this perform compared to a "native" graph database like
       | Neo4J?
        
         | lsb wrote:
         | Neo4j has failed queries I have written, with "out of memory"
         | errors. I have never, ever, ever gotten that from SQLite.
        
           | lolive wrote:
           | Performance issues are a very valid discussion. But to me,
           | the availability of a graph-oriented query language on top of
           | this graph variant of SQLite is, imho, the very first step to
           | investigate. (RDF import/CSV import being next)
        
             | szarnyasg wrote:
             | There has been a lot of progress on creating standardized
             | query languages for graphs. The two most notable ones are
             | [2]:
             | 
             | - SQL/PGQ, a property graph query extension to SQL is
             | planned to be released next year as part of SQL:2021.
             | 
             | - GQL, a standalone graph query language will follow later.
             | 
             | While it is a lot of work to design these languages, both
             | graph database vendors (e.g. Neo4j, TigerGraph) and
             | traditional RDBMS companies (e.g. Oracle [2],
             | PostgreSQL/2ndQuadrant [3]) seem serious about them. And
             | with a well-defined query language, it should be possible
             | to build a SQL/PGQ engine in (or on top of) SQLite as well.
             | 
             | [1] https://www.linkedin.com/pulse/sql-now-gql-alastair-
             | green/
             | 
             | [2] http://wiki.ldbcouncil.org/pages/viewpage.action?pageId
             | =1062...
             | 
             | [3] https://www.linkedin.com/pulse/postgresql-oracle-graph-
             | query...
        
         | linkdd wrote:
         | Isn't it like asking "how does sqlite perform compared to
         | databases like PostgreSQL" ?
         | 
         | SQLite is used a lot on edge (mobile apps, ...), sounds like
         | this project provide a graph database for the very same use
         | case (I probably won't run Neo4J on mobile).
        
           | afavour wrote:
           | IMO it's a different question. SQLite and Postgres are both
           | relational databases, it stands to reason that they're at
           | least doing things in _similar_ ways. They're two
           | implementations of the same idea(ish). A graph database is
           | something else altogether. Grafting that capability onto a
           | relational database has the potential to perform horribly.
           | 
           | It's a bad analogy, but SQLite to Postgres is like AMD vs
           | Intel x86 CPUs, whereas a graph database is ARM. Can it be
           | emulated? Yes. Is there a far greater potential for slowdown?
           | Yes.
        
           | szundi wrote:
           | Then maybe no such questions are necessary.
        
           | joelschw wrote:
           | I think the big difference here is that when comparing SQLite
           | you are at least using the same query language.
           | 
           | In the graph space you have Gremlin, Cryper, GQL and many
           | other proprietary query engines (which also looks to be the
           | the case here).
           | 
           | Without that accessibility this feels a bit like pickling a
           | NetworkX object.
        
         | tasogare wrote:
         | It depends on how the graph is stored in the database. In this
         | project the nodes ids are TEXT so it will likely not scale very
         | well. I know because I use a similar implementation with GUID
         | as string in Sqlite in a project since a couple of years and
         | while it works fine for the graph I have (<1 million nodes, few
         | edges per nodes) it won't perform too well past that.
        
           | mosselman wrote:
           | Thanks for the info. Do you happen to have some stats to
           | share?
        
         | batmansmk wrote:
         | It really depends on what you want to do with it.
         | 
         | I would benchmark the tasks "traversal", "aggregation" and
         | "shortest past" for a 10k to 10M node graph. Anything under 10k
         | would be good enough with most techs and over 10M need to
         | consider more tasks (writes, backup, the precise fields queried
         | can become their particular problems at larger scale).
         | 
         | The Github link implements "traversal "in Python instead of
         | pure SQLite. I suspect it will be around x10 slower than it
         | could be with the same tech stack, because it queries once per
         | node from Python to SQLite. Shortest path is not implemented
         | and would be too slow to be useful in an interactive
         | environment. "Aggregation" is also not implemented, but it
         | would perform admirably, because SQL is good at that.
         | 
         | Traditional relational OLTP databases such as Postgres are
         | already faster than dedicated graph databases for certain graph
         | related tasks, according to this benchmark:
         | https://www.arangodb.com/2018/02/nosql-performance-benchmark...
        
           | szarnyasg wrote:
           | > Traditional relational OLTP databases such as Postgres are
           | already faster than dedicated graph databases for certain
           | graph related tasks
           | 
           | It is indeed quite common that relational databases
           | outperform graph databases on certain graph processing
           | problems such as subgraph queries (a.k.a. graph pattern
           | matching). There are two key reasons for this: (1) most graph
           | pattern matching operations can be formulated using
           | relational operations such as natural joins, antijoins, and
           | outer joins; and (2) relational databases have been around
           | longer and have well-optimized operators.
           | 
           | A lot of the value that graph databases provide lies in their
           | query languages which (for most systems) allow formulating
           | path queries using a nice syntax (unlike SQL's WITH RECURSIVE
           | which many people find difficult to read and write). Their
           | property graph data model supports a schema-optional
           | approach, which makes them better suited for storing semi-
           | structured data. They also "provide efficient programmatic
           | access to the graph, allowing one to write arbitrary
           | algorithms against them if needed" [1].
           | 
           | With all these said, graph databases could be much faster on
           | subgraph queries than relational databases and there are
           | recent research results on the topic (worst-case optimal
           | joins, A+ indexes, etc.). But these are not available in any
           | production system yet.
           | 
           | [1] http://wp.sigmod.org/?p=1497
        
           | chrisweekly wrote:
           | > "shortest past"
           | 
           | shortest path typo, right?
        
             | _jal wrote:
             | The Open Shortest Past First protocol is used to resolve
             | temporal paradoxes.
        
       | brian_herman wrote:
       | Awesome! And you can write SQL queries on the data amazing SQLite
       | is the best database.
        
       | rekwah wrote:
       | Tangentially related to graph dbs, but if you're looking for more
       | hierarchical support, SQLite does has a transitive closure
       | extension[0] that might be of some assistance. I leveraged this
       | back in 2014 to write our framework-agnostic result storage on
       | AWS Device Farm.
       | 
       | [0] -
       | https://www.sqlite.org/cgi/src/artifact/636024302cde41b2bf0c...
       | 
       | [1] - https://charlesleifer.com/blog/querying-tree-structures-
       | in-s...
        
         | abetlen wrote:
         | I've actually been working on an extension to perform breadth
         | first search queries in SQLite on general graphs [0]. The
         | extension is actually based off of the transitive closure
         | extension. You can use it on any existing SQLite database as
         | long as you can wrangle your edges into a single table (real or
         | virtual) and the node ids are integers (I'm planning on
         | removing this constraint in the future).
         | 
         | [0]: https://github.com/abetlen/sqlite3-bfsvtab-ext
        
       | rklaehn wrote:
       | Interesting. SQLite is awesome.
       | 
       | I did something similar recently, a block store for a rust
       | implementation of ipfs, which models a directed acyclic graph of
       | content-addressed nodes.
       | 
       | https://github.com/actyx/ipfs-sqlite-block-store
       | 
       | I found that performance is pretty decent if you do almost
       | everything inside SQLite using WITH RECURSIVE.
       | 
       | The documentation has some really great examples for WITH
       | RECURSIVE. https://sqlite.org/lang_with.html
        
         | abetlen wrote:
         | The issue I found with WITH RECURSIVE queries is that they're
         | incredibly inefficient for anything but trees. I've looked
         | around and there doesn't seem to be any way to store a global
         | list of visited nodes. This means that when performing a
         | traversal of the graph the recursive query will follow __all
         | __paths between two nodes.
        
       ___________________________________________________________________
       (page generated 2020-12-26 23:00 UTC)