[HN Gopher] An Introduction to Knowledge Graphs
       ___________________________________________________________________
        
       An Introduction to Knowledge Graphs
        
       Author : umangkeshri
       Score  : 97 points
       Date   : 2021-05-22 12:02 UTC (10 hours ago)
        
 (HTM) web link (ai.stanford.edu)
 (TXT) w3m dump (ai.stanford.edu)
        
       | julienreszka wrote:
        | I wish all of you not to fall into the trap of ontologies. I
        | worked very hard in this domain, and my conclusion is that all
        | ontologies eventually fail to scale. I would recommend that
        | people in the field move towards "perspectivism".
        
         | omginternets wrote:
         | Could you (please, pretty please) elaborate?
        
         | hnxs wrote:
         | I'd like to read more about what you worked on specifically if
         | you're willing to share!
        
         | quag wrote:
         | Is this[1] an example of ontological perspectivism? Can you
         | point us at a good place to start?
         | 
         | [1]:
         | https://link.springer.com/article/10.1007/s11406-021-00371-1
        
           | mistrial9 wrote:
           | great start - I have presented this point of view myself.. no
           | clue what "perspectivism" means really, though
        
         | Veuxdo wrote:
         | Is this just a way of saying that no relations are absolute?
        
           | bluecerulean wrote:
            | He most likely means that reasoners and databases that
            | provide reasoning abilities do not scale. This makes sense,
            | especially for OWL ontologies. For most OWL reasoners, if
            | you feed them the ontology plus a large set of instance
            | data (class instances connected by edges labeled with
            | properties defined in said ontology), they will likely take
            | far more time than you would like to produce results (if
            | they produce anything at all).
           | 
           | The reason for that is twofold:
           | 
            | 1. Many of the tools created for reasoning are research-
            | first tools. Some papers were published about the tool, and
            | it really was a better and more scalable tool than anything
            | before it. But every PhD student graduates and needs to
            | find a job or move on to the next hyped research area.
            | 
            | 2. Tools are designed under the assumption that the whole
            | ontology, all the instance data, and all results fit in
            | main memory (RAM). This assumption is de facto necessary
            | for the more powerful entailment regimes of OWL.
           | 
            | Reason 2 has a secondary sub-reason: OWL ontologies use
            | URIs (actually IRIs), which are really inefficient
            | identifiers compared to 32/64-bit integers. HDT is a format
            | that fixes this inefficiency for RDF (and thus is
            | applicable to ontologies), but by the time it came about,
            | nearly all reasoners were already abandoned as per reason
            | #1 above.
           | 
            | Newer reasoners that actually scale quite a bit are RDFox
            | [1] and VLog [2]. They use compact representations and try
            | to be friendly to the CPU cache and pipeline. However, they
            | are limited to a single shared-memory machine (even if
            | NUMA).
           | 
            | There are a lot of mostly academic distributed reasoners
            | designed to scale horizontally instead of vertically. These
            | systems technically scale, but vertically scaling the
            | aforementioned centralized systems will be more efficient.
            | The intrinsic problem with distribution is that (i) it is
            | hard to partition the input so that work is distributed
            | fairly, and (ii) facts inferred at one node are often
            | evidence that multiple other nodes need to know about.
            | Because of (ii), computing all inferred edges of a
            | knowledge graph involves a great deal of communication
            | between nodes.
           | 
            | [1]: https://www.oxfordsemantic.tech/product
            | 
            | [2]: https://github.com/karmaresearch/vlog/
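The identifier point above can be sketched in a few lines of Python: dictionary-encoding IRIs to dense integers (which is essentially what HDT-style formats do) turns each triple into a compact fixed-size tuple. The IRIs below are invented for illustration:

```python
# Dictionary-encode IRI triples as integer triples, HDT-style: each
# distinct IRI is stored once, and triples become small integer tuples
# that are cheap to store and compare.

def encode(triples):
    """Map each IRI to a dense integer id; return (dictionary, int triples)."""
    ids = {}

    def to_id(iri):
        if iri not in ids:
            ids[iri] = len(ids)
        return ids[iri]

    return ids, [(to_id(s), to_id(p), to_id(o)) for s, p, o in triples]

triples = [
    ("http://example.org/Pluto", "http://example.org/instanceOf",
     "http://example.org/DwarfPlanet"),
    ("http://example.org/Pluto", "http://example.org/discoveredBy",
     "http://example.org/ClydeTombaugh"),
]
ids, encoded = encode(triples)
# "Pluto" occurs in both triples but is stored only once in the dictionary.
```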
        
       | JoelJacobson wrote:
        | SQL might be a good fit to model Knowledge Graphs, since
        | FOREIGN KEYs can be named using the CONSTRAINT constraint_name
        | FOREIGN KEY ... syntax. We thus have support for labeled edges.
       | 
       | Nodes = Tables
       | 
       | Edges = Foreign keys
       | 
       | Edge labels = Foreign key constraint names
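A minimal sketch of this modeling in SQLite, via Python's sqlite3 module; the table names, data, and the "works_at" edge label are all invented for illustration:

```python
import sqlite3

# Nodes as tables, edges as foreign keys, edge labels as FOREIGN KEY
# constraint names (here "works_at").
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT,
    company_id INTEGER,
    CONSTRAINT works_at FOREIGN KEY (company_id) REFERENCES company(id)
);
""")
conn.execute("INSERT INTO company VALUES (1, 'ACME')")
conn.execute("INSERT INTO person VALUES (1, 'Ada', 1)")

# Traversing the labelled edge is a join over the foreign key:
row = conn.execute("""
    SELECT person.name, company.name FROM person
    JOIN company ON person.company_id = company.id
""").fetchone()
# row == ('Ada', 'ACME')
```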
        
         | FigmentEngine wrote:
          | yes, you can always map most structures into tables, or even
          | Excel.
          | 
          | but I think "good fit" is a stretch. when designing systems
          | you generally want to look at data access patterns, and pick
          | a data execution approach that aligns with that.
         | 
         | in tech, unfortunately, RDBMS are the "hammer" in "if your only
         | tool is a hammer then every problem looks like a nail."
        
         | lmeyerov wrote:
         | This kind of approach is pretty common, including in compute
          | engines like Spark's GraphX. I suspect a lot of teams using
          | graph DBs would be better off realizing this: it's good for
          | simple and small problems.
          | 
          | It does fall down for graphy tasks like multihop joins,
          | connect the dots, and supernodes. So for GB/TBs of that, you
          | should either do those outside the DB or use an optimized
          | DB. Likewise, not explicitly discussed in the article, modern
          | knowledge graphs are often really about embedding vectors,
          | not entity UUIDs, and few/no databases straddle relational
          | queries, graph queries, and vector queries.
        
           | zozbot234 wrote:
           | > it does fall down for graphy tasks like multihop joins,
           | connect the dots, and supernodes.
           | 
           | These can always be accomplished via recursive SQL queries.
           | Of course any given implementation might be unoptimized for
           | such tasks. But in practice, this kind of network analytics
           | tends to be quite rare anyway.
           | 
            | One should note that even inference tasks, which are often
            | thought of as exclusive to the "semantic" or "knowledge"
            | based paradigm, can be expressed very simply via SQL VIEWs.
           | Of course this kind of inference often turns out to be
           | infeasible in practice, or to introduce unwanted noise in the
           | 'inferred' data, but this has nothing to do with SQL per se
           | and is just as true of the "knowledge base" or "semantic"
           | approach.
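The recursive-SQL point can be sketched in SQLite with a recursive common table expression over a toy edge table (schema and data invented); this computes everything reachable from node 'a' across any number of hops:

```python
import sqlite3

# Multihop traversal ("connect the dots") as a recursive CTE over a
# plain edge table -- standard SQL, though how well it performs varies
# by engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edge (src TEXT, dst TEXT);
INSERT INTO edge VALUES ('a','b'), ('b','c'), ('c','d');
""")

reachable = [r[0] for r in conn.execute("""
    WITH RECURSIVE reach(node) AS (
        SELECT 'a'
        UNION
        SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
    )
    SELECT node FROM reach ORDER BY node
""")]
# reachable == ['a', 'b', 'c', 'd']
```

The UNION (rather than UNION ALL) deduplicates rows, which also makes the recursion terminate on cyclic graphs.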
        
         | er4hn wrote:
         | Graph databases likely are more optimized for this sort of data
         | storage, but you've hit it on the head that SQL databases can
         | be used to represent node/edge style data.
        
       | zozbot234 wrote:
        | The definition seems faulty to me, since the pair (E: subset(N
        | x N), f: E -> L) does not admit multiple edges with different
        | labels connecting the same ordered pair of nodes. Of course,
        | this is most often allowed in practical KGs.
        
         | mmarx wrote:
         | Indeed multiple edges (with different labels) are quite useful,
         | particularly when you want to represent RDF graphs. But since
         | there is no restriction on the form of L, you can still
         | represent those by, e.g., letting L be a set of sets of IRIs,
         | and thus labelling your edges with sets of IRIs, which you then
         | interpret as a set of RDF triples (i.e., as a set of edges).
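That workaround can be sketched with a plain Python dict: keep one edge per (source, target) pair, label it with a *set*, and read each member of the set as a separate RDF triple (the ex: names are invented):

```python
# One edge per (source, target) pair, carrying a set of labels; each
# label in the set is interpreted as one RDF triple.
edges = {}

def add_triple(s, p, o):
    edges.setdefault((s, o), set()).add(p)

add_triple("ex:Alice", "ex:knows", "ex:Bob")
add_triple("ex:Alice", "ex:worksWith", "ex:Bob")  # parallel edge, same pair

# The single graph edge (ex:Alice, ex:Bob) now represents two triples.
labels = edges[("ex:Alice", "ex:Bob")]
```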
        
       | low_tech_love wrote:
       | On a side note, I love the idea of researchers writing "articles"
       | in this format. No paywall, no complex two-column format, no
        | PDFs. As a researcher myself, I wish this were what my
        | "productivity" was judged on; I'd probably have a lot more fun
        | and motivation to work and produce!
        
       | wrnr wrote:
        | KGs are cool, but I haven't found a practical framework for
        | combining simple logical predicates with temporal facts
        | (things that are true at a certain moment in time) and
        | information provenance (the truthiness of information given
        | its origin). There might be ways to encode this information in
        | a hypergraph, but they are far from practical.
        
         | physicsyogi wrote:
          | Check out Datomic. It's a temporal database that uses
          | Datalog as its query language. There's also DataScript,
          | which does the same thing.
        
           | superlopuh wrote:
           | Unfortunate name for a product, I can't find anything called
           | Dynamic on DDG, only dynamic things with a lowercase d. Do
           | you have a link to the project?
        
             | bosie wrote:
             | not dynamic but Datomic
        
               | superlopuh wrote:
               | Dyslexic moment on my part, thank you
        
               | omginternets wrote:
               | Lysdexia makes fools of us all ;)
        
         | mmarx wrote:
          | Wikidata statements (which roughly correspond to the edges
          | in the Knowledge Graph) have quite a bit of metadata
          | associated with them: they can refer to sources that state
          | this particular bit of knowledge, they have a so-called rank
          | that allows distinguishing preferred and deprecated
          | statements, and they can be qualified by another statement
          | in the graph. Temporal validity is encoded using a
          | combination of rank and qualifiers, as for, e.g., Pluto[0],
          | where the instance-of statement saying that "Pluto is a
          | planet" is deprecated and has an "end time" qualifier, and
          | the preferred statement says "Pluto is a dwarf planet," with
          | a corresponding "start time" qualifier.
         | 
         | In principle, all of this information is available through the
         | SPARQL endpoint or as an RDF export (there is also the
         | simplified export that contains only "simple" statements
         | lacking all of that metadata), so reasoning over this data is
         | not entirely out of reach, but the sheer size (the full RDF
         | dump is a few hundred GBs) is also not particularly practical
         | to deal with.
         | 
         | [0] https://www.wikidata.org/wiki/Q339#P31
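A rough data-model sketch of the Pluto example in Python; the field names here are simplified inventions, and the real Wikidata model has considerably more structure (references, typed time values, etc.):

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    """A Wikidata-style statement: an edge plus rank and qualifiers."""
    subject: str
    prop: str
    value: str
    rank: str = "normal"               # "preferred" | "normal" | "deprecated"
    qualifiers: dict = field(default_factory=dict)

statements = [
    Statement("Q339", "P31", "planet", rank="deprecated",
              qualifiers={"end time": "2006"}),
    Statement("Q339", "P31", "dwarf planet", rank="preferred",
              qualifiers={"start time": "2006"}),
]

# The "current truth" reads off the preferred statements, while the
# deprecated one (with its "end time" qualifier) preserves history:
current = [s.value for s in statements if s.rank == "preferred"]
# current == ["dwarf planet"]
```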
        
         | gballan wrote:
          | Might be worth looking at Sowa's Conceptual Graphs. E.g.,
          | [1] talks about time and links to his book.
         | 
         | [1] http://www.jfsowa.com/ontology/process.htm
        
         | hocuspocus wrote:
          | Check out Nexus, which was designed with versioning in mind
          | and solves this kind of challenge at the Blue Brain Project:
         | 
         | https://bluebrainnexus.io/
        
       | physicsgraph wrote:
       | Knowledge graphs for text (the focus of the article) seem
        | narrowly scoped, since they require "objective" facts and
       | relations to be practical. Capturing the subjective and transient
       | perspective of observations made by multiple observers (which is
       | what we actually have access to) is more complicated.
       | 
       | For example, asking the same person the same question may yield
       | different answers based on their mood or other environmental or
       | situational factors. Who's asking the question can also matter,
       | as does the specific phrasing of the question.
        
       ___________________________________________________________________
       (page generated 2021-05-22 23:00 UTC)