[HN Gopher] An Introduction to Knowledge Graphs
___________________________________________________________________

An Introduction to Knowledge Graphs

Author : umangkeshri
Score  : 97 points
Date   : 2021-05-22 12:02 UTC (10 hours ago)

(HTM) web link (ai.stanford.edu)
(TXT) w3m dump (ai.stanford.edu)

| julienreszka wrote:
| I wish all of you not to fall into the trap of ontologies. I worked
| very hard in this domain; my conclusion is that all ontologies
| fail to scale eventually. I would recommend people in the field
| to go towards "perspectivism".
|
| omginternets wrote:
| Could you (please, pretty please) elaborate?
|
| hnxs wrote:
| I'd like to read more about what you worked on specifically if
| you're willing to share!
|
| quag wrote:
| Is this[1] an example of ontological perspectivism? Can you
| point us at a good place to start?
|
| [1]: https://link.springer.com/article/10.1007/s11406-021-00371-1
|
| mistrial9 wrote:
| great start - I have presented this point of view myself.. no
| clue what "perspectivism" means really, though
|
| Veuxdo wrote:
| Is this just a way of saying that no relations are absolute?
|
| bluecerulean wrote:
| He most likely means that reasoners and databases that
| provide reasoning abilities do not scale. This makes sense,
| especially for OWL ontologies. For most OWL reasoners, if you
| feed them with the ontology and with a large set of instance
| data (class instances connected by edges that are labeled
| with properties defined in said ontology), they will likely
| take way more time than you would like to produce results (if
| they produce anything).
|
| The reason for that is twofold:
|
| 1. Many of the tools created for reasoning are research-first
|    tools. Some papers were published about the tool and it
|    really was a better and more scalable tool than anything
|    before it. But every PhD student graduates and needs to find
|    a job or move to the next hyped research area.
| 2. Tools are designed under the assumption that the whole
|    ontology, all the instance data and all results fit in main
|    memory (RAM). This assumption is de facto necessary for more
|    powerful entailment regimes of OWL.
|
| Reason 2 has a secondary sub-reason: OWL ontologies use
| URIs (actually IRIs), which are really inefficient
| identifiers compared to 32/64-bit integers. HDT is a format
| that fixes this inefficiency for RDF (and thus is applicable
| to ontologies), but by the time it came about nearly all
| reasoners were already abandoned as per reason #1 above.
|
| Newer reasoners that actually scale quite a bit are RDFox [1]
| and VLog [2]. They use compact representations and try to be
| nice with the CPU cache and pipeline. However, they are
| limited to a single shared memory (even if NUMA).
|
| There are a lot of mostly academic distributed reasoners
| designed to scale horizontally instead of vertically. These
| systems technically scale, but vertically scaling the
| aforementioned centralized systems will be more efficient.
| The intrinsic problem with distributing is that (i) it is
| hard to partition the input aiming at a fair distribution of
| work and (ii) inferred facts derived at one node often are
| evidence that multiple other nodes need to know.
|
| The problem of computing all inferred edges from a knowledge
| graph involves a great deal of communication, since one
| inference found by one node is evidence required by another
| processing node.
|
| [1]: https://www.oxfordsemantic.tech/product
| [2]: https://github.com/karmaresearch/vlog/
|
| JoelJacobson wrote:
| SQL might be a good fit to model Knowledge Graphs, since FOREIGN
| KEYs can be named, using the CONSTRAINT constraint_name FOREIGN
| KEY ... syntax. We thus have support to label edges.
|
| Nodes = Tables
|
| Edges = Foreign keys
|
| Edge labels = Foreign key constraint names
|
| FigmentEngine wrote:
| yes, you can always map most structures into tables, or even
| excel.
|
| but I think "good fit" is a stretch. when designing systems
| you generally want to look at data access patterns, and pick a
| data exec approach that aligns to that.
|
| in tech, unfortunately, RDBMS are the "hammer" in "if your only
| tool is a hammer then every problem looks like a nail."
|
| lmeyerov wrote:
| This kind of approach is pretty common, including in compute
| engines like Spark's GraphX. I suspect a lot of teams using
| graph DBs would be better off realizing this: it's good for
| simple and small problems.
|
| It does fall down for graphy tasks like multihop joins, connect
| the dots, and supernodes. So for GB/TBs of that, either you
| should do those outside the DB, or with an optimized DB.
| Likewise, not explicitly discussed in the article, modern
| knowledge graphs are often really about embedding vectors, not
| entity UUIDs, and few/no databases straddle relational queries,
| graph queries, and vector queries.
|
| zozbot234 wrote:
| > it does fall down for graphy tasks like multihop joins,
| > connect the dots, and supernodes.
|
| These can always be accomplished via recursive SQL queries.
| Of course any given implementation might be unoptimized for
| such tasks. But in practice, this kind of network analytics
| tends to be quite rare anyway.
|
| One should note that even inference tasks, which are often
| thought of as exclusive to the "semantic" or "knowledge"
| based paradigm, can be expressed very simply via SQL VIEWs.
| Of course this kind of inference often turns out to be
| infeasible in practice, or to introduce unwanted noise in the
| 'inferred' data, but this has nothing to do with SQL per se
| and is just as true of the "knowledge base" or "semantic"
| approach.
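The two SQL techniques mentioned in this subthread, recursive queries for multi-hop traversal and views for simple inference, can be sketched roughly as follows. This is a minimal illustration, not anything a commenter posted: it uses Python's built-in sqlite3 module, and the `edge` triple table, the `knows_sym` view, the `reachable` helper and the sample data are all hypothetical names chosen for the example. It also stores edges as (src, label, dst) rows rather than as named foreign keys, which is an assumption made to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one row per labeled edge. The composite key
    -- allows several differently labeled edges between the same nodes.
    CREATE TABLE edge (
        src   TEXT NOT NULL,
        label TEXT NOT NULL,
        dst   TEXT NOT NULL,
        PRIMARY KEY (src, label, dst)
    );
    INSERT INTO edge VALUES
        ('alice', 'knows', 'bob'),
        ('bob',   'knows', 'carol'),
        ('carol', 'knows', 'dave'),
        ('alice', 'works_with', 'bob');

    -- Inference as a view: treat 'knows' as a symmetric relation.
    CREATE VIEW knows_sym AS
        SELECT src, dst FROM edge WHERE label = 'knows'
        UNION
        SELECT dst, src FROM edge WHERE label = 'knows';
""")


def reachable(conn, start, label):
    """Return all nodes reachable from `start` over edges with `label`."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM edge WHERE src = ? AND label = ?
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT e.dst FROM edge AS e
            JOIN reach AS r ON e.src = r.node
            WHERE e.label = ?
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start, label, label),
    ).fetchall()
    return [row[0] for row in rows]


print(reachable(conn, "alice", "knows"))  # ['bob', 'carol', 'dave']
```

Whether this performs acceptably on large graphs depends on the engine and indexing, which is the caveat both commenters raise.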
|
| er4hn wrote:
| Graph databases likely are more optimized for this sort of data
| storage, but you've hit it on the head that SQL databases can
| be used to represent node/edge style data.
|
| zozbot234 wrote:
| The definition seems faulty to me, since the pair (E: subset(N x
| N), f: E -> L) does not admit of multiple edges with different
| labels, connecting the same ordered pair of nodes. Of course this
| is most often allowed in practical KGs.
|
| mmarx wrote:
| Indeed multiple edges (with different labels) are quite useful,
| particularly when you want to represent RDF graphs. But since
| there is no restriction on the form of L, you can still
| represent those by, e.g., letting L be a set of sets of IRIs,
| and thus labelling your edges with sets of IRIs, which you then
| interpret as a set of RDF triples (i.e., as a set of edges).
|
| low_tech_love wrote:
| On a side note, I love the idea of researchers writing "articles"
| in this format. No paywall, no complex two-column format, no
| PDFs. As a researcher myself, I wish this is what my
| "productivity" was judged upon, I'd probably have a lot more fun
| and motivation to work and produce!
|
| wrnr wrote:
| KGs are cool, but I haven't found a practical framework for
| combining simple logical predicates with temporal facts (things
| that are true at a certain moment in time) and information
| provenance (the truthiness of information given the origin).
| There might be ways to encode this information in a hypergraph
| but they are far from practical.
|
| physicsyogi wrote:
| Check out Datomic. It's a temporal database that uses Datalog as
| its query language. There's also Datascript, which does the
| same thing.
|
| superlopuh wrote:
| Unfortunate name for a product, I can't find anything called
| Dynamic on DDG, only dynamic things with a lowercase d. Do
| you have a link to the project?
|
| bosie wrote:
| not dynamic but Datomic
|
| superlopuh wrote:
| Dyslexic moment on my part, thank you
|
| omginternets wrote:
| Lysdexia makes fools of us all ;)
|
| mmarx wrote:
| Wikidata statements (which roughly correspond to the edges in
| the Knowledge Graph) have quite a bit of metadata associated
| with them: they can refer to sources that state this
| particular bit of knowledge, they have a so-called rank that
| allows distinguishing preferred and deprecated statements, and
| they can be qualified by another statement in the graph.
| Temporal validity is encoded using a combination of rank and
| qualifiers, as for, e.g., Pluto[0], where the instance-of
| statement saying that "Pluto is a planet" is deprecated and has
| an "end time" qualifier, and the preferred statement says
| "Pluto is a dwarf planet," with a corresponding "start time"
| qualifier.
|
| In principle, all of this information is available through the
| SPARQL endpoint or as an RDF export (there is also the
| simplified export that contains only "simple" statements
| lacking all of that metadata), so reasoning over this data is
| not entirely out of reach, but the sheer size (the full RDF
| dump is a few hundred GBs) is also not particularly practical
| to deal with.
|
| [0] https://www.wikidata.org/wiki/Q339#P31
|
| gballan wrote:
| Might be worth looking at Sowa's Conceptual Graphs. E.g., [1]
| talks about time, and links to his book.
|
| [1] http://www.jfsowa.com/ontology/process.htm
|
| hocuspocus wrote:
| Check out Nexus, which was designed with versioning in mind and
| solves this kind of challenge at the Blue Brain Project:
|
| https://bluebrainnexus.io/
|
| physicsgraph wrote:
| Knowledge graphs for text (the focus of the article) seem
| narrowly-scoped since they require "objective" facts and
| relations to be practical. Capturing the subjective and transient
| perspective of observations made by multiple observers (which is
| what we actually have access to) is more complicated.
|
| For example, asking the same person the same question may yield
| different answers based on their mood or other environmental or
| situational factors. Who's asking the question can also matter,
| as does the specific phrasing of the question.
___________________________________________________________________
(page generated 2021-05-22 23:00 UTC)