[HN Gopher] Relational Deep Learning: Graph representation
learning on relational databases
___________________________________________________________________

Relational Deep Learning: Graph representation learning on
relational databases [pdf]

Author : taubek
Score  : 112 points
Date   : 2023-11-28 16:16 UTC (6 hours ago)

(HTM) web link (relbench.stanford.edu)
(TXT) w3m dump (relbench.stanford.edu)

| cs702 wrote:
| Just added this to my reading list. It looks and smells like a
| promising approach for training and inference at scale _directly
| from relational data_, and _modeling the relations_, instead of
| from flat records/files with features and labels. It's
| _refreshing_ to see original work that is not about
| incrementally tweaking some transformer!

| _boffin_ wrote:
| looking forward to reading this later today.

| mjhay wrote:
| I'm always amazed by how little attention tabular/relational
| data gets in DL research, despite being far more common in
| applications than images or unstructured text.

| importantbrian wrote:
| I actually do see DL research on tabular problems come out, but
| the resulting models generally aren't better than XGBoost or
| similar boosted-tree models on the same dataset, and when they
| are better, the gain is marginal and not worth the
| interpretability tradeoff in the real world. Boosted trees are
| already really good for most tabular tasks, so it makes sense
| that most of the focus would be on images and natural language,
| where DL is a big improvement over traditional methods.
|
| Additionally, it seems to me that most of the money spent on
| research in the DL space goes to places like OpenAI, whose main
| goal is to develop AGI. They see large transformer models as
| the path to AGI, and once you've solved AGI, the AGI can solve
| all the other problems.

| bluecoconut wrote:
| I started Approximate Labs to tackle this problem! We just
| addressed what we believe is one of the biggest gaps in the DL
| research community for this modality: data. Last month we
| openly released the largest dataset of tables with annotations:
| https://www.approximatelabs.com/blog/tablib
|
| Feel free to reach out to me / join our discord if you want to
| talk about tabular data / relational data and its relation to
| AI~

| arolihas wrote:
| There is little incentive to publish since it rarely does
| better than XGBoost.

| cmrdporcupine wrote:
| This is interesting, and smart: using the knowledge present in
| the normalized relationships / data connectivity as part of the
| knowledge for training.
|
| A properly designed relational database is a kind of
| propositional knowledge base, with each tuple ("row") being a
| "fact"; it makes sense to mine this as part of learning. It's
| how a human would "read" this data, so it makes sense.
|
| Nitpicking:
|
| _"The core idea is to view relational tables as a heterogeneous
| graph, with a node for each row in each table, and edges
| specified by primary-foreign key relations."_
|
| The word "relation" is used incorrectly here -- the _"relation"_
| is the table; the name for the foreign-key relationship is...
| relationship. The key distinction is that relations (big bags
| of facts) exist independently of those pre-declared
| relationships; the relationships can be restructured any which
| way through queries, with foreign key constraints just being a
| convention for doing that.
|
| That's the key benefit of the relational data model over
| hierarchical/tree/network/graph databases, where the
| relationships are hardcoded, and the key thing that Codd was
| getting at in his original paper: relational databases are
| theoretically the most flexible model of data (apart from just
| bags of text) because the relationships are derived, not
| hardcoded.
|
| So what they're describing here is each tuple in the relation
| as a graph node, with edges defined by the attributes of the
| relation -- which is precisely the inverse of how graphs are
| described as relations. See e.g. RelationalAI's product
| (https://docs.relational.ai/rel/concepts/graph-normal-form), or
| how graphs are modeled in e.g. Datalog or Prolog.
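
To make the row-as-node construction quoted above concrete, here is
a minimal sketch in Python. The tables and names are made up for
illustration (this is not the paper's code): hypothetical users and
reviews tables, where reviews.user_id is a foreign key into users,
become typed nodes and typed edges.

    import pandas as pd

    # Two toy tables; reviews.user_id is a foreign key into users.
    users = pd.DataFrame({"user_id": [10, 11, 12],
                          "age": [31, 45, 27]})
    reviews = pd.DataFrame({"review_id": [1, 2, 3, 4],
                            "user_id": [10, 10, 12, 11],
                            "stars": [5, 3, 4, 1]})

    # One node per row: map each primary key to a node index,
    # kept separate per table (the graph is heterogeneous).
    user_idx = {pk: i for i, pk in enumerate(users["user_id"])}
    review_idx = {pk: i for i, pk in enumerate(reviews["review_id"])}

    # One typed edge per primary-foreign key pair.
    edges = {("review", "written_by", "user"):
             [(review_idx[r], user_idx[u])
              for r, u in zip(reviews["review_id"],
                              reviews["user_id"])]}

    print(edges)
    # {('review', 'written_by', 'user'):
    #  [(0, 0), (1, 0), (2, 2), (3, 1)]}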
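The inverse direction the comment points at -- a graph stored as a
relation -- can be sketched the same way: a bag of (src, dst) facts
with relationships derived by joins, rather than followed through
hardcoded pointers. This is a Datalog-style transitive closure
written in plain Python; the edge facts are made up for
illustration, and this is not RelationalAI's API.

    # Graph as a relation: a set of (src, dst) facts.
    edge = {("a", "b"), ("b", "c"), ("c", "d")}

    # Datalog: path(X, Y) :- edge(X, Y).
    #          path(X, Z) :- path(X, Y), edge(Y, Z).
    path = set(edge)
    while True:
        # The recursive rule as a join between path and edge.
        new = {(x, z)
               for (x, y) in path
               for (y2, z) in edge if y == y2}
        if new <= path:
            break
        path |= new

    print(sorted(path))
    # [('a', 'b'), ('a', 'c'), ('a', 'd'),
    #  ('b', 'c'), ('b', 'd'), ('c', 'd')]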
| larodi wrote:
| Let me help you: SQL and the whole of predicate logic are
| Prolog from start to end -- and also grammars, and regexes, and
| recursion, and generation.
|
| The thing is to make something actually useful out of it. The
| stochastic guys figured out how to encode information with
| their stochastic approach. The relational algebra guys did
| something with discrete relations. We've only been scratching
| the surface with grammars, even though LZW, Sequitur and the
| like are grammar tasks.
|
| Markov chains included, we've done very little in this regard.
| Much to expound on here. Patanjali was on it 3k years ago.

| abeppu wrote:
| So... how does one evaluate the quality of a framework +
| benchmark without results? I feel like this would be a lot more
| compelling if they had at least some reasonable first models
| with result metrics and analysis showing that the tasks are
| both tractable and good at revealing differences in performance
| and behavior (i.e. the problems should be neither too easy nor
| too hard). For example, what if Amazon purchase/churn behavior
| is just largely independent of review behavior, and review
| behavior is mostly dictated by "did some company use this
| account to generate fake 5-star reviews?"

| larodi wrote:
| It's about choosing what to embed. We still have a long way to
| go with embedding non-textual concepts. That yields smaller
| models, but ones that are even more difficult to reason about.
___________________________________________________________________
(page generated 2023-11-28 23:00 UTC)