[HN Gopher] Relational Deep Learning: Graph representation learn...
       ___________________________________________________________________
        
       Relational Deep Learning: Graph representation learning on
       relational databases [pdf]
        
       Author : taubek
       Score  : 112 points
       Date   : 2023-11-28 16:16 UTC (6 hours ago)
        
 (HTM) web link (relbench.stanford.edu)
 (TXT) w3m dump (relbench.stanford.edu)
        
       | cs702 wrote:
       | Just added this to my reading list. It looks and smells like a
       | promising approach for training and inference at scale _directly
        | from relational data_, and _modeling the relations_, instead of
       | from flat records/files with features and labels. It's
       | _refreshing_ to see original work that is not about incrementally
       | tweaking some transformer!
        
       | _boffin_ wrote:
        | Looking forward to reading this later today.
        
       | mjhay wrote:
       | I'm always amazed by how little attention tabular/relational data
       | gets in DL research, despite being far more common in
       | applications than images or unstructured text.
        
         | importantbrian wrote:
         | I actually do see DL research on tabular problems come out, but
         | it seems like they generally aren't better than xgboost or
         | similar boosted tree models on the same dataset, and when they
         | are better it's only marginal and not worth the
         | interpretability tradeoff in the real world. Boosted trees are
         | already really good for most tabular tasks, so it makes sense
         | that most of the focus would be on images and natural language
         | where they are a big improvement over traditional methods.
         | 
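          | For concreteness, the kind of baseline being described -- a
          | minimal sketch of gradient-boosted trees on one flat feature
          | table (hypothetical column names; assumes the xgboost
          | package):
          | 
          |   import pandas as pd
          |   import xgboost as xgb
          | 
          |   # Hypothetical flat table: features plus a label column,
          |   # with any relational structure already flattened by hand.
          |   df = pd.DataFrame({"age": [25, 32, 47, 51],
          |                      "n_orders": [1, 4, 2, 9],
          |                      "churned": [1, 0, 1, 0]})
          |   X, y = df[["age", "n_orders"]], df["churned"]
          | 
          |   # Fit a small gradient-boosted tree ensemble.
          |   model = xgb.XGBClassifier(n_estimators=50, max_depth=3)
          |   model.fit(X, y)
          |   print(model.predict(X))
          | 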
         | Additionally, it seems to me that most of the money spent on
         | research in the DL space goes to places like OpenAI whose main
         | goal is to develop AGI. They see large transformer models as
         | the path to AGI, and once you've got AGI solved the AGI can
         | solve all the other problems.
        
         | bluecoconut wrote:
         | I started Approximate Labs to tackle this problem! We just
         | recently addressed one of the biggest gaps we believe exists in
         | the DL research community for this modality, dataset; we openly
         | released the largest dataset of tables with annotations last
         | month https://www.approximatelabs.com/blog/tablib
         | 
         | Feel free to reach out to me / join our discord if you want to
         | talk about tabular data / relational data and its relation to
         | AI~
        
         | arolihas wrote:
         | There is little incentive to publish since it rarely does
         | better than XGBoost.
        
       | cmrdporcupine wrote:
        | This is interesting, and smart: using the knowledge present in
        | the normalized relationships / data connectivity as part of the
        | signal for training.
        | 
        | A properly designed relational database is a kind of
        | propositional knowledge base, with each tuple ("row") being a
        | "fact"; it makes sense to mine this as part of learning, since
        | it's how a human would "read" this data.
       | 
       | Nitpicking:
       | 
       |  _" The core idea is to view relational tables as a heterogeneous
       | graph, with a node for each row in each table, and edges
       | specified by primary-foreign key relations."_
       | 
       | The word "relation" is used incorrectly here -- the _" relation"_
       | is the table, the name for the foreign-key relationship is...
       | relationship. The key distinction being that relations (big bags
       | of facts) exist independent of those pre-declared relationships;
       | the relationships can be restructured in any-which-way through
       | queries, with foreign key constraints just being a convention for
       | doing that. That's the key benefit of the relational data model
       | over hierarchical/tree/network/graph databases where the
       | relationships are hardcoded, and the key thing that Codd was
       | getting at in his original paper: relational databases are the
       | theoretically most flexible model of data (apart from just bags
       | of text) because the relationships are derived, not hardcoded.
       | 
        | So what they're describing here is each tuple in the relation as
        | a graph node, with edges defined by the attributes of the
        | relation -- which is precisely how one does the inverse
        | (describes graphs as relations). See e.g. RelationalAI's product
        | (https://docs.relational.ai/rel/concepts/graph-normal-form), or
        | how graphs are modeled in e.g. Datalog or Prolog.
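        | 
        | For concreteness, a minimal sketch of both directions in plain
        | Python (toy tables and names of my own invention, not the
        | paper's actual API):
        | 
        |   import pandas as pd
        | 
        |   # Two toy relations: users(user_id, name) and
        |   # orders(order_id, user_id, total), where orders.user_id is
        |   # a foreign key into users.
        |   users = pd.DataFrame({"user_id": [1, 2],
        |                         "name": ["ann", "bob"]})
        |   orders = pd.DataFrame({"order_id": [10, 11, 12],
        |                          "user_id": [1, 1, 2],
        |                          "total": [9.5, 3.0, 7.25]})
        | 
        |   # Tables -> heterogeneous graph: one node per row, typed by
        |   # its source table; one edge per foreign-key reference.
        |   nodes = ([("users", u) for u in users["user_id"]] +
        |            [("orders", o) for o in orders["order_id"]])
        |   edges = [(("orders", r.order_id), ("users", r.user_id))
        |            for r in orders.itertuples()]
        | 
        |   # Graph -> relation (the inverse): the edge list is itself
        |   # just a binary relation, the same shape as edge/2 facts in
        |   # Datalog or Prolog.
        |   edge_relation = pd.DataFrame(edges, columns=["src", "dst"])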
        
          | larodi wrote:
          | Let me help you - SQL and the whole of predicate logic is
          | Prolog from start to end, and also grammar, and also regex,
          | and also recursion, and also generative.
          | 
          | The thing is to make something actually useful out of it. The
          | stochastic guys figured out how to encode information with
          | their stochastic approach. The relational algebra guys did
          | something with discrete relations. We've barely scratched
          | grammars, even though LZW, Sequitur, and the like are grammar
          | tasks (see the toy sketch below).
          | 
          | Markov chains included, we've done very little in this regard.
          | Much to expound on here. Patañjali was on it 3k years ago.
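          | 
          | The toy sketch mentioned above: a Re-Pair-style pass (my own
          | illustration, not LZW or Sequitur themselves) that infers a
          | grammar by repeatedly replacing the most frequent adjacent
          | pair with a fresh nonterminal -- compression as grammar
          | induction:
          | 
          |   from collections import Counter
          | 
          |   def repair_step(seq, rules, next_sym):
          |       # Find the most frequent adjacent pair of symbols.
          |       pairs = Counter(zip(seq, seq[1:]))
          |       if not pairs:
          |           return seq, False
          |       pair, count = pairs.most_common(1)[0]
          |       if count < 2:
          |           return seq, False  # no repeated digram left
          |       rules[next_sym] = pair  # rule: next_sym -> pair
          |       out, i = [], 0
          |       while i < len(seq):  # replace occurrences left-to-right
          |           if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
          |               out.append(next_sym)
          |               i += 2
          |           else:
          |               out.append(seq[i])
          |               i += 1
          |       return out, True
          | 
          |   rules, seq, sym = {}, list("abcabcabc"), 0
          |   changed = True
          |   while changed:
          |       seq, changed = repair_step(seq, rules, sym)
          |       if changed:
          |           sym += 1
          |   # Three rules now reconstruct "abcabcabc" from seq == [2, 1].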
        
       | abeppu wrote:
        | So ... how does one evaluate the quality of a framework +
        | benchmark release like this without results? I feel like this
        | would be a lot more compelling if they had at least some
        | reasonable first models with result metrics and analysis showing
        | that the tasks are both tractable and good at revealing
        | differences in performance and behavior (i.e. the problems
        | should be neither too easy nor too hard). For example, what if
        | Amazon purchase/churn behavior is just largely independent of
        | review behavior, and review behavior is mostly dictated by "did
        | some company use this account to generate fake 5-star reviews?"
        
       | larodi wrote:
        | It's about choosing what to embed. We still have a long way to
        | go with embedding non-textual concepts. It results in smaller
        | models, but ones even more difficult to reason about.
        
       ___________________________________________________________________
       (page generated 2023-11-28 23:00 UTC)