[HN Gopher] Semantic Diff for SQL
       ___________________________________________________________________
        
       Semantic Diff for SQL
        
       Author : s0ck_r4w
       Score  : 154 points
       Date   : 2022-07-29 15:14 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | tessierashpool wrote:
       | this is very cool, but I believe this bit of the README is
       | incorrect:
       | 
       |  _Text-based diff tools such as git diff, when applied to a code
       | base, have certain limitations. First, they can only detect
       | insertions and deletions, not movements or updates of individual
       | pieces of code._
       | 
       | git diff can detect movements. looking at my .gitconfig, I think
       | it's the "frag = magenta" line.
        
         | Something1234 wrote:
         | Post it or link it so we can better understand!
        
       | chrisjc wrote:
       | Very nice/interesting.
       | 
       | Somewhat related question and apologies if this is already stated
       | in the documentation (it's rather dense and I haven't had the
       | time to read through it completely)...
       | 
       | Can you use sqlglot to create custom DDL dialects that have
       | custom first class objects? For instance, if I want to build a
       | custom SQL/DDL/DML dialect that had a new kind of object such as
       | a "pipe", "kitchensink", etc, would sqlglot be a good tool to
       | use?
       | 
       | I've tried playing around with Apache Calcite, but it lost me
       | pretty quickly since the examples to customize/extend DDLs were
       | quite lacking in my opinion.
        
         | captaintobs wrote:
         | Yes. SQLGlot is very customizable and you can pretty much
         | override everything. It kind of needs to be flexible because
         | even common sql dialects vary greatly.
         | 
         | Here's an example of how we use SQLGlot to output raw Python
         | code directly from SQL.
         | 
         | https://github.com/tobymao/sqlglot/blob/main/sqlglot/executo...
        
           | chrisjc wrote:
           | Very nice, and exactly along the lines of what I was
           | thinking... I want to be able to create a custom SQL dialect
           | that can output some code.
           | 
           | Thanks very much, looking forward to spending some time
           | reading through all of this!
        
       | AeroNotix wrote:
       | What about difftastic?
        
         | s0ck_r4w wrote:
         | Difftastic seems like a really cool tool. There are a few
         | reasons, however, why it doesn't apply well to use cases I had
         | in mind:
         | 
         | 1. It's in JS and not Python. These days a common data
         | (including data tooling) stack revolves around Python and
         | fitting JS into this ecosystem is not straightforwad.
         | 
         | 2. Limited dialect support. As far as I can see it only
         | supports "PostgreSQL flavor" (not sure what exactly is meant by
         | "flavor" here). Support for dialects like Spark, Trino, Hive,
         | etc SQL was crucial.
         | 
         | Definitely a worthy mention, though, thank you!
        
           | leeoniya wrote:
           | > It's in JS and not Python
           | 
           | it's in Rust:
           | 
           | https://github.com/Wilfred/difftastic
           | 
           | what am i missing?
        
           | rmccue wrote:
           | Difftastic's wiki also has a breakdown of some structured
           | diff algorithms:
           | https://github.com/Wilfred/difftastic/wiki/Structural-Diffs
           | 
           | (I've been working on a similar problem, effectively diffing
           | an XML tree.)
        
       | difflens wrote:
       | Interesting, will give sqlglot a look when we get to adding SQL
       | support in DiffLens [https://github.com/marketplace/difflens]. Or
       | perhaps DiffLens can just use sqlglot :) Either way we're very
       | happy to see another semantic diff tool.
       | 
       | P.S: We work on DiffLens. It currently supports TS, JS, CSS and
       | text diffs. We're working on making a VS Code extension currently
        
       | stochtastic wrote:
       | I've been very impressed with sqlglot, and am looking forward to
       | trying this feature. The only issue I've had with sqlglot is
       | transpiling for use with a specific spark version: in my
       | experience Spark is not great about surfacing obvious 'not
       | registered' errors when a function isn't supported (especially in
       | >=2.4). I ran into this with width_bucket, which is only in the
       | most recent release. I am curious whether there's a
       | straightforward way to write with a specific release and catch
       | the error in transpilation rather than execution.
       | 
       | Side note: Iaroslav (post author) and Toby (sqlglot creator) are
       | both amazing, and I'm so glad that they're working on open source
       | projects like this.
        
         | captaintobs wrote:
         | It's easy to add errors for dialects by calling #unsupported
         | when a function is used.
         | 
         | In terms of versioning of engines, I haven't implemented that
         | yet, but presumably it could be done by adding a dialect
         | subclass and having versioning route to it, so we could do
         | something like parse(sql, dialect="spark", version=...) which
         | could then route to a 2.3 version of spark.
         | 
         | Happy to chat more about this and we can see about adding it
         | (or feel free to make a pr). You can DM me on twitter or some
         | other avenue as well if you want to dive in deep.
        
         | nerpderp82 wrote:
         | https://github.com/tobymao/sqlglot
        
       | karmakaze wrote:
       | I thought this was going to be something else like being able to
       | tell that a rewritten query returns the same set of rows, but
       | with potentially a very different query plan. E.g. dependent
       | EXISTS subquery vs IN subquery.
        
         | captaintobs wrote:
         | This is what the sqlglot optimizer is used for. The optimizer
         | converts EXISTS and IN into a canonicalized SQL (some variant
         | of left join) which can then be compared to another query.
         | 
         | So if you run the optimizer first and then the diff tool, it
         | could solve this kind of use case.
        
         | gavinray wrote:
         | This exists for Postgres, let me try to find the name of the
         | tool
         | 
         | EDIT: I can't find it, I searched for 30 mins, I promise this
         | exists though. If anyone else can remember the name of it,
         | please post.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2022-07-29 23:00 UTC)