[HN Gopher] Semantic Diff for SQL ___________________________________________________________________ Semantic Diff for SQL Author : s0ck_r4w Score : 154 points Date : 2022-07-29 15:14 UTC (7 hours ago) (HTM) web link (github.com) (TXT) w3m dump (github.com) | tessierashpool wrote: | this is very cool, but I believe this bit of the README is | incorrect: | | _Text-based diff tools such as git diff, when applied to a code | base, have certain limitations. First, they can only detect | insertions and deletions, not movements or updates of individual | pieces of code._ | | git diff can detect movements. looking at my .gitconfig, I think | it's the "frag = magenta" line. | Something1234 wrote: | Post it or link it so we can better understand! | chrisjc wrote: | Very nice/interesting. | | Somewhat related question and apologies if this is already stated | in the documentation (it's rather dense and I haven't had the | time to read through it completely)... | | Can you use sqlglot to create custom DDL dialects that have | custom first class objects? For instance, if I want to build a | custom SQL/DDL/DML dialect that had a new kind of object such as | a "pipe", "kitchensink", etc, would sqlglot be a good tool to | use? | | I've tried playing around with Apache Calcite, but it lost me | pretty quickly since the examples to customize/extend DDLs were | quite lacking in my opinion. | captaintobs wrote: | Yes. SQLGlot is very customizable and you can pretty much | override everything. It kind of needs to be flexible because | even common sql dialects vary greatly. | | Here's an example of how we use SQLGlot to output raw Python | code directly from SQL. | | https://github.com/tobymao/sqlglot/blob/main/sqlglot/executo... | chrisjc wrote: | Very nice, and exactly along the lines of what I was | thinking... I want to be able to create a custom SQL dialect | that can output some code. | | Thanks very much, looking forward to spending some time | reading through all of this! | AeroNotix wrote: | What about difftastic? | s0ck_r4w wrote: | Difftastic seems like a really cool tool. There are a few | reasons, however, why it doesn't apply well to use cases I had | in mind: | | 1. It's in JS and not Python. These days a common data | (including data tooling) stack revolves around Python and | fitting JS into this ecosystem is not straightforwad. | | 2. Limited dialect support. As far as I can see it only | supports "PostgreSQL flavor" (not sure what exactly is meant by | "flavor" here). Support for dialects like Spark, Trino, Hive, | etc SQL was crucial. | | Definitely a worthy mention, though, thank you! | leeoniya wrote: | > It's in JS and not Python | | it's in Rust: | | https://github.com/Wilfred/difftastic | | what am i missing? | rmccue wrote: | Difftastic's wiki also has a breakdown of some structured | diff algorithms: | https://github.com/Wilfred/difftastic/wiki/Structural-Diffs | | (I've been working on a similar problem, effectively diffing | an XML tree.) | difflens wrote: | Interesting, will give sqlglot a look when we get to adding SQL | support in DiffLens [https://github.com/marketplace/difflens]. Or | perhaps DiffLens can just use sqlglot :) Either way we're very | happy to see another semantic diff tool. | | P.S: We work on DiffLens. It currently supports TS, JS, CSS and | text diffs. We're working on making a VS Code extension currently | stochtastic wrote: | I've been very impressed with sqlglot, and am looking forward to | trying this feature. The only issue I've had with sqlglot is | transpiling for use with a specific spark version: in my | experience Spark is not great about surfacing obvious 'not | registered' errors when a function isn't supported (especially in | >=2.4). I ran into this with width_bucket, which is only in the | most recent release. I am curious whether there's a | straightforward way to write with a specific release and catch | the error in transpilation rather than execution. | | Side note: Iaroslav (post author) and Toby (sqlglot creator) are | both amazing, and I'm so glad that they're working on open source | projects like this. | captaintobs wrote: | It's easy to add errors for dialects by calling #unsupported | when a function is used. | | In terms of versioning of engines, I haven't implemented that | yet, but presumably it could be done by adding a dialect | subclass and having versioning route to it, so we could do | something like parse(sql, dialect="spark", version=...) which | could then route to a 2.3 version of spark. | | Happy to chat more about this and we can see about adding it | (or feel free to make a pr). You can DM me on twitter or some | other avenue as well if you want to dive in deep. | nerpderp82 wrote: | https://github.com/tobymao/sqlglot | karmakaze wrote: | I thought this was going to be something else like being able to | tell that a rewritten query returns the same set of rows, but | with potentially a very different query plan. E.g. dependent | EXISTS subquery vs IN subquery. | captaintobs wrote: | This is what the sqlglot optimizer is used for. The optimizer | converts EXISTS and IN into a canonicalized SQL (some variant | of left join) which can then be compared to another query. | | So if you run the optimizer first and then the diff tool, it | could solve this kind of use case. | gavinray wrote: | This exists for Postgres, let me try to find the name of the | tool | | EDIT: I can't find it, I searched for 30 mins, I promise this | exists though. If anyone else can remember the name of it, | please post. | [deleted] ___________________________________________________________________ (page generated 2022-07-29 23:00 UTC)