Difftastic: A diff that understands syntax ___________________________________________________________________ Author : tempodox Score : 741 points Date : 2022-03-29 11:38 UTC (11 hours ago) Link : github.com | jedisct1 wrote: | No support for Zig :( | Wilfred wrote: | Difftastic has support for ~20 languages, and I'm happy to add | more if there's a decent tree-sitter parser available :) | yewenjie wrote: | Does a `magit` plugin exist for Emacs users? The author of this | package is also the author of a couple of popular Emacs packages, | but I did not see any mention of Emacs. | Myrmornis wrote: | It won't be able to form the basis of a magit plugin because it | does not target the traditional diff format. | sytelus wrote: | Is there a VSCode extension for this? | neves wrote: | Now I want a 3-way merge version :-) | einpoklum wrote: | The documentation says: | | > Difftastic output is intended for human consumption | | Why not separate the human-consumption part and the underlying | parsing part? Or at least provide both in the same utility? | Wilfred wrote: | The underlying parser is just tree-sitter, which is a reusable | (and excellent) parsing library. | | Difftastic then converts the tree-sitter parse tree to a | simpler s-expression style format (see | https://difftastic.wilfred.me.uk/parsing.html#simplified-syn...), | and computes differences on that. | | I'm just trying to clarify that I'm not generating conventional | 'unified diff' patches, so I can provide a nicer interface | (e.g. line numbers). | synergy20 wrote: | I use meld and it seems syntax aware, plus it can merge with a | click. How does difftastic differ in that regard? | berkes wrote: | I use meld too. But afaics, meld's 'syntax awareness' is very | different from difftastic's. | | Meld takes a diff and applies syntax highlighting over the | diffed files.
It additionally highlights the changed characters | in a line. Git diff, vimdiff and probably others do this as | well. | | From the demo, I understand that Difftastic first parses the | syntax and then rebuilds the patch over that, being aware of | line wrapping, changes in nesting, moving code blocks into | functions and so on. | challenger-derp wrote: | First thing that came to mind is diffing Python notebooks. | gh02t wrote: | Don't think this tool supports that, but there is | https://nbdime.readthedocs.io/en/latest/ | cycomanic wrote: | For Jupyter Notebooks I highly recommend trying out jupytext, | which converts Notebooks on the fly to a number of formats. It | really has been a game changer for working with git and | Notebooks for me. I essentially never want to preserve the state of | the notebooks anyway, so converting just makes sense. The best | thing is that it is completely transparent, i.e. it generates a | notebook file when you open the other file and saves to the | file every time the notebook is saved. If you want to keep the | state of the notebook, you can always keep that file around as | well. | dmarinus wrote: | Looks nice! Now I only need patchtastic :-) | dotancohen wrote: | Actually, the README addresses that! > Non-goals > | Patching. Difftastic output is intended for human | consumption, and it does > not generate patches that you | can apply later. Use diff if you need a patch. | taspeotis wrote: | I paid for and used SemanticMerge quite successfully when we had a | complex Git workflow with lots of conflicts. | | https://semanticmerge.com/ | | Since moving to short-lived feature branches, it is less useful to | me. | ziml77 wrote: | I don't need SemanticMerge often, but when I do I'm incredibly | thankful that I have it. | Liquid_Fire wrote: | SemanticMerge sounded interesting enough that I wanted to check | it out, but to my surprise there is no Buy or Download link | anywhere on the site.
The only thing that might do it is a | Login link, but I don't want to create an account just to see | how much the thing costs. Is it only sold in bulk to companies? | I find it bizarre that there isn't even a "contact sales" | button. | ziml77 wrote: | That's incredibly annoying! They must have changed something | about their pricing and sales model since the time that I | purchased it. I don't understand why companies think that's a | good idea. I guess I can't recommend it anymore. | Aeolun wrote: | There is a 'sales' button at the bottom, but it's just a link | to an email. I'm really not sure how they're even trying to | sell this thing. | | Maybe they don't want to any more? And this is just their | subtle way of pushing everyone interested in using it away? | bifftastic wrote: | I like the name. | rednosehacker wrote: | Any plan for the Scheme programming language? | Wilfred wrote: | I'd like to add it, but I haven't found any good tree-sitter | parsers for Scheme. | emacsen wrote: | This looks absolutely amazing. | | One thing I do find interesting (and wish were different) is | that only programming languages are supported, rather than data | formats as well. | | For example, two JSON documents may be valid but formatted | slightly differently, or a common task for me is comparing two | YAML files. | | Comparing config files that have a well-defined syntax and/or can | be abstracted into a tree (JSON, YAML, TOML, etc.) would be | absolutely lovely, even including (if possible) Markdown and | its ilk. | simonw wrote: | I would naively expect that this problem is easiest to solve | for languages like JSON that have an unambiguous way to be | pretty-printed. | chockchocschoir wrote: | Indeed. One could just do | `diff <(jq . "$fileOne") <(jq . "$fileTwo")` and you'll end up | with a "nice enough" diff even if $fileOne and $fileTwo were | very differently formatted. | lstamour wrote: | The problem is when a file also needs to be normalized - | e.g.
object keys in a different order, YAML syntax | expansion. It can be very useful to indicate when a JSON | file is identical to another JSON file but some of the | properties or array items are out of order, and that | requires more in-depth knowledge of the data format. Not to | mention that you could UTF-8 encode characters or write | out the same character using backslash notation, numeric or | boolean data that might be wrapped in a string in one file | but not in another, etc. There can still be a lot of | modelling and interpretation to consider when comparing | data files rather than code files. | chockchocschoir wrote: | I'm not too familiar with YAML, so I can't answer that. | | But re JSON: | | > object keys in a different order | | They can't be "in a different order" as JSON keys are not | ordered. They can be whatever order, and would still be | considered the same. | | > array items are out of order | | Then it's different, as JSON arrays are ordered. ["a", | "b"] is not the same as ["b", "a"], while {a: 1, b: 1} and | {b: 1, a: 1} are the same. | | > you could UTF-8 encode characters or write out the same | character using backslash notation, numeric or boolean | data that might be wrapped in a string in one file but | not in another | | Then again, they are different. If the data inside is | different, it's different. | | I understand that logically they are the same, but not | syntax-wise, which is why I included the "differently | formatted" "disclaimer". It wouldn't obviously understand | that "one" and "1" are the same, but then again, should | you? Depends on use case I'd say, hard to generalize. | stormbrew wrote: | > They can't be "in a different order" as JSON keys are | not ordered. They can be whatever order, and would still | be considered the same. | | This is what GP is saying, I'm pretty sure.
Object member | order is non-semantic in JSON, so in order to do a | semantic diff (one that understands structure), you need | to canonicalize the order of the two sides. Simply | diffing the output of jq doesn't do that, because (afaik) | jq doesn't alter the order. | | Basically, if you want this to come up the same: | {"a":"b","c":"d"} | {"c":"d","a":"b"} | | you need more than just `diff $(jq) $(jq)`. | | One can argue about whether a tool like difftastic should do | that, I guess, but I would personally lean towards it being | smart enough to see this, because it's | precisely the sort of thing that both humans and line-based | diff can be awful at seeing. | fwip wrote: | Just an FYI, jq has a flag to sort by the name of keys: | -S (--sort-keys). | stormbrew wrote: | Fair enough! I should just never assume jq doesn't have a | feature. | autarch wrote: | I wrote a tool that tidies JSON and can do things like | reorder keys in a fixed order - | https://github.com/ActiveState/json-ordered-tidy | Wilfred wrote: | https://github.com/andreyvit/json-diff works really well for | JSON diffing in my experience. | | It's more simplistic than difftastic though: it considers `1` | and `[1]` to have nothing in common. | paxys wrote: | This isn't going to add anything to existing diff tools for | JSON or YAML though. Those formats barely have any syntax | highlighting or complex structures. | Wilfred wrote: | JSON and CSS are supported today, and I'm interested in adding | more structured text formats. | | If a format has a tree-sitter parser, it can be added to | difftastic. The TOML tree-sitter parser looks good, but there | isn't a mature markdown parser for tree-sitter. There are other | markdown parsers available, so in principle difftastic could | support markdown that way. | | The display logic might need a little tuning for prose-heavy | formats like markdown though. I'm not happy with how difftastic | handles block comments yet either.
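A minimal sketch of the key-order normalization discussed above, assuming jq is installed: `jq -S` (long form `--sort-keys`) prints objects with their keys sorted and with uniform indentation, so two files that differ only in key order or whitespace compare equal, while reordered arrays still show up as a change. The helper name `json_diff` is made up for illustration. It writes to temporary files instead of using bash's `<(...)` process substitution so that it also runs under plain `sh`.

```sh
# json_diff FILE1 FILE2 (hypothetical helper, not part of difftastic or jq):
# canonicalize both files with `jq -S .` (sorted object keys, uniform
# indentation), then show a unified diff of the canonical forms.
# Exit status follows diff: 0 = equal after canonicalization,
# 1 = different, 2 = jq, mktemp or diff failed.
json_diff() {
  _jd_a=$(mktemp) && _jd_b=$(mktemp) || return 2
  if jq -S . "$1" > "$_jd_a" && jq -S . "$2" > "$_jd_b"; then
    diff -u "$_jd_a" "$_jd_b"
    _jd_status=$?
  else
    _jd_status=2
  fi
  rm -f "$_jd_a" "$_jd_b"
  return "$_jd_status"
}
```

For example, `{"a":"b","c":"d"}` and `{"c":"d","a":"b"}` produce no output and exit status 0, whereas `["a","b"]` and `["b","a"]` are still reported as different, because array order is significant in JSON.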
| | I'm not sure about formats that contain more prose, such as | markdown or HTML. | mark_and_sweep wrote: | JSON is supported. | | HTML and XML are missing, too. | emacsen wrote: | You're right. I missed JSON. | | Sadly, YAML, TOML and the others I mentioned are not there | (yet?). | softwarebeware wrote: | There's always room for contributions! | alxmrs wrote: | Similarly, I would love it if Pandoc's AST were supported. Or, | if this could be extended to compare any documents taking | formatting into account, or document-to-document conversions. | linsomniac wrote: | I would love a great XML diff tool, and after seeing the demo | of this I was sad to see XML not in there. I would pay for one. | d0gsg0w00f wrote: | This is kind of like the problem of programmatically analyzing | AWS IAM roles and policies to understand the impact of changes. | Very difficult to do in JSON format, but worth tons of money to | CISOs if it can be solved. | LudwigNagasena wrote: | Is there a good reason why diff tools generally don't use ASTs? | skywal_l wrote: | Performance is one, I guess. | db48x wrote: | Also, there are a lot of languages out there, each with their | own special and unique syntaxes. | danbruc wrote: | Because it is much easier, you don't have to build and maintain | parsers for hundreds of languages. And you don't need just | any parser, you need very robust ones that can deal with | malformed files well. Or, if you only pick a small set of | supported languages, your diff tool will either not work on most files | or have to fall back to a structure-agnostic algorithm. Also, | not all text files even follow any useful grammar at all. | | Finally, even if you have a syntax tree, that is just part of | the solution, probably the smaller one. Detecting three lines | of code wrapped in a new if statement is easy but also doesn't | benefit much from a syntax-aware algorithm.
But once you | change names and signatures, extract methods, introduce | constants, and so on, it will become progressively harder to | match subtrees, and one is probably quickly approaching the | territory of NP-hard and undecidable problems. | RyEgswuCsn wrote: | > And you don't need just any parser, you need very | robust ones that can deal with malformed files well. | | I very much agree. I feel there has been a trend recently | where people (re)discovered how cool and useful ASTs are and | now expect everything to be using them. I suspect old-school | computer scientists might be secretly laughing at this while | programming with some Lisp-like languages they invented for | themselves. | | Jokes aside, I do wonder how modern IDEs manage to parse | broken source code into usable ASTs. Is this trivial (CS | theory-wise), or is there a lot of engineering secret sauce | involved to make it work? | danbruc wrote: | With only basic knowledge in the domain, I would assume it | is hard and ugly. If the file is malformed, there is almost | certainly an infinite number of possible edits to make the | file adhere to the grammar, hence there cannot be any | algorithm that just provides the one and only correct | syntax tree. This in turn means that you have to come up | with heuristics that identify reasonable changes which fix | the file, and that is probably not easy. Also, if you do | this online in an IDE, the problem probably becomes easier | [1] - if you have a valid file and then make it invalid by | deleting an operator in the middle of some expression, you | can still essentially use the syntax tree from just before | the deletion. If, on the other hand, you get a malformed | file, you might have a harder time. | | [1] And also harder, because if you want to parse the file | after each keystroke, you have to be fast.
This probably | also makes incremental updates to the syntax tree the | preferred solution, and that might align well with using the | prior result for error recovery. | jhgb wrote: | "If the file is malformed, there is almost certainly an | infinite number of possible edits to make the file adhere | to the grammar, hence there cannot be any algorithm that | just provides the one and only correct syntax tree. This | in turn means that you have to come up with heuristics | that identify reasonable changes which fix the file, and | that is probably not easy." | | Don't we call such heuristics "test suites"? | danbruc wrote: | I don't understand that question. Given the following | source file that does not parse | | `var foo = bar baz` | | there are many ways to change it and make it parse, | including the following reasonable ones: | | `var foo = barbaz` | `var foo = "bar baz"` | `var foo = { bar, baz }` | `var foo = bar // baz` | `var foo = bar` | `//var foo = bar baz` | `var foo = bar * baz` | `var foo = bar + baz` | `var foo = bar.baz` | `var foo = bar(baz)` | | but also unreasonable ones like `var abc = 123`, | | and therefore a parser that can handle malformed inputs | has to make educated guesses about what the input was actually | supposed to look like. And don't be fooled by this simple | example: imagine a long source file with deeply nested | code in a language with curly braces and randomly | deleting some of the braces. Now try to figure out where | classes, methods, if or try statements begin and end in | order to produce a [partial] syntax tree better than just | giving up at the position of the first error. | jhgb wrote: | My point was that test suites should give you a heuristic | on what corrections are good and which are bad. A source | code change that turns a test fail into a test pass | should be considered an improvement. | danbruc wrote: | I am still lost. Test suite for what?
We have a parser - | binary, source code and maybe a test suite if the parser | developers decided to write tests - and a random text | file that we throw at the parser and for which the parser | hopefully generates a useful syntax tree if the content | is a well-formed or not too badly malformed program in a | language the parser understands. | jhgb wrote: | What "test suite for the parser"? Of course a test suite | for the faulty program you're trying to correct into a | working one. | danbruc wrote: | So I can only use the diff tool to compare two | non-compiling versions of a source file if I provide a test | suite for that file to the diff tool? And how would you | want to make use of the test suite? Before you can run | the test suite, the source file must already parse and | compile, which is already more than a diff tool based on a | syntax tree requires - it must be able to parse the | source code, but it doesn't have to compile. Passing the | test suite requires even more: not only being able to | parse and compile but also yielding the correct behavior, | which the diff tool doesn't care about. | | And you actually jumped over the hard part that requires | the heuristics: how to modify the input in order to make | it parse. Take a 10 kB source file and delete 10 random | characters - how will you figure out which characters to | put back where? With 100 possible characters, 10,000 | positions to insert a character, and having to insert 10 | characters, you are looking at something like 10^60 | possible modifications. You are certainly not going to | try them one after another, each time checking if the | modified source file parses, compiles, and passes the | test suite. | jhgb wrote: | > So I can only use the diff tool to compare two | non-compiling versions of a source file if I provide a test | suite for that file to the diff tool? | | Not sure what this whole straw man is about. I definitely | didn't suggest anything like that.
Of course you can only | compare two _compiling_ versions of a source file using | test-suite-based heuristics. I thought this whole thing | was about "heuristics that identify reasonable changes | which fix the file" mentioned above? "Reasonable changes | that DON'T fix the file" are clearly recognizable by NOT | passing the test suite, just as if it were a human trying | to make those changes and finding out that the change | he just made didn't in fact yield the desired results | after running the test suite. | | > With 100 possible characters, 10,000 positions to | insert a character, and having to insert 10 characters, | you are looking at something like 10^60 possible | modifications. | | If you're working with an AST, you're almost certainly | not working with characters. That would be immensely | wasteful. In fact, working with an AST is pretty much the | only way in which the set of changes is sufficiently | reduced for almost any change to NOT be rejected | outright. With character-level modifications, you're | facing the problem that almost every edit will be | rejected outright as early as the parsing stage. | mekster wrote: | > Because it is much easier, you don't have to build and | maintain parsers for hundreds of languages. | | Seems there's a good open market for such a lazy reason. | NateEag wrote: | This tool is built on tree-sitter | (https://tree-sitter.github.io/tree-sitter/), so presumably it doesn't need | to maintain parsers at all. | | I've thought before that this is how diffing should be done, and | speculated that tree-sitter would make it more feasible. | | At this point, whenever I think some language-aware tool | ought to exist, my first thought is "Does the language server | protocol or tree-sitter make this more feasible?" | danbruc wrote: | Someone still has to build and maintain the parsers; you | are just outsourcing this.
And I added a bit to my comment: | I tend to believe that parsing is the easy part, but that | is admittedly more a gut feeling and not based on any real | knowledge of that problem space. | NateEag wrote: | That's certainly a good point. | | Languages usually change slowly, though, so once a good | baseline grammar is in place, maintenance is unlikely to | be a huge load. | | Furthermore, with tools like tree-sitter and the language | server protocol, multiple communities benefit from their | continued existence, so there's a bigger pool of | contributors to the parser. | Wilfred wrote: | It's really hard! :) | | (1) Parsing an arbitrary language is hard. Without tree-sitter, | difftastic would probably be a Lisp-only tool. You also want a | parser that preserves comments. | | (2) Inputs may not be syntactically well-formed. | | (3) Efficiently comparing trees is extremely difficult | (difftastic is O(N^2) in time and memory). | | (4) Displaying tree diffs is equally difficult. Alignment is | particularly challenging when your 'unchanged before' and | 'unchanged after' are syntactically the same, but textually | different. | aasasd wrote: | Personally, I long for a syntactic merge tool. Every time | Syncthing hiccups for some reason, I'm up for a merge session | with my Org-mode files, in the vein of: 'These properties look | just like those ones, only with a different timestamp... Oh | lookie, and the heading is totally changed. Let me merge this new | heading all over the old one, and then pop in the old one after | it.' Dammit, it's just a whole new heading added with properties. | This happens with every language heavy on markup. | | However, I'm not sure if Org markup lends itself to structuring | that would allow proper diffing, even with just the headings. | teknopaul wrote: | It would be good to have different git merge strategies per file type. | | e.g.
a merge that knows that, in properties files, the same | property added in different places is only needed once. And | another strategy if order is significant. | | It would also be cool to have an HTML merge that recognises the tree structure | and supports merging tags and having the indentation follow | some rules. | | I believe git supports merge strategies; it's been on my todo | list forever. | loxias wrote: | This looks really cool and I can't wait to try it, tho... a bit | of a PITA to get running. ;) Took a while to figure out how to | build, and had to install 400MB of dependencies first... | | Edit: And after installing cargo, watching it fail to build, then | determining I must need a _newer_ version of cargo, so I built | that from source... it fails. Apparently I need to install | `rustc-mozilla` and not `rustc`. "obviously". | | This is all a testament to how much I want to try this tool... | | MOAR EDIT: even with rustc-mozilla, cargo fails to build. Running | `cargo install difftastic` gives me an error about my version of | cargo being too old ;.; | | Dear author: Let us run your tool. | Wilfred wrote: | The getting started section of the manual should help: | https://difftastic.wilfred.me.uk/getting_started.html | | I've documented the minimum Rust version required today, | although I'm looking at lowering the minimum version. | gkfasdfasdf wrote: | Using Ubuntu 20.04, I first installed cargo: | curl https://sh.rustup.rs -sSf | sh | | Restart the shell to get $HOME/.cargo/bin in PATH, then did: | cargo install difftastic | | And ~4 minutes later, the difft executable is ready. | | Agree though that some pre-built binaries would be fantastic! | loxias wrote: | Ah, well, if you're willing to accept having a frankensystem | with a mix of packaged and unpackaged software, sure. ;) I | used to do that, back in Slackware days. | | It's considered really sloppy and unmaintainable to admin a | system like that. Things quickly get out of hand.
| | That strategy _does_ work if you isolate it to a chroot or a | container, but littering /usr/local with all sorts of locally | compiled upstream software is just asking for future pain. Security | updates, library incompatibilities, &c. | | Prebuilt binaries might be nice, but I don't expect them for | random projects (and I wouldn't have used them if offered). I | _do_ think it's a reasonable expectation to be able to build | software w/o essentially setting up a new userland just for | that tool, though. :) | gkfasdfasdf wrote: | The method I posted above doesn't write anything to | /usr/local. Root isn't required. Everything is written | under ~. | loxias wrote: | Whoa, really? | | I'm sorry, and I retract my ignorant assumption! Going to | try it out now. | Wilfred wrote: | There are a few packages available, e.g. | https://aur.archlinux.org/packages/difftastic and | https://pkgsrc.se/wip/difftastic. | | I've also had requests from Alpine Linux packagers to | allow dynamic linking to parsers. This is something I | want to support in future, once I'm happy with the basic | diffing logic. | jeremyjh wrote: | I agree it leads to problems, but isn't the entire purpose | of `/usr/local` to be a dumping ground for locally | administered (unpackaged) programs? | YetAnotherNick wrote: | Did you use `cargo install difftastic`? Finished in a minute for me. | lopatin wrote: | Build errors for me. Apparently I'm on some nightly build of | cargo, but I need the 2021 version. The pain begins... | | Edit: Reinstalling Cargo worked! | skywal_l wrote: | With rustup, it's pretty easy to update/change your cargo | version. | loxias wrote: | How did you do it? When I tried to rebuild cargo I got | build errors. I'm starting to suspect the only way to run | this tool is to make a chroot tracking sid or something... | lopatin wrote: | I just followed the installation instructions here: | https://doc.rust-lang.org/cargo/getting-started/installation...
| | It'll confirm that you want to install it, because it's | already installed I think, and I just selected option 1 for | Yes. | loxias wrote: | > curl https://sh.rustup.rs -sSf | sh | | hard pass :) | a_passable_dev wrote: | Out of curiosity, what would be an acceptable way for the | developers to provide a quick way for users to get up and | running? | | A getting-started guide with all the required commands easily | copy-pastable? (A popular option these days.) Something | else? | | I don't mean to be critical; I'm simply curious. | adwn wrote: | > _hard pass_ | | Why? You're willing to run some random open source | project, but you're not willing to run the official Rust | installation script? | loxias wrote: | Sure, but first I had to figure out wtf "cargo" is. :P | | Also, `cargo install difftastic` AIUI pulls it from a central | location. If I'm gonna poke at software for the first time, I | enjoy building it myself first, so I can get my hands dirty | in the source. :) | | EDIT: Also, the build fails. :( | | "error: unexpected token: `include_str` --> | /home/loxias/.cargo/registry/src/github.com-1ecc6299db9ec823/radix-heap-0.4.2/src/lib.rs:2:10 | 2 | #![doc = include_str!("../README.md")] | ^^^^^^^^^^^ | | error: aborting due to previous error | | error: could not compile `radix-heap`." | | _sad trombone_ | Wilfred wrote: | This looks like you're using a version of Rust older than | the minimum required (1.56). | vlunkr wrote: | A huge part of the appeal of Rust and Go tools is that you can | just ship a binary; it's frustrating that it's not available | here. | ducktective wrote: | Same here. Looked into the repo -> no binary in releases or GitHub | Actions | | spun up an Ubuntu 18.04 instance -> git clone, git checkout | 0.24.0 | | installed Rust using the curl | sh method | | build fails: | | https://termbin.com/29xy | | removed the instance; gonna check it again 6 months later | adwn wrote: | In another comment you're asking about vim support. So let me
So let me | get this straight: You're using vim, yet you're unable to | resolve the error message = note: | /usr/bin/ld: cannot find Scrt1.o: No such file or directory | /usr/bin/ld: cannot find crti.o: No such file or directory | | Have you tried googling for "ubuntu crti.o: No such file or | directory" ? | joemi wrote: | Using vim has nothing to do with ones ability to | troubleshoot compiler/ubuntu issues. Plus both compiler and | ubuntu issues can be massive PITA to solve even if you're | familiar with them. Personally, if I'm trying to install | something on whim to try it out and I start getting "no | such file or directory" errors I'd be upset that something | is going wrong. | ducktective wrote: | >Have you tried googling for "ubuntu crti.o: No such file | or directory" ? | | Depending on the project, there is a certain threshold of | trying-to-make-something-work which I'm willing to | undertake in order to test an app. | | But you are right. I'm sorry if my OG comment may come | arrogant to the devs who do stuff for free. ( to the devs) | | [edit]: ok, I tried again, `sudo apt update && sudo apt | install build-essential` before installing rust and `cargo | install`ing. | | Error again: | | https://dpaste.com/FTG7FSRQF | estebank wrote: | Funnily enough, the error is in a C dependency providing | Haskell support. vendor/tree-sitter- | haskell-src/scanner.cc | goombacloud wrote: | For easy git usage I created these two scripts in my PATH instead | of using using git config: | | git-difft: #!/bin/sh | GIT_EXTERNAL_DIFF=difft git diff "$@" | | git-showt: #!/bin/sh | GIT_EXTERNAL_DIFF=difft git show --ext-diff "$@" | | Then you can run "git difft ..." or "git showt ..." if you want | to use it. | buu700 wrote: | For everyone wondering, it looks like this will work with git | diff: https://difftastic.wilfred.me.uk/git.html. | Starcrunch wrote: | Exactly what I was looking for. Thanks! 
| pvg wrote: | A previous discussion from 8 months ago, with some comments by | the author and authors of other diff tools: | | https://news.ycombinator.com/item?id=27768861 | dboreham wrote: | Finally. | 29athrowaway wrote: | Today in "Generation Z rediscovers things": semantic patching. | | https://en.wikipedia.org/wiki/Coccinelle_(software) | vcmiraldo wrote: | I really like the idea of focusing on producing patches for human | consumption. I studied the problem of merging AST-level patches | during my PhD (https://github.com/VictorCMiraldo/hdiff) and can | confirm: not simple! :) | narush wrote: | Can you give a little color on where the difficulties lie? Is | it an efficiency question, or is determining "which changes" | hard in the first place? | scythmic_waves wrote: | Not OP, but the docs call out some "Tricky Cases" [1]. | | [1] https://difftastic.wilfred.me.uk/tricky_cases.html | teeray wrote: | I'd imagine there are some challenging judgement calls that | such a tool would have to make. Like, in Go, you can | reorder the members of a struct definition. In many cases | this is just diff noise to reviewers. HOWEVER, it does | impact the layout of the struct in memory, so it can be | semantically meaningful in performance work. | gmfawcett wrote: | A wild nitpicker appears. I understand where you're | coming from & why this matters. But Go, the language | spec, doesn't make any guarantees about struct layout at | all. A layout difference may be meaningful, practically, | but it's potentially unreliable. | | e.g. see https://groups.google.com/g/golang-nuts/c/1BlZDNBLiAM | | Having said that: if a Go compiler for a given | architecture decided to change its layout algorithm, I'm | pretty sure it would earn a changelog entry.
| munk-a wrote: | PHP long stated that associative array sorting order was | unstable and not guaranteed (especially when the union | (+) operator or array_merge function were involved) - | that doesn't mean ten bazillion websites wouldn't | instantly break if they ever actually changed the | ordering to be unpredictable. | | Language designers need to contend with the fact that the | final say in whether a thing is guaranteed or not is | whether that behavior is observed. | zukzuk wrote: | I wrote a master's thesis about the more general problem | here (https://tspace.library.utoronto.ca/bitstream/1807/65616/11/Z...). | | The tl;dr is that there's an almost infinite number of | ways to atomize/conceptualize code into meaningful | "units" (to "register" it, in my supervisor's words), and | the most appropriate way to do that is largely | perspectival: it depends on what you care about after | the fact, and there is no single maximal way to do it up | front. | vanderZwan wrote: | Early in the linked thesis there is a one-page argument about | the shortcomings of traditional approaches, which technically | isn't what you asked but might still answer the side of the | question that deals with human usage, at least: | | https://victorcmiraldo.github.io/data/MiraldoPhD.pdf#page=24 | bool3max wrote: | Should've named that repo "phdiff". | pdimitar wrote: | Best pun I've heard in a long time. Well done. <3 | Groxx wrote: | I'll vote for "diphph" | wst_ wrote: | It's tangential, but it reminded me of the poem "lighght" by | Aram Saroyan. https://en.wikipedia.org/wiki/Aram_Saroyan#Minimalism_and_co... | munk-a wrote: | To be pronounced "Doctor-iff" in speech? | einpoklum wrote: | So I looked at the paper and it seems interesting.
Basic idea: | Instead of the operations to consider being "insert", "delete" | and "copy", one adds "reorder", "contract subtree" and | "duplicate" (although I didn't quite get the subtlety of copy | vs duplicate on a short skim); and even though extra ops | increase the search space, they actually let you search more | effectively. I can buy that argument. | | The practical problem, though, is that the Haskell compiler is | limited/buggy, so you couldn't implement this for C, and you | settled on a small language like Lua. If you _do_ extend this | to other languages (perhaps port your implementation from | Haskell to something else?), please post it on HN and | elsewhere! | arianvanp wrote: | Some of the GHC performance bugs that we ran into during the | research have been fixed as far as I know! Though I'd have to | double-check. | pluc wrote: | good idea, but so dangerous though | mkdirp wrote: | Why? | pluc wrote: | because it's an automated piece of software making decisions | about what is an "equal diff" and what is a "difference diff", | because a diff no longer means just a change; it now has to | be a _meaningful enough change_. If you removed something | like `if (true)` or whatever, that's _still_ a diff that | could have some importance and/or unknown consequences. I | appreciate the value, but the fact that it allows refactoring | to be a non-diff would worry me in the long run, I think. | Wilfred wrote: | Difftastic is only ignoring whitespace that isn't | significant. If you remove `if (true)`, it will get | highlighted. | | With a textual diff today, your only choices are 'highlight | all whitespace changes' (e.g. the git default) or 'ignore | all whitespace' (e.g. diff --word-diff). | | If difftastic says there are no changes, then both files | have the same parse tree and the same comments. | gwbas1c wrote: | Now let's get a WASM build into GitHub.
:) | thefaux wrote: | The nine year old inside me can't unsee the unfortunate choice of | names used in the basic example :) | AlexAndScripts wrote: | I would love it if version control stored an AST that also | includes comments and dividers (where right now we would leave an | empty line) and dev machines rendered it out however they wanted. | They could even change the language of keywords in addition to | normal formatting. | pie_flavor wrote: | This exact project is called JetBrains MPS. | adolph wrote: | MPS seems to be a DSL authoring tool. How would this be used | to make an AST diff tool? | | https://www.jetbrains.com/mps/ | | https://en.wikipedia.org/wiki/Abstract_syntax_tree | glenjamin wrote: | To do this requires some standard way of encoding an AST which | includes comments and dividers. | | That standard format is commonly known as source code - | although it lacks a normal form. | | Tools like prettier, gofmt and black can be thought of as a way | to produce a normal form of source code. | | This is (IMO) a reasonable incremental approach towards exactly | what you describe - if a project checks in only source code | that's formatted using a standardised format, then you're free | to work on it using whatever equivalent representation you like | - as long as it's converted back at commit-time. | Wilfred wrote: | FWIW VCS for Smalltalk basically does this. | | The challenge for a tool like difftastic is that I can't | guarantee that syntax is well-formed. You might be using new | syntax that my parser doesn't support, you might have merge | conflicts, or you might have a plain syntax error in your code. | | Tree-sitter handles parse errors gracefully, so difftastic | handles syntax errors pretty well in my experience. 
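The "normal form" idea, and the claim that insignificant whitespace leaves the parse tree unchanged, can both be tried with Python's standard `ast` module. This is a rough sketch that assumes Python 3.9+ (for `ast.unparse`), and the helper names are hypothetical. Python's AST discards comments, which is exactly the information AlexAndScripts' proposal and difftastic both need to keep. A real formatter such as black preserves them.

```python
import ast


def normal_form(source: str) -> str:
    """Parse Python source and print it back in one canonical layout.

    Requires Python 3.9+ for ast.unparse. Comments are lost, because
    Python's AST does not store them.
    """
    return ast.unparse(ast.parse(source))


def same_tree(old: str, new: str) -> bool:
    # ast.dump omits line/column attributes by default, so sources that
    # differ only in insignificant whitespace dump to the same string.
    return ast.dump(ast.parse(old)) == ast.dump(ast.parse(new))


if __name__ == "__main__":
    print(normal_form("x=f( 1,2 )"))                          # x = f(1, 2)
    print(same_tree("x = f(1, 2)", "x  =  f( 1,  2 )"))       # True
    # Removing an `if True:` wrapper changes the tree, so it is a real change.
    print(same_tree("if True:\n    y = 1\n", "y = 1\n"))      # False
    # A comment-only edit is invisible here, unlike in difftastic.
    print(same_tree("x = 1  # note", "x = 1"))                # True
```

The last line shows the gap. A comparison built on Python's `ast` would report "no change" for an edited comment, whereas Wilfred describes difftastic as comparing comments too.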
| tluyben2 wrote: | Yep, I posted this idea on Reddit recently and people said they | need a formatted syntax because of diff and version control. We | do not: get the AST, reformat it in the editor as the particular | user fancies, and generate diff and version control artefacts | also as a particular user sees fit. Our computers are very fast, | so you can make a lot more different views on your code than we | have now by using the AST instead of text and regexps. | hyperpallium2 wrote: | BTW, IIRC the tree version of Levenshtein distance has (proven) | terrible complexity. But so does LCS, and diff itself performs | great in practice, so maybe... | racl101 wrote: | Ok I really gotta try this. | pabs3 wrote: | A related thing is cregit, which does diffs of tokens: | | https://github.com/cregit/cregit https://lwn.net/Articles/698425/ | Wilfred wrote: | Ooh, I'd not seen this and I've seen a bunch of diff tools at | this point! Thanks for sharing. | ducktective wrote: | So how can one use this in vim? | sanity31415 wrote: | Unfortunately it's closed source, but | https://www.semanticmerge.com/ has been around for a few years | and works similarly, but can also merge. | oauea wrote: | I just spent a few minutes on that site and I can't even figure | out how to try it out, or their pricing, or anything other than | some very superficial docs, really. | | Is this just a pretty website, or is the software actually | available anywhere? | Pet_Ant wrote: | That page is just the technology primer. The tools are XDiff | & XMerge: | | https://www.plasticscm.com/pricing | | Looks like there's no locally-run-binary/non-SaaS version. I was | hoping it'd have a Sublime Text-like model. I have no interest | in trying to get my team to switch nor in having to deal with | the security team when it turns out I was using a free cloud | account. | db48x wrote: | This is written by the same guy who wrote Helpful, an enhancement | package for the Emacs Help buffer.
I highly recommend checking | out Helpful if you haven't seen it. | https://github.com/Wilfred/helpful | maw wrote: | He wrote https://github.com/Wilfred/deadgrep too. It's awesome | and I don't know how I lived without it for so long. | CodeIsTheEnd wrote: | EDIT: Wilfred IS the original author [3]; my apologies. | | Not to discredit Wilfred (it looks like he's taken over the | project as the maintainer), but, based on the historical | contributions [1], it looks like it was originally developed by | Max Brunsfeld, who also created Tree-sitter. [2] | | [1]: https://github.com/Wilfred/difftastic/graphs/contributors | | [2]: https://github.com/tree-sitter/tree-sitter | | [3]: | https://github.com/Wilfred/difftastic/commit/958033924a2dea7... | arxanas wrote: | I think the contributor graph is misleading, and that he's | using git-subtree to vendor tree-sitter, which makes it look | like others have contributed more to the project. | CodeIsTheEnd wrote: | Oops, I think you're right! Thank you for pointing that | out. | | My apologies to Wilfred. | disgruntledphd2 wrote: | Helpful is (pun fully intended) so very, very helpful. | | Honestly, I cannot imagine going back to the standard Emacs | help. | db48x wrote: | Agreed. It's so good it feels like it should have been that | way all along. For example, when you view the help for a | function, Emacs has always given you a link to the source code | where that function is defined. Helpful shows you the source | code right in the Help buffer, shows you a list of | callers, and gives you buttons that enable tracing or | debugging for the function. | | Once I discovered Helpful, all of those things seemed so | obviously useful that I can't understand why nobody else | thought to put them there, including myself. | disgruntledphd2 wrote: | The best part is the forget function, for when functions | are incompatible.
As an example, lsp won't work for me | unless I forget the project-root function from ess-r (I | have no idea why this hasn't been fixed), and Helpful makes | this a two- or three-key activity. | einpoklum wrote: | Checked out the repository. | | Build instructions? Nope. | | Minimum system requirements? Nope. But if you check out | Cargo.toml, you'll see it says it needs Rust 1.56. | | My system has 1.48.0. And it's the latest Debian release! I don't | see how a diff tool can expect you to have a bleeding-edge | development environment. I mean, ok, you chose a new language - I | can understand that; I won't demand that it build with just a C | compiler and Make. But come on, this is not supposed to be just a | toy for new systems. | | Anyway, I still cloned it, tried to build with "cargo build", and | got stuck with: | |     error: unexpected token: `include_str` | | It couldn't even tell me "get Rust 1.56" :-( | tzahifadida wrote: | How do I install it on a MacBook to try? Can you give some | instructions in the getting started guide? | ryanianian wrote: | |     brew install rust |     cargo install difftastic | | Worked for me without any problems. | sebdufbeau wrote: | From https://difftastic.wilfred.me.uk/getting_started.html, | it's installed via Cargo, so if you already have Cargo | installed it's straightforward; otherwise you can install it via | https://doc.rust-lang.org/cargo/getting-started/installation... | yboris wrote: | My favorite dev tool is _diff2html_ - a CLI that opens up your | browser with a rich diff. Pro tip: alias `diff` to the command so | you can launch it quickly ;) | | https://diff2html.xyz/ | Aicy wrote: | Looks really cool, but there were no instructions on how to | install it. | | I would recommend putting an installation guide in your readme, | and making it a full installation guide. | | I followed the link to your manual and then it told me to install | your tool using a tool called "cargo" with no reference on how to | install cargo. At this point I gave up.
Lazy, maybe, but for a | convenience tool like this I want a convenient installation. | conradludgate wrote: | Cargo is Rust's build tool/package manager and can be installed | easily using rustup. But I would probably suggest the | difftastic maintainers add some prebuilt binaries to the | releases. | | (I have an example workflow here if anyone from there is | interested: https://github.com/conradludgate/wordle/blob/main/.github/wo...) | jwilk wrote: | What's rustup and how do I install it? | asicsp wrote: | See https://rustup.rs/ | gkfasdfasdf wrote: | This method worked for me. No root required. | https://news.ycombinator.com/item?id=30842720 | loxias wrote: | I think it's wonderful that there's an explosion of new, | exciting languages; it can only improve the quality of all our | tools. I for one am looking forward to replacing my eons of | MATLAB experience with Julia. | | But I wish there was more of a convention in the F/OSS | community that if your software isn't written in something | _universal_ (C, C++, shell and _maybe_ Python), then it also | comes with a container of all that's necessary to run it. | | It's frustrating to pollute my nicely packaged managed system | with hundreds of locally installed Python modules just to run | one tool. Or, in this case, backport and rebuild a | _language-specific build tool_ simply to compile. :) | andai wrote: | >shell | | >universal | | * laughs in Windows, then cries * | loxias wrote: | I used to straddle the two worlds: I maintained and supported | a multi-site AD domain _with_ AFS integration for user | $HOME and some sort of unholy LDAP/kerberos bridge for | login. About once every year or two I'll miss something | about the way Windows does things, compared to normal | (meaning "linux"). Like the NTFS permissions model, that's | cool. | | But it's just once a year :) And the last time I was deep | in Windows was Win7, whenever that was. I tried to use a | Win10 machine and gave up.
| | Besides, I thought the big new feature in modern windows | was that WSL improved to the point you can run unix tools! | ;) | Spivak wrote: | > and some sort of unholy LDAP/kerberos bridge for login | | It's really not that bad, the AD-IPA cross-forest trust | is really solid as is the native sssd-ad integration if | IPA is too much. Honestly I can't really imagine it any | other way now, so much work has been put into AD support | that it's actually the best login experience on Linux at | the moment. OpenLDAP is definitely showing its age -- | dgmr I use it for all my personal infra because it's free | and my use-cases are dead simple but we got to delete so | much bespoke code after migrating off it at work. | Wilfred wrote: | FWIW I've had reports of people using difftastic on Windows | successfully. | simonw wrote: | Have you used pipx? I really like it for installing Python | tools because it automatically creates a virtual environment | for them so that their dependencies don't affect anything | else. | | https://pypa.github.io/pipx/ | loxias wrote: | I agree with all your points. | | Only diff is I got to the point where it said I needed "cargo", | On a whim, I typed "aptitude install cargo", and it did | something. Now waiting for the >1GB source repo to clone to see | if it works.... ;) | childintime wrote: | Looks like you need to install the Rust programming language | and compile it. It worked for me. Not sure if I like the | installation method. It seems the executable is portable | though. | fortran77 wrote: | It supports Elixir and C#! Too bad it doesn't do Erlang and F# | | It looks very handy though. I still do a lot of C and C++ | rom1v wrote: | It might be useful for reviewing merge/pull requests. But is | there a way to display the diff "interleaved" instead of | 2-columns side-by-side? 
(when executing `GIT_EXTERNAL_DIFF=difft | git log -p --ext-diff` for example) | Wilfred wrote: | There's a basic single-column 'inline' display available if you | do `INLINE=y`, but it's not as mature as the side-by-side | display yet. | DarkPlayer wrote: | We are working on a code review tool which supports unified | diffs with semantic diffing. If that sounds interesting to | you, take a look at https://mergeboard.com | foreigner wrote: | I LOL'ed at the first page of the manual: "When it works, it's | fantastic." | Pet_Ant wrote: | I was interested in SemanticMerge/XMerge but when I looked they | didn't have a Mac client and now it looks like they don't have a | personal edition. I just want to buy a private license and use it | locally. https://semanticmerge.com | password4321 wrote: | They are requesting feedback on the pricing model for the | latest revision of the technology; maybe HN could change their | minds: | | https://www.gmaster.io/pricing | Pet_Ant wrote: | OS X and Linux are "wait & see" again. That describes half of | our dev team and most of the seniors. | kid64 wrote: | This is great. I previously used Code Compare by Devart for this | purpose, but it has been abandoned without support for modern | IDEs. | cjohansson wrote: | If you have consistent code style and formatting, this tool is | unnecessary. I think that solution is better: you get a more | consistent code base that is easier to read for humans. (Also, | diffs will be faster to compute.) | mcculley wrote: | Even if you are consistent, having unchanged indented text show | up differently is very clever. I often end up reviewing a diff | that moves a basic block into a conditional branch and have to | scan each line to see if it changed. | jlokier wrote: | If you're using a language that doesn't depend on indentation | (C, Java, Go, Rust, etc.), try "diff -b" or "git diff -b". | | The indented basic block won't show as a difference, only the | start and end of the block.
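What `-b` does can be approximated in a few lines of Python with `difflib` (the helper names here are hypothetical). Lines are matched after dropping trailing whitespace and collapsing every other whitespace run to a single space, but the original lines are printed. `difflib.SequenceMatcher` is not the Myers algorithm that git uses by default, so hunk boundaries can differ from git's.

```python
import difflib
import re
from typing import List


def normalize_b(line: str) -> str:
    # Rough stand-in for `diff -b`: ignore trailing whitespace and treat any
    # other run of whitespace as one space (a run still differs from no run).
    return re.sub(r"\s+", " ", line.rstrip())


def changed_lines(old: List[str], new: List[str]) -> List[str]:
    """Return '-'/'+' lines, matching on normalized text but showing originals."""
    a = [normalize_b(line) for line in old]
    b = [normalize_b(line) for line in new]
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out: List[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            out += ["-" + line for line in old[i1:i2]]
        if tag in ("insert", "replace"):
            out += ["+" + line for line in new[j1:j2]]
    return out


if __name__ == "__main__":
    old = ["    foo();", "    bar();"]
    new = ["    if (ok) {", "        foo();", "        bar();", "    }"]
    # The re-indented body matches after normalization; only the braces remain.
    print("\n".join(changed_lines(old, new)))
```

As jlokier describes, re-indenting the block leaves only its opening and closing lines in the output. In this sketch, a line that goes from no indentation to some indentation still counts as changed, because a whitespace run is not treated as equal to no whitespace at all.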
| hu3 wrote: | interesting. Is -b equivalent to -Xignore-all-space in git? | kortex wrote: | I run all Python through `black` and `isort`; this is still a | huge step up in my book in terms of readability and ergonomics | compared to the standard `git diff` or GNU `diff`. | mkdirp wrote: | > If you have consistent code style and formatting this tool is | unnecessary | | I disagree. I struggle to replicate it right now using a simple | test, but I've seen the following rather infuriating and | counterintuitive behaviour from Git/GNU diff. If you have a | simple if statement such as: | |     if (bla) { |         // do something |     } | | And you were to add another statement at the end, after the | closing curly brace, e.g.: | |     if (bla) { |         // do something |     } |     if (bla2) { |         // do something else |     } | | Git/GNU diff will sometimes show the following diff: | |     diff --git 1/left 2/right |     index c2ea6f1..dc0e1c2 100644 |     --- 1/left |     +++ 2/right |     @@ -1,3 +1,6 @@ |      if (bla) { |          // do something |     +} |     +if (bla2) { |     +    // do something else |      } | | This is a basic example, but there are other similar things. For a | simple change like the above, this isn't a huge issue, but for | bigger patch sets, it can take a minute to understand what is | really going on. | lolc wrote: | Right, I frequently get angry at just how dumb diff really | is. How it's greedy and can't recognize the best seams | between blocks of code. But then when I think of simple rules | that would improve the results, I see how they would lead to | other problems in other places. So using syntax seems | necessary. | hoseja wrote: | There _is_ an option [0] to use non-default but still built-in | git diff algorithms that might yield better results. | | [0] https://git-scm.com/docs/git-diff#Documentation/git-diff.txt... | NateEag wrote: | I've used a few of the different git diff algorithms and | still have had problems like these. | cyberge99 wrote: | Nice tool!
I've used icdiff for this in the terminal, but I'll | see how this performs in my workflow. | | Since I use VSCode as my editor, I created this one-liner in my | .bash_profile: | |     # VS Code Diff |     diffcb () { "/usr/local/bin/code" -n --diff "$@" > /dev/null 2>&1; } | | With it, I can run "diffcb filename1.json filename2.json" to get a | visual editor with contextual awareness based on installed lint | modules. | db48x wrote: | Yes. | rcthompson wrote: | What does it do for unsupported languages? Just fall back to | "regular" diff? | Wilfred wrote: | Yep! It does a conventional textual diff: run Myers' diff | algorithm on lines, then word highlighting on changed lines. | rcthompson wrote: | I wonder if it would be possible to do this in a one-column | format. That would make it more useful in a lot of contexts where | a super wide view isn't practical. ___________________________________________________________________ (page generated 2022-03-29 23:00 UTC)
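Wilfred says difftastic's fallback runs Myers' diff algorithm on lines. Below is a minimal sketch of the greedy forward search from Myers' "An O(ND) Difference Algorithm and Its Variations". It is written from the published description, not taken from difftastic's own code. It returns only D, the number of insertions plus deletions in a shortest edit script. Recovering the script itself would require keeping each round's `v` array for backtracking.

```python
from typing import Sequence


def myers_distance(a: Sequence, b: Sequence) -> int:
    """Length of the shortest edit script (deletions + insertions) from a to b."""
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d  # shifts diagonal index k (which may be negative) into range
    # v[offset + k] = furthest x reached so far on diagonal k (where y = x - k)
    v = [0] * (2 * max_d + 2)
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]      # step down from diagonal k+1: insert from b
            else:
                x = v[offset + k - 1] + 1  # step right from diagonal k-1: delete from a
            y = x - k
            # Follow the "snake": runs of equal elements cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[offset + k] = x
            if x >= n and y >= m:
                return d
    return max_d  # not reached: d == n + m always finishes


if __name__ == "__main__":
    print(myers_distance("ABCABBA", "CBABAC"))
    print(myers_distance(["a", "b", "c"], ["a", "x", "c"]))
```

The first call is the worked example from Myers' paper, which needs 5 edits. The second treats each string as a line: one deletion plus one insertion, so D is 2. Passing lists of lines, as in the second call, is the "diff on lines" case.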