[HN Gopher] Version Control for Structure Editing
___________________________________________________________________
Author : mepian
Score  : 83 points
Date   : 2021-10-19 19:03 UTC (3 hours ago)

(HTM) web link (alarmingdevelopment.org)
(TXT) w3m dump (alarmingdevelopment.org)

| lewisjoe wrote:
| The challenge with implementing this is dealing with half a dozen types of operations or maybe more. In typical string OT/CRDT we are dealing with a minimal set of operations (insert/delete) but when it comes to a structure (= semantic trees) the ops are very tailored for that semantic structure and could span and evolve with the structure.
|
| Even if we get the OT part right, it'd be a huge effort to port this to support other semantic structures with different sets of ops. Also I can't wrap my head around how transformations and conflict detection work under these cases. Will watch out for more from this project.
| lewisjoe wrote:
| Also, what happens if the structure was edited in some other editor and you suddenly get two structures with no history to compare against?
| dgb23 wrote:
| Haven't read the paper thoroughly yet, looking forward to it. The idea here seems to be very type driven and I think there is something to it.
|
| The general goal reminds me of Unison[0], which takes a different approach. It sees code as kind of a database where the functions are immutable entries. So it is less granular, but likely more semantic.
|
| What I immediately thought of reading your comment is paredit. I know of the Emacs mode[1] and the Calva VSCode plugin[2]. One could work from there, see code evolution as a collection of structured editing units.
|
| And then, some languages are extremely terse like APL or Forth. Haven't yet found time to study them, but maybe their representation and semantics are more suitable for this type of thing?
|
| But yeah, just text might just not be the right medium for code in the first place. Not when we start thinking about what code actually is.
|
| We're manipulating structures indirectly by manipulating text. Something is not right here... I know there have been many attempts to move away from it, some are successful but only for specific use-cases and I don't think anything succeeded in the general purpose space. Maybe someone will succeed though. There is no reason to believe otherwise. I feel like it would have to be a very cross disciplinary collaboration. People who make games, databases, art, science. Different perspectives to break out of what we think programming is or should be.
|
| I watched this talk[3] some months ago. One of the cool things is the discussion near the end of the video at around 1h11m: look what Sussman does, when he talks about stratification and code structure - he closes his eyes. What is he seeing there? He explains it sure, but he _sees_ something. That's what the program _is_, not the text, not the bits and bytes. It's a deeply connected, complex, flowing structure - I think they talk about forests in there.
|
| When we program, we manipulate this structure and the text we write is kind of far away from the actual mental model we have. Yes, I see code in my inner eye too, but that is when I think about implementing it, or when I navigate actually written code from memory. But it's not _the thing_.
|
| [0] https://www.unisonweb.org/
|
| [1] https://www.emacswiki.org/emacs/ParEdit
|
| [2] https://calva.io/paredit/
|
| [3] "Stratified Design: A Lisp Tradition" https://www.youtube.com/watch?v=BoGb56k2txk
| narush wrote:
| I spent a while working on a generalized version control system when I graduated two years ago. It was called Saga [1]. Saga - get it? The name was the best bit.
|
| It allowed you to specify a "file representation format," and then used some messy 2d-and-above longest-common subsequence matching algo [2] I came up with to diff the files, and merge them if you wanted. It was a lovely learning experience I tried to pass off as a startup, and got two of my friends involved as cofounders.
|
| From there, we tried to focus (generalized version control is really hard... technically and otherwise), and pivoted to version control for Excel spreadsheets. At one point we had branching and merging working for XLSX files. But as we began to discover what version of Excel customers used, things got a lot less fun. That + lack of interest led to another pivot.
|
| Anyways, for the past year (just passed!) we've been building Mito [3] with our learnings from all those spreadsheet folks we spent time with. Mito is effectively a spreadsheet within your Python environment. It's absolutely still getting off the ground, but we're pretty proud of the value we're delivering to users currently!
|
| [1] https://github.com/saga-vcs/saga
|
| [2] https://github.com/saga-vcs/saga/blob/master/saga/base_file/...
|
| [3] https://trymito.io/hn
| a_c wrote:
| At first glance I thought it was some kind of version control for a design tool, like Figma.
|
| In my experience, workflows between designers are highly variable and the designs rarely reflect production fidelity. I am hoping to have a tool to facilitate the collaboration between visual/UI design and engineering. Anyway, I am getting tangential here.
| morelisp wrote:
| Just a reminder that git stores _files_, not _diffs_, and you can replace the merging strategy (e.g. how it handles multiple heads), merge driver (e.g. word vs. line based merging), and interactive diffing tool with anything you want.
| In this sense git is purely concerned with _version control_ (what instance do I have of this data and what is its provenance in regards to other instances), and doesn't really give a crap _how_ those files got there.
|
| I see a new structured editing project kicking off 3-4 times a year and for some reason all of them seem to start by replacing git. Thereby they immediately have to contend with storage, branching, naming, and distribution, rather than using git as an object store and focusing on their new editing algorithms.
|
| (There are also very real workflow issues with the snapshot model! But these structure editing projects don't try to address those either.)
| ftomassetti wrote:
| True, indeed JetBrains MPS has its own git driver
| gnufx wrote:
| Darcs (and Pijul?) can support more patch types than textual diffs, but I doubt much use has ever been made of that. I don't know about the more general case, but it supports the extra type now for identifier replacement, at least as basically s/x/y/g. (One place where another type might be useful is changelogs, but I never looked at what that might take.)
|
| The Toolpack tool set for Fortran from the '80s was based around parse trees and had a VCS, but I don't remember whether that actually operated on trees or just text.
| jayd16 wrote:
| git can support different diff/merge tools. I just wish more of git's configuration could be added to the repo itself. As it is, if you needed a custom merge tool (like UnityYamlMerge) you need each user to configure it separately.
|
| The consequence is every contributor needs to know enough about every file type in the repo to know if a custom merge tool should be added/updated. You might get surprised with a merge conflict in a filetype you never touched if you happen to be the one merging down feature branches.
|
| Hopefully some of this stuff and default client githooks are fixed one day.
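To make the per-user setup described above concrete: a custom merge driver has two halves. A `.gitattributes` line, which can be committed, names the driver for a path pattern; the `merge.<name>.driver` command lives in each user's git config, which is why it does not travel with the repository. The sketch below is a hypothetical key-wise three-way merge driver for JSON files. The driver name `jsonmerge`, the script name `json_merge.py`, and the merge policy are assumptions for illustration, not an existing tool, and only top-level JSON objects are handled.

```python
#!/usr/bin/env python3
"""Hypothetical structure-aware merge driver for JSON files (illustrative sketch).

Wiring (driver and script names are assumptions):
  .gitattributes (committed):   *.json merge=jsonmerge
  per-user config (not shared): git config merge.jsonmerge.driver "python3 json_merge.py %O %A %B"
git replaces %O, %A, %B with temp files holding the ancestor, current and
other versions; the driver writes its result to %A and exits non-zero on conflict.
"""
import json
import sys

_MISSING = object()  # sentinel for "key absent in this version"


def three_way_merge(base, ours, theirs):
    """Merge two dicts key by key against their common ancestor.

    Returns (merged, conflicts). conflicts lists the keys that both sides
    changed in different ways; for those keys our value is kept.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:        # both sides agree (including "both deleted")
            value = o
        elif o == b:      # only theirs changed
            value = t
        elif t == b:      # only ours changed
            value = o
        else:             # both changed, differently
            conflicts.append(key)
            value = o
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


def main(argv):
    base_path, ours_path, theirs_path = argv[1:4]
    docs = []
    for path in (base_path, ours_path, theirs_path):
        with open(path) as fh:
            docs.append(json.load(fh))
    merged, conflicts = three_way_merge(*docs)
    with open(ours_path, "w") as fh:  # git expects the result in %A
        json.dump(merged, fh, indent=2, sort_keys=True)
        fh.write("\n")
    return 1 if conflicts else 0


if __name__ == "__main__" and len(sys.argv) >= 4:
    sys.exit(main(sys.argv))
```

Because the merge works on keys rather than lines, two branches that edit different keys of the same object merge cleanly even when the edits sit on adjacent lines of the file.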
| Seems easy enough to add a "suggested project config" to git.
| escot wrote:
| > Perhaps version control is actually the weak point of the textual edifice, where we might be able to win a battle.
|
| It would be interesting because as the paper says textual editing has great deployment and collaboration tooling. So if non textual could get a foothold in that exact area -- git -- it could draw a ton of people who just want to get things shipped.
| bob1029 wrote:
| The answer for successfully applying VCS to higher-dimensional spaces will demand more mathematically-elegant intermediate representations. Most source code files are highly structured by default. Image files are mostly feasible to diff as-is. Typical 3d models, not so much. 3d models _with_ animation, even less so.
|
| To be clear - the problem isn't that we can't detect a difference, it's that we cannot produce a useful view of the difference such that a human can make an informed decision. With images/audio/code, you can still extract useful knowledge as long as you know the shape of the difference relative to the whole, even if the difference itself is a meaningless mesh of colors between 2 image files.
|
| Writing a _useful_ diff engine for 3d models represented using constructive solid geometry would probably be substantially easier than with other approaches. I don't know if CSG is actually constrained to 3 dimensions either... I feel like GitHub actually tried to do something like this but I don't know if it went very far.
| bob1029 wrote:
| Here is the GH blog post I'm thinking of from 2013:
|
| https://github.blog/2013-09-17-3d-file-diffs/
| la4ry wrote:
| Of historical interest was Interlisp-D as a system that did structure editing and version management. It was at the beginning of time so getting it to work again as a practical development environment is a lot of work.
|
| https://github.com/Interlisp/medley/issues/533
| shrimpx wrote:
| Since the beginning of computer time people have been working on structure editing, because academically it's very compelling, yet in practice text wins out over and over. That said, there's probably a lot of opportunity to have "structure under the hood", but that's kind of a moot point in general because that's what linters, compilers, etc., are.
|
| But maybe his specific point about structural diffs is salient; that maybe there are huge wins in structural diffs that we haven't tapped into for some reason. Again, there are decades of research in structural diffing, so where's the impact?
| [deleted]
| avindroth wrote:
| It works well with lisps at the very least
| hardwaregeek wrote:
| Ooh this is exactly what I've been thinking about. Text is such a slow, clunky medium. It'd be interesting if you could think of versions as events modifying a tree. Renaming a variable and inserting a character would both be an event. Also I wonder if structural editing will take over. IDEs are already so powerful that if you could create good keybindings, you could do so much with just IDE commands (generate expr, rename var, swap args, etc.). Then if your editor knows that it will always keep a valid AST, what can you do with your tooling?
| solarkraft wrote:
| I really, really hope so.
|
| Text is so clunky, especially in languages with superfluous syntax (semicolon, braces). My tree based outliner allows me to easily rearrange arbitrarily large blocks while never creating invalid syntax, why the heck doesn't my IDE? Code is just a damn tree. Why can't I arbitrarily choose to comment out/in code without breaking basically all the IDE tooling (collapsed a block? Well too bad!!)?
|
| We should _never_ have to think about syntax. Yet we (or certainly I) do a significant portion of the time.
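A minimal sketch of the "versions as events modifying a tree" idea from this subthread, assuming a toy tree whose nodes carry stable ids. The `Node`, `Rename`, and `Move` names are hypothetical and are not taken from the linked paper or from any existing editor. Each event produces a new version of the tree, and because events address nodes by id rather than by text position, this rename and this move of the same node give the same result in either order.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    label: str
    children: list = field(default_factory=list)


@dataclass(frozen=True)
class Rename:
    node_id: int
    new_label: str


@dataclass(frozen=True)
class Move:
    node_id: int
    new_parent_id: int
    index: int


def find(node, node_id, parent=None):
    """Return (node, parent) for the node with the given id, or (None, None)."""
    if node.id == node_id:
        return node, parent
    for child in node.children:
        found, found_parent = find(child, node_id, node)
        if found is not None:
            return found, found_parent
    return None, None


def clone(node):
    return Node(node.id, node.label, [clone(c) for c in node.children])


def apply(root, event):
    """Return a new tree with the event applied; the input tree is untouched."""
    root = clone(root)
    target, parent = find(root, event.node_id)
    if target is None:
        raise KeyError(event.node_id)
    if isinstance(event, Rename):
        target.label = event.new_label
    elif isinstance(event, Move):
        new_parent, _ = find(root, event.new_parent_id)
        if new_parent is None or parent is None:
            raise ValueError("cannot move the root or move into a missing node")
        if find(target, event.new_parent_id)[0] is not None:
            raise ValueError("cannot move a node into its own subtree")
        # Detach by identity, then reattach under the new parent.
        parent.children = [c for c in parent.children if c is not target]
        new_parent.children.insert(event.index, target)
    return root


def render(node):
    """Render the tree as an s-expression; leaves render as bare labels."""
    if not node.children:
        return node.label
    return "(" + " ".join([node.label] + [render(c) for c in node.children]) + ")"


# Start from (f (g x) y): move x out of g to become f's first child, then rename it.
v0 = Node(1, "f", [Node(2, "g", [Node(3, "x")]), Node(4, "y")])
history = [Move(3, 1, 0), Rename(3, "count")]
v2 = v0
for event in history:
    v2 = apply(v2, event)
print(render(v2))  # prints "(f count g y)"
```

Every intermediate version here is a well-formed tree by construction, which is the property the outliner comparison above is pointing at. A real editor would need many more event types and a policy for events that do conflict.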
|
| The stuff I'm thinking of should be fairly possible to do as a Vscodium/VSCode plugin. Can somebody please tell me it's already being done?
| layer8 wrote:
| Does it really make much of a difference whether you press an end-of-statement keyboard shortcut vs. typing a semicolon?
|
| Having the latter as part of the source code is more explicit, similar to LaTeX vs. invisible formatting marks in a word processor.
| ModernMech wrote:
| Those semicolons are redundant but not superfluous. Here are some good reasons why you might want to keep them around even if they aren't strictly necessary in parsing your program.
|
| https://digitalmars.com/articles/b05.html
| layer8 wrote:
| I don't think that always keeping a valid AST is important. Realtime highlighting of syntax errors already resumes parsing after invalid code, usually mapping to error nodes internally. That is, you still have an AST, just with additional node types. Having an interim state with error nodes isn't really different from having intermediate states with temporary (possibly large) changes in valid code, e.g. where you move/cut/paste larger portions of code around, and then maybe decide to change it back (or just change back some parts). Creating a sensible history of AST operations doesn't really depend on whether you have error nodes in your AST grammar or not.
|
| On the other hand, allowing error nodes (i.e. invalid code) at least as an intermediate state arguably allows more freedom and creativity when editing code, and feels less coercive. It is also unavoidable in certain contexts, such as while typing an identifier, the identifier may be invalid in most intermediate states until you have finished typing it.
|
| Therefore I'm unconvinced that restricting editing to valid ASTs is (a) critical to collaborative editing and versioning, and (b) strictly desirable from a usability perspective.
| zwieback wrote:
| Super interesting.
| Instead of going whole-hog, could we add some kind of hinting system to existing text-based systems that would make structural changes known to the VCS? Maybe also make it clear what's a comment or other insignificant change so that the important changes can be tracked separately?
___________________________________________________________________
(page generated 2021-10-19 23:00 UTC)