[HN Gopher] Version Control for Structure Editing
       ___________________________________________________________________
        
       Version Control for Structure Editing
        
       Author : mepian
       Score  : 83 points
       Date   : 2021-10-19 19:03 UTC (3 hours ago)
        
 (HTM) web link (alarmingdevelopment.org)
 (TXT) w3m dump (alarmingdevelopment.org)
        
       | lewisjoe wrote:
       | The challenge with implementing this is dealing with half a dozen
       | types of operations or maybe more. In typical string OT/CRDT we
       | are dealing with a minimal set of operations (insert/delete) but
       | when it comes to a structure (= semantic trees) the ops are very
       | tailored for that semantic structure and could span and evolve
       | with the structure.
       | 
       | Even if we get the OT part right, it'd be huge effort to port
       | this to support other semantic structures with different set of
       | ops. Also I can't wrap my head around how transformations and
       | conflict detections work under these cases. Will watch out for
       | more from this project.
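
To make the "minimal set of operations" point concrete, here is a minimal sketch of the insert-versus-insert transform from plain string OT. The names `Insert`, `apply_op`, and `transform_insert_insert` are made up for illustration and do not come from the linked paper. A structure editor would need one such function for every ordered pair of operation types, so half a dozen op types means up to 36 cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the string where text is inserted
    text: str


def apply_op(doc: str, op: Insert) -> str:
    """Apply a single insert to a document."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform_insert_insert(a: Insert, b: Insert, a_wins_ties: bool) -> Insert:
    """Rewrite insert `a` so it can be applied after concurrent insert `b`.

    Both ops were created against the same original document. If `b`
    landed at or before `a`'s position, `a` shifts right by len(b.text).
    `a_wins_ties` breaks the tie when both target the same position.
    """
    if a.pos < b.pos or (a.pos == b.pos and a_wins_ties):
        return a
    return Insert(a.pos + len(b.text), a.text)


# Two sites start from the same document and edit concurrently.
doc = "abc"
a = Insert(1, "X")
b = Insert(1, "Y")

# Site 1 applies a first, then the transformed b.
site1 = apply_op(apply_op(doc, a), transform_insert_insert(b, a, a_wins_ties=False))
# Site 2 applies b first, then the transformed a.
site2 = apply_op(apply_op(doc, b), transform_insert_insert(a, b, a_wins_ties=True))
```

Both sites end up with the same string because the tie is broken the same way on each side. Getting that convergence property for tree operations such as move, wrap, or rename is the hard part the comment describes.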
        
         | lewisjoe wrote:
         | Also, what happens if the structure was edited in some other
         | editor and you suddenly get two structures with no history to
         | compare against?
        
         | dgb23 wrote:
         | Haven't read the paper thoroughly yet, looking forward to it.
         | The idea here seems to be very type driven and I think there is
         | something to it.
         | 
         | The general goal reminds me of Unison[0], which takes a
         | different approach. It sees code as kind of a database where
         | the functions are immutable entries. So it is less granular,
         | but likely more semantic.
         | 
         | What I immediately thought of reading your comment is paredit.
         | I know of the Emacs mode[1] and the Calva VSCode plugin[2]. One
         | could work from there, see code evolution as collection of
         | structured editing units.
         | 
         | And then, some languages are extremely terse like APL or Forth.
         | Haven't yet found time to study them, but maybe their
         | representation and semantics are more suitable for this type of
         | thing?
         | 
         | But yeah, just text might just not be the right medium for code
         | in the first place. Not when we start thinking about what code
         | actually is.
         | 
         | We're manipulating structures indirectly by manipulating text.
         | Something is not right here... I know there have been many
         | attempts to move away from it, some are successful but only for
         | specific use-cases and I don't think anything succeeded in the
         | general purpose space. Maybe someone will succeed though. There
         | is no reason to believe otherwise. I feel like it would have to
         | be a very cross disciplinary collaboration. People who make
         | games, databases, art, science. Different perspectives to break
         | out of what we think programming is or should be.
         | 
         | I watched this talk[3] some months ago. One of the cool things
         | is the discussion near the end of the video at around 1h11m:
         | look what Sussman does, when he talks about stratification and
         | code structure - he closes his eyes. What is he seeing there?
         | He explains it sure, but he _sees_ something. That's what the
         | program _is_, not the text, not the bits and bytes. It's a
         | deeply connected, complex, flowing structure - I think they
         | talk about forests in there.
         | 
         | When we program, we manipulate this structure and the text we
         | write is kind of far away from the actual mental model we have.
         | Yes, I see code in my inner eye too, but that is when I think
         | about implementing it, or when I navigate actually written code
         | from memory. But it's not _the thing_.
         | 
         | [0] https://www.unisonweb.org/
         | 
         | [1] https://www.emacswiki.org/emacs/ParEdit
         | 
         | [2] https://calva.io/paredit/
         | 
         | [3] "Stratified Design: A Lisp Tradition"
         | https://www.youtube.com/watch?v=BoGb56k2txk
        
       | narush wrote:
       | I spent a while working on a generalized version control system
       | when I graduated two years ago. It was called Saga [1]. Saga -
       | get it? The name was the best bit.
       | 
       | It allowed you to specify a "file representation format," and
       | then used some messy 2d-and-above longest-common-subsequence
       | matching algo [2] I came up with to diff the files, and merge them
       | if you wanted. It was a lovely learning experience I tried to
       | pass off as a startup, and got two of my friends involved as
       | cofounders.
       | 
       | From there, we tried to focus (generalized version control is
       | really hard... technically and otherwise), and pivoted to version
       | control for Excel spreadsheets. At one point we had branching and
       | merging working for XLSX files. But as we began to discover what
       | version of Excel customers used, things got a lot less fun. That
       | + lack of interest led to another pivot.
       | 
       | Anyways, for the past 1 year (just passed!) we've been building
       | Mito [3] with our learnings from all those spreadsheet folks we
       | spent time with. Mito is effectively a spreadsheet within your
       | Python environment. It's absolutely still getting off the ground,
       | but we're pretty proud of the value we're delivering to users
       | currently!
       | 
       | [1] https://github.com/saga-vcs/saga
       | 
       | [2] https://github.com/saga-vcs/saga/blob/master/saga/base_file/...
       | 
       | [3] https://trymito.io/hn
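
For readers unfamiliar with LCS-based diffing, here is a sketch of the textbook one-dimensional version over a list of lines. This is not Saga's algorithm (the comment describes a 2d-and-above variant that is not reproduced here); the function names `lcs_table` and `diff` are illustrative.

```python
def lcs_table(a, b):
    """table[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def diff(a, b):
    """Return a list of ("keep" | "delete" | "insert", item) edits turning a into b."""
    table = lcs_table(a, b)
    i = j = 0
    edits = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            edits.append(("keep", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] loses nothing from the best common subsequence.
            edits.append(("delete", a[i]))
            i += 1
        else:
            edits.append(("insert", b[j]))
            j += 1
    edits.extend(("delete", item) for item in a[i:])
    edits.extend(("insert", item) for item in b[j:])
    return edits


edits = diff(["a", "b", "c"], ["a", "c", "d"])
```

The same idea extends to a custom "file representation format" by diffing whatever sequence of units that format yields (rows, cells, nodes) instead of lines.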
        
       | a_c wrote:
       | At first glance I thought it was some kind of version control for
       | design tools, like Figma.
       | 
       | In my experience, workflows among designers are highly variable,
       | and the designs rarely reflect production fidelity. I am hoping
       | for a tool that facilitates collaboration between visual/UI
       | design and engineering. Anyway, I'm getting tangential here.
        
       | morelisp wrote:
       | Just a reminder that git stores _files_, not _diffs_, and you
       | can replace the merging strategy (e.g. how it handles multiple
       | heads), merge driver (e.g. word vs. line based merging), and
       | interactive diffing tool with anything you want. In this sense
       | git is purely concerned with _version control_ (what instance do
       | I have of this data and what is its provenance in regards to
       | other instances), and doesn't really give a crap _how_ those
       | files got there.
       | 
       | I see a new structured editing project kicking off 3-4 times a
       | year and for some reason all of them seem to start by replacing
       | git. Thereby they immediately have to contend with storage,
       | branching, naming, and distribution, rather than using git as an
       | object store and focusing on their new editing algorithms.
       | 
       | (There are also very real workflow issues with the snapshot
       | model! But these structure editing projects don't try to address
       | those either.)
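
As a sketch of what "replace the merge driver" can look like, below is a small structural three-way merge for JSON files, written so git could call it as a custom merge driver. Git invokes a driver with the ancestor, current, and other versions as temporary files (the `%O`, `%A`, `%B` placeholders), expects the result to be written over the current file, and treats a non-zero exit status as a conflict. The driver name `jsontree`, the script name, and the merge policy itself are assumptions made up for this example.

```python
# Hypothetical wiring (names are illustrative):
#   .gitattributes:  *.json merge=jsontree
#   git config merge.jsontree.driver "python3 json_merge_driver.py %O %A %B"
import json
import sys

MISSING = object()  # marks "key not present" so deletions can be merged too


class Conflict(Exception):
    """Both sides changed the same value in different ways."""


def merge3(base, ours, theirs):
    """Three-way merge of JSON-like values, recursing into objects by key."""
    if ours == theirs:
        return ours
    if ours == base:
        return theirs  # only their side changed
    if theirs == base:
        return ours    # only our side changed
    if all(isinstance(v, dict) for v in (base, ours, theirs)):
        merged = {}
        for key in sorted(base.keys() | ours.keys() | theirs.keys()):
            value = merge3(
                base.get(key, MISSING),
                ours.get(key, MISSING),
                theirs.get(key, MISSING),
            )
            if value is not MISSING:
                merged[key] = value
        return merged
    raise Conflict()


def main(argv):
    ancestor_path, ours_path, theirs_path = argv[1:4]
    docs = []
    for path in (ancestor_path, ours_path, theirs_path):
        with open(path) as handle:
            docs.append(json.load(handle))
    try:
        merged = merge3(*docs)
    except Conflict:
        return 1  # non-zero exit tells git the merge has conflicts
    with open(ours_path, "w") as handle:
        json.dump(merged, handle, indent=2, sort_keys=True)
        handle.write("\n")
    return 0


# Only run as a driver when called with exactly three file arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

With this, two branches that change different keys of the same object merge cleanly even when a line-based merge would report a conflict. The driver definition lives in git config rather than in the repository, so each clone still has to set it up.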
        
         | ftomassetti wrote:
         | True, indeed JetBrains MPS has its own git driver
        
       | gnufx wrote:
       | Darcs (and Pijul?) can support more patch types than textual
       | diffs, but I doubt much use has ever been made of that. I don't
       | know about the more general case, but it currently supports one
       | extra type for identifier replacement, basically s/x/y/g.
       | (One place where another type might be useful is changelogs, but
       | I never looked at what that might take.)
       | 
       | The Toolpack tool set for Fortran from the '80s was based around
       | parse trees and had a VCS, but I don't remember whether that
       | actually operated on trees or just text.
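
The identifier-replacement patch type mentioned above can be pictured as a whole-token substitution. The sketch below is only an illustration of that idea in Python; it does not reproduce Darcs's actual token rules or how its patches commute, and the default `token_chars` set is an assumption.

```python
import re


def token_replace(text, old, new, token_chars="A-Za-z_0-9"):
    """Replace every whole-token occurrence of `old` with `new`.

    A match only counts if it is not adjacent to another token character,
    so replacing "x" leaves "xs" and "max" alone.
    """
    pattern = re.compile(
        rf"(?<![{token_chars}]){re.escape(old)}(?![{token_chars}])"
    )
    # A function replacement avoids backslash escapes in `new` being interpreted.
    return pattern.sub(lambda match: new, text)


# Alice records "rename x to count"; Bob concurrently adds a line using x.
base = "total = x\n"
with_bobs_line = base + "print(x, xs)\n"
merged = token_replace(with_bobs_line, "x", "count")
```

Because the patch says "rename this token" rather than listing changed lines, it also renames uses on lines that a concurrent patch added, which a purely textual diff would miss.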
        
         | jayd16 wrote:
         | git can support different diff/merge tools. I just wish more of
          | git's configuration could be added to the repo itself. As it is,
         | if you needed a custom merge tool (like UnityYamlMerge) you
         | need each user to configure it separately.
         | 
         | The consequence is every contributor needs to know enough about
         | every file type in the repo to know if a custom merge tool
          | should be added/updated. You might get surprised with a merge
         | conflict in a filetype you never touched if you happen to be
         | the one merging down feature branches.
         | 
         | Hopefully some of this stuff and default client githooks are
         | fixed one day. Seems easy enough to add a "suggested project
         | config" to git.
        
       | escot wrote:
       | > Perhaps version control is actually the weak point of the
       | textual edifice, where we might be able to win a battle.
       | 
       | It would be interesting because as the paper says textual editing
       | has great deployment and collaboration tooling. So if non textual
       | could get a foothold in that exact area -- git -- it could draw a
       | ton of people who just want to get things shipped.
        
       | bob1029 wrote:
       | The answer for successfully applying VCS to higher-dimensional
       | spaces will demand more mathematically-elegant intermediate
       | representations. Most source code files are highly structured by
       | default. Image files are mostly feasible to diff as-is. Typical
       | 3d models, not so much. 3d models _with_ animation, even less so.
       | 
       | To be clear - the problem isn't that we can't detect a difference,
       | it's that we cannot produce a useful view of the difference such
       | that a human can make an informed decision. With
       | images/audio/code, you can still extract useful knowledge as long
       | as you know the shape of the difference relative to the whole,
       | even if the difference itself is a meaningless mesh of colors
       | between 2 image files.
       | 
       | Writing a _useful_ diff engine for 3d models represented using
       | constructive solid geometry would probably be substantially
       | easier than with other approaches. I don't know if CSG is
       | actually constrained to 3 dimensions either... I feel like GitHub
       | actually tried to do something like this but I don't know if it
       | went very far.
        
         | bob1029 wrote:
         | Here is the GH blog post I'm thinking of from 2013:
         | 
         | https://github.blog/2013-09-17-3d-file-diffs/
        
       | la4ry wrote:
       | Of historical interest is Interlisp-D, a system that did
       | structure editing and version management. It dates from the
       | beginning of time, so getting it to work again as a practical
       | development environment is a lot of work.
       | 
       | https://github.com/Interlisp/medley/issues/533
        
       | shrimpx wrote:
       | Since the beginning of computer time people have been working on
       | structure editing, because academically it's very compelling, yet
       | in practice text wins out over and over. That said, there's
       | probably a lot of opportunity to have "structure under the hood",
       | but that's kind of a moot point in general because that's what
       | linters, compilers, etc., are.
       | 
       | But maybe his specific point about structural diffs is salient;
       | that maybe there are huge wins in structural diffs that we
       | haven't tapped into for some reason. Again, there are decades of
       | research in structural diffing, so where's the impact?
        
         | avindroth wrote:
         | It works well with lisps at the very least
        
       | hardwaregeek wrote:
       | Ooh this is exactly what I've been thinking about. Text is such a
       | slow, clunky medium. It'd be interesting if you could think of
       | versions as events modifying a tree. Renaming a variable and
       | inserting a character would both be an event. Also I wonder if
       | structural editing will take over. IDEs are already so powerful
       | that if you could create good keybindings, you could do so much
       | with just IDE commands (generate expr, rename var, swap args,
       | etc.). Then if your editor knows that it will always keep a valid
       | AST, what can you do with your tooling?
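
The "versions as events modifying a tree" idea can be sketched as an event log that is replayed to rebuild any version. Everything here (`Node`, `Rename`, `InsertChild`, `replay`, `render`) is a made-up minimal model, not the design from the linked paper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    kind: str
    label: str = ""
    children: list = field(default_factory=list)  # ids of child nodes, in order


@dataclass(frozen=True)
class Rename:
    node_id: int
    new_label: str


@dataclass(frozen=True)
class InsertChild:
    parent_id: int
    index: int
    node_id: int
    kind: str
    label: str


def replay(events):
    """Rebuild the tree by applying events in order to an empty module node."""
    nodes = {0: Node(0, "module")}
    for event in events:
        if isinstance(event, Rename):
            nodes[event.node_id].label = event.new_label
        elif isinstance(event, InsertChild):
            nodes[event.node_id] = Node(event.node_id, event.kind, event.label)
            nodes[event.parent_id].children.insert(event.index, event.node_id)
    return nodes


def render(nodes, node_id=0):
    """Print the tree as an s-expression for inspection."""
    node = nodes[node_id]
    head = f"{node.kind}:{node.label}" if node.label else node.kind
    inner = " ".join(render(nodes, child) for child in node.children)
    return f"({head} {inner})" if inner else f"({head})"


history = [
    InsertChild(parent_id=0, index=0, node_id=1, kind="var", label="x"),
    InsertChild(parent_id=0, index=1, node_id=2, kind="var", label="y"),
    Rename(node_id=1, new_label="total"),
]
latest = render(replay(history))
previous = render(replay(history[:2]))
```

Each version is just a prefix of the history, and because nodes carry stable ids, a rename is recorded as one event instead of a scattering of changed lines.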
        
         | solarkraft wrote:
         | I really, really hope so.
         | 
         | Text is so clunky, especially in languages with superfluous
         | syntax (semicolon, braces). My tree based outliner allows me to
         | easily rearrange arbitrarily large blocks while never creating
         | invalid syntax, why the heck doesn't my IDE? Code is just a
         | damn tree. Why can't I arbitrarily choose to comment out/in
         | code without breaking basically all the IDE tooling (collapsed
         | a block? Well too bad!!)?
         | 
         | We should _never_ have to think about syntax. Yet we (or
         | certainly I) do a significant portion of the time.
         | 
         | The stuff I'm thinking of should be fairly possible to do as a
         | Vscodium/VSCode plugin. Can somebody please tell me it's
         | already being done?
        
           | layer8 wrote:
           | Does it really make much of a difference whether you press an
           | end-of-statement keyboard shortcut vs. typing a semicolon?
           | 
           | Having the latter as part of the source code is more
            | explicit, similar to LaTeX vs. invisible formatting marks in
           | a word processor.
        
           | ModernMech wrote:
           | Those semicolons are redundant but not superfluous. Here are
           | some good reasons why you might want to keep them around even
            | if they aren't strictly necessary for parsing your program.
           | 
           | https://digitalmars.com/articles/b05.html
        
         | layer8 wrote:
         | I don't think that always keeping a valid AST is important.
         | Realtime highlighting of syntax errors already resumes parsing
         | after invalid code, usually mapping to error nodes internally.
         | That is, you still have an AST, just with additional node
         | types. Having an interim state with error nodes isn't really
         | different from having intermediate states with temporary
         | (possibly large) changes in valid code, e.g. where you
         | move/cut/paste larger portions of code around, and then maybe
         | decide to change it back (or just change back some parts).
         | Creating a sensible history of AST operations doesn't really
         | depend on whether you have error nodes in your AST grammar or
         | not.
         | 
         | On the other hand, allowing error nodes (i.e. invalid code) at
         | least as an intermediate state arguably allows more freedom and
         | creativity when editing code, and feels less coercive. It is
         | also unavoidable in certain contexts, such as while typing an
          | identifier: the identifier may be invalid in most intermediate
         | states until you have finished typing it.
         | 
         | Therefore I'm unconvinced that restricting editing to valid
         | ASTs is (a) critical to collaborative editing and versioning,
         | and (b) strictly desirable from a usability perspective.
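
A toy illustration of the error-node idea: the parser below handles a made-up language of `name = number;` statements and, instead of failing on invalid input, records an `Error` node and resumes at the next semicolon. The grammar and node types are assumptions for this sketch only.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    name: str
    value: int


@dataclass(frozen=True)
class Error:
    text: str  # the source text that failed to parse, kept verbatim


ASSIGN = re.compile(r"\s*([A-Za-z_]\w*)\s*=\s*(\d+)\s*")


def parse(source):
    """Parse semicolon-separated assignments, turning bad statements into Error nodes."""
    nodes = []
    for chunk in source.split(";"):
        if not chunk.strip():
            continue  # skip empty statements such as the one after a trailing ';'
        match = ASSIGN.fullmatch(chunk)
        if match:
            nodes.append(Assign(match.group(1), int(match.group(2))))
        else:
            nodes.append(Error(chunk.strip()))
    return nodes


tree = parse("a = 1; b = ; c = 3;")
```

The half-typed statement becomes one `Error` node while its neighbours still parse, so a history of tree operations can be recorded across the intermediate state.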
        
       | zwieback wrote:
       | Super interesting. Instead of going whole-hog, could we add some
       | kind of hinting system to existing text-based systems that would
       | make structural changes known to the VCS? Maybe also make it
       | clear what's a comment or other insignificant change so that the
       | important changes can be tracked separately?
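
One cheap version of this hint can be computed from the text alone. The sketch below uses Python's standard `ast` module to label a change as cosmetic (comments, whitespace) or structural; the function name `classify_change` and the three labels are made up for illustration.

```python
import ast


def classify_change(old_source: str, new_source: str) -> str:
    """Label a change to Python source as identical, cosmetic, or structural.

    The parser discards comments and layout, and ast.dump() omits line and
    column attributes by default, so two sources that differ only in
    comments or formatting produce the same dump.
    """
    if old_source == new_source:
        return "identical"
    if ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source)):
        return "cosmetic"
    return "structural"


cosmetic = classify_change("x = 1  # note\n", "x   =   1\n")
structural = classify_change("x = 1\n", "x = 2\n")
```

Docstrings are part of the syntax tree, so a docstring edit counts as structural under this definition; a real tool would need its own policy for that.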
        
       ___________________________________________________________________
       (page generated 2021-10-19 23:00 UTC)