[HN Gopher] Fast file synchronization and network forwarding for...
       ___________________________________________________________________
        
       Fast file synchronization and network forwarding for remote
       development
        
       Author : saikatsg
       Score  : 73 points
       Date   : 2022-10-16 18:01 UTC (4 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | solarkraft wrote:
       | If only macOS supported mounting via SSHFS ...
        
       | parasti wrote:
       | Mutagen also has a Docker extension. Really easy to set up. I
       | installed it recently after searching for ways to speed up Docker
       | on an Apple M1. It did work in my case.
        
       | emrah wrote:
       | What is the benefit over rsync which is the perfect tool for this
       | at the moment? Maybe add an faq section to the readme for
       | questions like this?
        
         | xenoscopic wrote:
         | The primary benefits:
         | 
         | - Mutagen performs bidirectional synchronization (though it can
         | also operate unidirectionally); rsync is unidirectional
         | 
         | - Mutagen uses recursive filesystem watching to avoid full
         | filesystem rescans (whereas rsync always does a full filesystem
         | rescan). This allows Mutagen to provide a more "real time"
         | sync.
         | 
         | - Mutagen has an active synchronization loop that doesn't
         | require manual invocation.
         | 
         | - Mutagen has more idiomatic Windows support.
         | 
         | - Mutagen doesn't require that it be pre-installed on both
         | endpoints.
         | 
         | Both use differential transfers (i.e. the "rsync algorithm")
         | for transferring individual files.
         | 
         | There are other differences, of course, as well as
         | similarities. Mutagen's design is tuned for development work,
         | rsync's design is tuned for replication. I still use rsync for
         | archival operations on a daily basis - it's great!
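
A minimal sketch of the weak rolling checksum at the heart of the "rsync algorithm" mentioned above. It illustrates the published rsync technique, not code from rsync or Mutagen; the names `weak_checksum` and `roll` and the modulus choice are assumptions made for this example.

```python
M = 1 << 16  # modulus used for both checksum halves (assumed for this sketch)


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two-part weak checksum of a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Each byte is weighted by its distance from the end of the window.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide a window of length n one byte to the right in O(1)."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


def rolling_checksums(data: bytes, n: int) -> list[tuple[int, int]]:
    """Return the weak checksum of every length-n window in data."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    result = [(a, b)]
    for start in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[start - 1], data[start + n - 1], n)
        result.append((a, b))
    return result
```

The receiver can checksum fixed blocks of its old copy once, while the sender slides a window over the new file one byte at a time; matching windows identify data that does not need to be sent.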
        
       | fpoling wrote:
       | In the past I have used lsyncd to develop locally and synchronize
       | the changes to a remote host over ssh where the code base was
       | compiled. This worked nicely even over a GPRS network connection
       | at around 30 Kbit/s. As the link had high latency it was
       | important to use emacs shell for the remote connection. This way
       | I could type the command locally and send it to the remote host
       | when pressing enter.
        
       | cube2222 wrote:
       | We've been using Mutagen extensively for remote development with
       | an EC2 instance hosting a docker-compose with a couple of
       | services and live rebuild+reload, and it's been working
       | fantastically.
       | 
       | It's also nice for automatically managing port forwards.
        
       | ta988 wrote:
       | I've been using mutagen for over 6 months now to sync over an M1
       | Linux VM. The only thing I miss is an option that would say
       | "force everything from A" or "force everything from B". I've had
       | rare cases where there were conflicts that I could only resolve
       | by pausing mutagen and running rsync. But I appreciate that
       | mutagen warns you and doesn't just overwrite silently like
       | syncthing can do sometimes.
        
         | ithkuil wrote:
         | Mutagen lets you choose a replication mode:
         | https://mutagen.io/documentation/synchronization
         | 
         | Do you want something different from the "one-way-replica"
         | mode?
        
       | ajvs wrote:
       | How does it compare to syncthing?
        
       | jedisct1 wrote:
       | Super useful tool!
       | 
       | Plus, it's multi platform. I'm using it to synchronize
       | directories between hosts running macOS, OpenBSD and Linux.
       | Everything works fine.
       | 
       | I haven't tried the Docker Desktop extension since I switched to
       | Colima (Docker Desktop is constantly broken on Apple Silicon).
        
       | Naac wrote:
       | I haven't found anything better than using Unison. Maybe the
       | linked README could compare prior art?
        
         | xenoscopic wrote:
         | Conceptually speaking, Mutagen and Unison are very similar (and
         | actually I mentioned Benjamin Pierce's work in another comment
         | here asking about the sync algorithm - fantastic stuff!). I
         | tend to avoid direct comparisons because they always come
         | across one-sided, but some cursory differences:
         | 
         | - Mutagen tries to integrate recursive filesystem watching very
         | tightly into its synchronization loop to drive synchronization
         | and allow for near-instant filesystem rescans
         | 
         | - Mutagen automatically copies an "agent" binary to remote
         | systems to support synchronization, so no remote install is
         | required
         | 
         | - Mutagen uses Protocol Buffers for its data storage, so
         | synchronization sessions created with older versions continue
         | to work with newer versions
         | 
         | - Mutagen is written in Go, Unison in OCaml (which allows
         | Mutagen broader platform support "for free")
         | 
         | - Mutagen tries to treat Windows as a first-class citizen
         | 
         | - Mutagen uses race-free traversal (e.g. openat, fstatat,
         | unlinkat, etc.) to perform operations
         | 
         | Obviously the internal implementations are different, but both
         | use differential (rsync-style) file transfers, both use the
         | same reconciliation concepts, etc.
         | 
         | Mutagen has the advantage of Go, recursive filesystem watching,
         | and modern POSIX/Windows APIs that didn't exist when Unison was
         | originally written, though some of that functionality has been
         | brought into Unison.
         | 
         | For a comparison with Syncthing (and to some extent Unison),
         | check out this comment[0].
         | 
         | [0]: https://news.ycombinator.com/item?id=30966448
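
A rough sketch of the `openat`/`fstatat`/`unlinkat` style of traversal mentioned above, using Python's `dir_fd` support on POSIX systems. This is not Mutagen's code (which is written in Go); the helper name `remove_regular_file` and its arguments are hypothetical.

```python
import os
import stat


def remove_regular_file(root: str, relative_parts: list, name: str) -> bool:
    """Unlink root/<parts...>/name, resolving one path component at a time.

    Each step is relative to an already-open directory descriptor, so a
    concurrent rename or symlink swap of a parent directory cannot redirect
    the operation elsewhere. Returns False if the entry is not a regular file.
    """
    dir_fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
    try:
        for part in relative_parts:
            # openat(2): O_NOFOLLOW makes the open fail if `part` is a symlink.
            next_fd = os.open(
                part,
                os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW,
                dir_fd=dir_fd,
            )
            os.close(dir_fd)
            dir_fd = next_fd
        # fstatat(2) with AT_SYMLINK_NOFOLLOW: inspect the entry itself.
        info = os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
        if not stat.S_ISREG(info.st_mode):
            return False
        # unlinkat(2): remove the entry relative to the open directory.
        os.unlink(name, dir_fd=dir_fd)
        return True
    finally:
        os.close(dir_fd)
```

The point of the design is that no call ever re-resolves a full path string, which is where time-of-check/time-of-use races usually creep in.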
        
       | karamanolev wrote:
       | This sounds like my dream tool - I've always loved how quickly
       | and well local tools work and remote environments cut into that
       | good experience significantly. For me to be productive, I really
       | need an instant feedback loop where tools work fast and I can
       | immediately experience the result of some small piece of work.
       | 
       | Has anyone tried this for a real-world project and can share
       | feedback?
        
         | grogenaut wrote:
         | I generally find systems that aren't set up to let you dev
         | locally and require a dev in prod or remote also don't let you
         | work in tiny tight feedback loops either. I generally focus on
         | making it work the same everywhere instead of fast sync, but
         | that's just me. Well, and the systems I have control over.
        
         | ta988 wrote:
         | Yes, it is excellent, syncing macOS (JetBrains tools and a few
         | other things) with a Linux VM.
        
         | cassianoleal wrote:
         | I find that VS Code's Remote-* extensions work well. I'm
         | currently writing a Terraform provider on a remote Linux box
         | using Remote-SSH and everything feels local. Compilation, etc
         | happens on the remote and if I were serving requests it's dead
         | easy to forward a port.
        
           | fpoling wrote:
           | Mutagen tries to be secure so in principle one can develop on
           | untrusted remote machine. VSCode remote always assumes that
           | the remote part is trusted.
        
             | cassianoleal wrote:
             | That sounds interesting but I can't find any mention of it
             | in the docs. In fact, it sounds like it's just copying
             | files over to the remote and running commands there.
             | 
             | Are you able to provide a reference to how Mutagen secures
             | my code on an untrusted remote?
        
               | xenoscopic wrote:
               | The general philosophy with Mutagen is to (a) delegate
               | encryption to other tools and (b) use secure defaults
               | (especially for permissions).
               | 
               | So, for example, Mutagen doesn't implement any
               | encryption, instead relying on transports like OpenSSH to
               | provide the underlying transport encryption. In the
               | Docker case, Mutagen does rely on the user securing the
               | Docker transport if using TCP, but works to make this
               | clear in the docs, and Mutagen is generally using the
               | Docker Unix Domain Socket transport anyway. When
               | communicating with itself, Mutagen also only uses secure
               | Unix Domain Sockets and Windows Named Pipes.
               | 
               | When it comes to permissions, Mutagen doesn't do a
               | blanket transfer of file ownership and permissions.
               | Ownership defaults to the user under which the mutagen-
               | agent binary is operating and permissions default to
               | 0700/0600. The only permission bits that Mutagen
               | transfers are executability bits, and only to entities
               | with a corresponding read bit set. The idea is that
               | synchronizing files to a remote, multi-user system
               | shouldn't automatically expose your files to everyone on
               | that system. These settings can be tweaked, of course,
               | and in certain cases (specifically the Docker Desktop
               | extension), broader permissions are used by default to
               | emulate the behavior of the existing virtual filesystems
               | that Mutagen is replacing.
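
A minimal sketch of the permission rule described above: start from a restrictive default mode and add an execute bit only for classes (user, group, other) that already have the read bit. The function name `propagate_executability` is hypothetical and this is an illustration of the stated rule, not Mutagen's implementation.

```python
DEFAULT_FILE_MODE = 0o600  # default file mode stated in the comment above


def propagate_executability(source_executable: bool,
                            base_mode: int = DEFAULT_FILE_MODE) -> int:
    """Return the mode for a synchronized file.

    Only executability is carried over from the source. Each execute bit is
    set only where the matching read bit is already present in base_mode.
    """
    if not source_executable:
        return base_mode
    mode = base_mode
    # (read bit, execute bit) pairs for user, group, and other.
    for read_bit, exec_bit in ((0o400, 0o100), (0o040, 0o010), (0o004, 0o001)):
        if mode & read_bit:
            mode |= exec_bit
    return mode
```

With the default base mode, an executable source file ends up as 0700 and a non-executable one stays 0600; a broader base mode such as 0644 would yield 0755 for executables.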
        
       | AnthonBerg wrote:
       | I'd like to know more about the theory behind the synchronisation
       | -- how the syncing is known to be safe and non-destructive.
        
         | xenoscopic wrote:
         | The synchronization uses a repeated three-way merge algorithm,
         | very similar to Git's merge when merging branches. It is
         | triggered by recursive filesystem watching, which is also used
         | to accelerate filesystem rescans. It maintains a virtual most-
         | recent-ancestor and uses the two synchronization endpoints as
         | the "branches" being merged. Much like Git has "-X ours" and
         | "-X theirs" options, Mutagen also has automated conflict
         | resolution[0] modes that can be specified. You can find the
         | reconciliation algorithm here[1] (and there is an exhaustive
         | set of test cases in the corresponding _test.go file).
         | 
         | To avoid a large class of race conditions (at least to the
         | extent allowed by POSIX and Windows), Mutagen will use
         | `*at` style system calls for all filesystem traversal on POSIX
         | systems, with a similar strategy on Windows.
         | 
         | Also, to avoid race conditions due to filesystem changes
         | between scan time and change-application time, Mutagen will
         | perform just-in-time checks that filesystem contents haven't
         | changed from what was fed into the reconciliation algorithm.
         | 
         | [0]: https://mutagen.io/documentation/synchronization#modes
         | [1]: https://github.com/mutagen-io/mutagen/blob/master/pkg/synchr...
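
The three-way merge idea described above can be sketched as follows. This is a simplified illustration rather than Mutagen's reconciliation code: it assumes each snapshot is a flat dict mapping a path to a content digest (a missing key means the file does not exist), and the name `reconcile` and its return shape are hypothetical.

```python
def reconcile(ancestor: dict, alpha: dict, beta: dict):
    """Three-way reconcile two endpoint snapshots against their ancestor.

    Returns (to_alpha, to_beta, conflicts). In the two change dicts, a value
    of None means "delete this path on that endpoint".
    """
    to_alpha, to_beta, conflicts = {}, {}, []
    for path in sorted(set(ancestor) | set(alpha) | set(beta)):
        base = ancestor.get(path)
        a = alpha.get(path)
        b = beta.get(path)
        if a == b:
            # Both sides agree (unchanged, or changed identically).
            continue
        if a == base:
            # Only beta changed: propagate beta's state to alpha.
            to_alpha[path] = b
        elif b == base:
            # Only alpha changed: propagate alpha's state to beta.
            to_beta[path] = a
        else:
            # Both sides changed in different ways: needs resolution.
            conflicts.append(path)
    return to_alpha, to_beta, conflicts
```

An automated conflict-resolution mode, in the spirit of Git's "-X ours", would correspond to replacing the conflict branch with a rule that always picks one side.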
        
           | xenoscopic wrote:
           | Also, while Mutagen's exact implementation is novel in a
           | number of ways, I would be remiss to not point out that a huge
           | amount of academic work in this field was done by Benjamin
           | Pierce[0] and later implemented in Unison[1].
           | 
           | [0]: https://www.cis.upenn.edu/~bcpierce/papers/index.shtml#Synch...
           | [1]: https://www.cis.upenn.edu/~bcpierce/unison/
        
             | liketochill wrote:
             | I've been using unison for what feels like 14 years. Once
             | working it was great but it always took me a while to
             | figure out the exact command line options I wanted.
             | Beautiful tool.
        
             | AnthonBerg wrote:
             | Thank you so much for the great replies!
        
       | xani_ wrote:
       | How does that compare to sshfs (with cache/kernel_cache enabled)?
       | I've used it a few times when I needed to dev like that and it
       | was generally fine for just editing a file; where performance
       | tanked was when doing a lot of file I/O at once (say, updating a
       | git repo).
        
         | xenoscopic wrote:
         | The benchmarks will likely be highly dependent on your use
         | case, but SSHFS-style virtual filesystems (specifically those
         | backed by FUSE) typically have significantly lower performance
         | than something like an APFS/ext4/NTFS filesystem that Mutagen
         | could target with synchronization.
         | 
         | All of your readdir()/stat()/open()/read()-style calls will
         | suffer significantly on virtual filesystems, and unfortunately
         | these get hit a lot by things like IDEs (e.g. when indexing
         | code), compilers, and dynamic language runtimes (especially
         | PHP).
         | 
         | No tool is at fault in this chain, of course, it's a hard
         | problem. Mutagen is able to offer better performance by being a
         | little less dynamic and creating "real" copies of all the files
         | on a more persistent filesystem.
        
         | ta988 wrote:
         | The advantage of mutagen is that it works on OSes that can't do
         | sshfs. It felt faster too, especially with a lot of I/O like
         | node modules or other things that touch a lot of files. I never
         | ran a benchmark, but it is faster by at least a factor of 10
         | than whatever is in Docker Desktop when populating node modules,
         | so I don't even need one.
        
       | xenoscopic wrote:
       | Mutagen author here -- happy to answer any questions about
       | Mutagen[0], its Docker Desktop extension[1], its Compose
       | integration[2], or anything else!
       | 
       | [0]: https://mutagen.io/
       | [1]: https://mutagen.io/documentation/docker-desktop-extension
       | [2]: https://mutagen.io/documentation/orchestration/compose
        
       | notemaker wrote:
       | Any user stories with *vim + mutagen for _large_ remote code
       | bases? VS Code Remote is the only thing that has been fast enough
       | in my experience, but I would love to be able to use my local
       | neovim instance for remote development instead and this tool
       | looks promising.
        
         | xenoscopic wrote:
         | It should work fine. Many users use Mutagen on multi-GB
         | codebases. If we're talking something larger (say 10s of GBs or
         | TB-sized monorepos), then there are some tweaks you can do to
         | make life with Mutagen a little easier. Feel free to reach out
         | to jacob[-at-]mutagen.io if you have a specific use case, or
         | pop over to the Mutagen Community Slack Workspace[0] to chat.
         | 
         | [0]: https://mutagen.io/slack
        
       | eddyg wrote:
       | This sounds useful. But one question that comes to mind right
       | away:
       | 
       | Does Mutagen handle the case where "local tools" (running on a
       | completely different architecture than the remote) still need to
       | "know" about include/header/library/etc. files from the _remote_
       | machine in order to provide working "intelligence" capabilities?
       | 
       | It's one thing to efficiently sync "code", but it's another to
       | make local tools fully-aware of the remote system's header files,
       | libraries, etc.
        
         | xenoscopic wrote:
         | On the synchronization front, Mutagen's only goal is to
         | facilitate the synchronization of files (albeit with a focus on
         | development-related settings and low-latency for a "real time"
         | feel). It doesn't attempt to integrate with any higher-level
         | tooling (except in the cases of Docker Desktop and Compose,
         | which is facilitated via external projects). That sort of
         | tooling, language, and framework-specific integration is a bit
         | outside the project's target scope (and something that becomes
         | very domain-specific).
         | 
         | Mutagen will, however, happily operate between different
         | operating systems and architectures, so things like working
         | with a remote amd64-based Docker engine from your local
         | arm64-based laptop are totally possible.
         | 
         | Also, several external projects (such as DDEV[0] and Garden[1])
         | do use Mutagen as a low-level component in their stack to
         | provide synchronization that does "know" a bit more about the
         | framework that you're using.
         | 
         | [0]: https://ddev.com/
         | [1]: https://garden.io/
        
       ___________________________________________________________________
       (page generated 2022-10-16 23:00 UTC)