[HN Gopher] Fast file synchronization and network forwarding for...
___________________________________________________________________
Fast file synchronization and network forwarding for remote development

Author : saikatsg
Score  : 73 points
Date   : 2022-10-16 18:01 UTC (4 hours ago)

(HTM) web link (github.com)
(TXT) w3m dump (github.com)

| solarkraft wrote:
| If only macOS supported mounting via SSHFS ...
| parasti wrote:
| Mutagen also has a Docker extension. Really easy to set up. I
| installed it recently after searching for ways to speed up Docker
| on an Apple M1. It did work in my case.
| emrah wrote:
| What is the benefit over rsync, which is the perfect tool for this
| at the moment? Maybe add an FAQ section to the README for
| questions like this?
| xenoscopic wrote:
| The primary benefits:
|
| - Mutagen performs bidirectional synchronization (though it can
|   also operate unidirectionally); rsync is unidirectional
|
| - Mutagen uses recursive filesystem watching to avoid full
|   filesystem rescans (whereas rsync always does a full filesystem
|   rescan). This allows Mutagen to provide a more "real time"
|   sync.
|
| - Mutagen has an active synchronization loop that doesn't
|   require manual invocation.
|
| - Mutagen has more idiomatic Windows support.
|
| - Mutagen doesn't require that it be pre-installed on both
|   endpoints.
|
| Both use differential transfers (i.e. the "rsync algorithm")
| for transferring individual files.
|
| There are other differences, of course, as well as
| similarities. Mutagen's design is tuned for development work,
| rsync's design is tuned for replication. I still use rsync for
| archival operations on a daily basis - it's great!
| fpoling wrote:
| In the past I have used lsyncd to develop locally and synchronize
| the changes to a remote host over ssh, where the code base was
| compiled. This worked nicely even over a GPRS network connection
| with a speed of about 30 Kbit/s.
| As the link had high latency, it was important to use the emacs
| shell for the remote connection. This way I could type the command
| locally and send it to the remote host when pressing Enter.
| cube2222 wrote:
| We've been using Mutagen extensively for remote development with
| an EC2 instance hosting a docker-compose with a couple of
| services and live rebuild+reload, and it's been working
| fantastically.
|
| It's also nice for automatically managing port forwards.
| ta988 wrote:
| I've been using mutagen for over 6 months now to sync over an M1
| Linux VM. The only thing I miss is an option that would say
| "force everything from A" or "force everything from B". I've had
| rare cases where there were conflicts that I could only resolve
| by pausing mutagen and running rsync. But I appreciate that
| mutagen warns you and doesn't just overwrite silently like
| syncthing can do sometimes.
| ithkuil wrote:
| Mutagen allows you to choose a replication mode:
| https://mutagen.io/documentation/synchronization
|
| Do you want something different from the "one-way-replica"
| mode?
| ajvs wrote:
| How does it compare to syncthing?
| jedisct1 wrote:
| Super useful tool!
|
| Plus, it's multi-platform. I'm using it to synchronize
| directories between hosts running macOS, OpenBSD and Linux.
| Everything works fine.
|
| I haven't tried the Docker Desktop extension since I switched to
| Colima (Docker Desktop is constantly broken on Apple Silicon).
| Naac wrote:
| I haven't found anything better than using Unison. Maybe the
| linked README could compare prior art?
| xenoscopic wrote:
| Conceptually speaking, Mutagen and Unison are very similar (and
| actually I mentioned Benjamin Pierce's work in another comment
| here asking about the sync algorithm - fantastic stuff!).
| I tend to avoid direct comparisons because they always come
| across one-sided, but some cursory differences:
|
| - Mutagen tries to integrate recursive filesystem watching very
|   tightly into its synchronization loop to drive synchronization
|   and allow for near-instant filesystem rescans
|
| - Mutagen automatically copies an "agent" binary to remote
|   systems to support synchronization, so no remote install is
|   required
|
| - Mutagen uses Protocol Buffers for its data storage, so
|   synchronization sessions created with older versions continue
|   to work with newer versions
|
| - Mutagen is written in Go, Unison in OCaml (which allows Mutagen
|   broader platform support "for free")
|
| - Mutagen tries to treat Windows as a first-class citizen
|
| - Mutagen uses race-free traversal (e.g. openat, fstatat,
|   unlinkat, etc.) to perform operations
|
| Obviously the internal implementations are different, but both
| use differential (rsync-style) file transfers, both use the
| same reconciliation concepts, etc.
|
| Mutagen has the advantage of Go, recursive filesystem watching,
| and modern POSIX/Windows APIs that didn't exist when Unison was
| originally written, though some of that functionality has been
| brought into Unison.
|
| For a comparison with Syncthing (and to some extent Unison),
| check out this comment[0].
|
| [0]: https://news.ycombinator.com/item?id=30966448
| karamanolev wrote:
| This sounds like my dream tool - I've always loved how quickly
| and well local tools work, and remote environments cut into that
| good experience significantly. For me to be productive, I really
| need an instant feedback loop where tools work fast and I can
| immediately experience the result of some small piece of work.
|
| Has anyone tried this for a real-world project and can share
| feedback?
| grogenaut wrote:
| I generally find systems that aren't set up to let you dev
| locally and require a dev in prod or remote also don't let you
| work in tiny tight feedback loops either. I generally focus on
| making it work the same everywhere instead of fast sync, but
| that's just me. Well, and the systems I have control over.
| ta988 wrote:
| Yes, it is excellent, syncing macOS (JetBrains tools and a few
| other things) with a Linux VM.
| cassianoleal wrote:
| I find that VS Code's Remote-* extensions work well. I'm
| currently writing a Terraform provider on a remote Linux box
| using Remote-SSH and everything feels local. Compilation, etc.
| happens on the remote, and if I were serving requests it's dead
| easy to forward a port.
| fpoling wrote:
| Mutagen tries to be secure, so in principle one can develop on
| an untrusted remote machine. VSCode remote always assumes that
| the remote part is trusted.
| cassianoleal wrote:
| That sounds interesting but I can't find any mention of it
| in the docs. In fact, it sounds like it's just copying
| files over to the remote and running commands there.
|
| Are you able to provide a reference to how Mutagen secures
| my code on an untrusted remote?
| xenoscopic wrote:
| The general philosophy with Mutagen is to (a) delegate
| encryption to other tools and (b) use secure defaults
| (especially for permissions).
|
| So, for example, Mutagen doesn't implement any
| encryption, instead relying on transports like OpenSSH to
| provide the underlying transport encryption. In the
| Docker case, Mutagen does rely on the user securing the
| Docker transport if using TCP, but works to make this
| clear in the docs, and Mutagen is generally using the
| Docker Unix Domain Socket transport anyway. When
| communicating with itself, Mutagen also only uses secure
| Unix Domain Sockets and Windows Named Pipes.
|
| When it comes to permissions, Mutagen doesn't do a
| blanket transfer of file ownership and permissions.
| Ownership defaults to the user under which the mutagen-agent
| binary is operating and permissions default to
| 0700/0600. The only permission bits that Mutagen
| transfers are executability bits, and only to entities
| with a corresponding read bit set. The idea is that
| synchronizing files to a remote, multi-user system
| shouldn't automatically expose your files to everyone on
| that system. These settings can be tweaked, of course,
| and in certain cases (specifically the Docker Desktop
| extension), broader permissions are used by default to
| emulate the behavior of the existing virtual filesystems
| that Mutagen is replacing.
| AnthonBerg wrote:
| I'd like to know more about the theory behind the synchronisation
| -- how the syncing is known to be safe and non-destructive.
| xenoscopic wrote:
| The synchronization uses a repeated three-way merge algorithm,
| very similar to Git's merge when merging branches. It is
| triggered by recursive filesystem watching, which is also used
| to accelerate filesystem rescans. It maintains a virtual
| most-recent-ancestor and uses the two synchronization endpoints
| as the "branches" being merged. Much like Git has "-X ours" and
| "-X theirs" options, Mutagen also has automated conflict
| resolution[0] modes that can be specified. You can find the
| reconciliation algorithm here[1] (and there is an exhaustive
| set of test cases in the corresponding _test.go file).
|
| To avoid a large class of race conditions (at least to the
| extent allowed by POSIX and Windows), Mutagen will use
| `*at` style system calls for all filesystem traversal on POSIX
| systems, with a similar strategy on Windows.
|
| Also, to avoid race conditions due to filesystem changes
| between scan time and change-application time, Mutagen will
| perform just-in-time checks that filesystem contents haven't
| changed from what was fed into the reconciliation algorithm.
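
The three-way merge described in the comment above can be illustrated with a minimal Python sketch. This is not Mutagen's reconciliation code (that is the Go source linked from the comment). The names `Snapshot`, `reconcile`, and the `prefer` parameter are hypothetical and exist only for illustration, and each endpoint is simplified to a flat map from path to content digest. The idea is that a path changed on only one side is propagated to the other, while a path changed differently on both sides is a conflict unless a resolution preference (loosely analogous to Git's "-X ours" / "-X theirs") is given.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical simplification: path -> content digest.
# A missing key means the path does not exist on that side.
Snapshot = Dict[str, str]
Changes = Dict[str, Optional[str]]  # a value of None means "delete this path"


def reconcile(
    ancestor: Snapshot,
    alpha: Snapshot,
    beta: Snapshot,
    prefer: Optional[str] = None,  # hypothetical: None, "alpha", or "beta"
) -> Tuple[Snapshot, Changes, Changes, List[str]]:
    """Merge two endpoint snapshots against their last agreed state.

    Returns (new_ancestor, changes_for_alpha, changes_for_beta, conflicts).
    """
    new_ancestor: Snapshot = {}
    to_alpha: Changes = {}
    to_beta: Changes = {}
    conflicts: List[str] = []

    for path in sorted(set(ancestor) | set(alpha) | set(beta)):
        old, a, b = ancestor.get(path), alpha.get(path), beta.get(path)
        if a == b:
            winner = a  # endpoints already agree (possibly both deleted)
        elif a == old:
            winner = b  # only beta changed: propagate to alpha
            to_alpha[path] = b
        elif b == old:
            winner = a  # only alpha changed: propagate to beta
            to_beta[path] = a
        elif prefer == "alpha":
            winner = a  # both changed: resolve in favor of alpha
            to_beta[path] = a
        elif prefer == "beta":
            winner = b  # both changed: resolve in favor of beta
            to_alpha[path] = b
        else:
            conflicts.append(path)  # both changed, no preference given
            winner = old  # keep the old ancestor entry until resolved
        if winner is not None:
            new_ancestor[path] = winner

    return new_ancestor, to_alpha, to_beta, conflicts
```

In this sketch a conflicting path is reported and left untouched on both sides, which mirrors the "warns you and doesn't overwrite silently" behavior mentioned elsewhere in the thread; a real tool would also re-check the filesystem just before applying each change, as the comment describes.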
|
| [0]: https://mutagen.io/documentation/synchronization#modes
| [1]: https://github.com/mutagen-io/mutagen/blob/master/pkg/synchr...
| xenoscopic wrote:
| Also, while Mutagen's exact implementation is novel in a
| number of ways, I would be remiss not to point out that a huge
| amount of academic work in this field was done by Benjamin
| Pierce[0] and later implemented in Unison[1].
|
| [0]: https://www.cis.upenn.edu/~bcpierce/papers/index.shtml#Synch...
| [1]: https://www.cis.upenn.edu/~bcpierce/unison/
| liketochill wrote:
| I've been using unison for what feels like 14 years. Once
| working it was great, but it always took me a while to
| figure out the exact command line options I wanted.
| Beautiful tool.
| AnthonBerg wrote:
| Thank you so much for the great replies!
| xani_ wrote:
| How does that compare to sshfs (with cache/kernel_cache enabled)?
| I've used it a few times when I needed to dev like that, and it
| was generally just fine for editing a file; where performance
| tanked was doing a lot of file I/O at once (say, updating a git
| repo).
| xenoscopic wrote:
| The benchmarks will likely be highly dependent on your use
| case, but SSHFS-style virtual filesystems (specifically those
| backed by FUSE) typically have significantly lower performance
| than something like an APFS/ext4/NTFS filesystem that Mutagen
| could target with synchronization.
|
| All of your readdir()/stat()/open()/read()-style calls will
| suffer significantly on virtual filesystems, and unfortunately
| these get hit a lot by things like IDEs (e.g. when indexing
| code), compilers, and dynamic language runtimes (especially
| PHP).
|
| No tool is at fault in this chain, of course; it's a hard
| problem. Mutagen is able to offer better performance by being a
| little less dynamic and creating "real" copies of all the files
| on a more persistent filesystem.
| ta988 wrote:
| An advantage of mutagen is that it works on OSes that can't do
| sshfs.
| It felt faster too, especially with a lot of I/O like
| node modules or other things that touch a lot of files. But I
| never ran a benchmark; it is so much faster, by at least a
| factor of 10, than whatever is in Docker Desktop when populating
| node modules that I don't even need a benchmark.
| xenoscopic wrote:
| Mutagen author here -- happy to answer any questions about
| Mutagen[0], its Docker Desktop extension[1], its Compose
| integration[2], or anything else!
|
| [0]: https://mutagen.io/
| [1]: https://mutagen.io/documentation/docker-desktop-extension
| [2]: https://mutagen.io/documentation/orchestration/compose
| notemaker wrote:
| Any user stories with *vim + mutagen for _large_ remote code
| bases? VS Code Remote is the only thing that has been fast enough
| in my experience, but I would love to be able to use my local
| neovim instance for remote development instead, and this tool
| looks promising.
| xenoscopic wrote:
| It should work fine. Many users use Mutagen on multi-GB
| codebases. If we're talking something larger (say 10s of GBs or
| TB-sized monorepos), then there are some tweaks you can do to
| make life with Mutagen a little easier. Feel free to reach out
| to jacob[-at-]mutagen.io if you have a specific use case, or
| pop over to the Mutagen Community Slack Workspace[0] to chat.
|
| [0]: https://mutagen.io/slack
| eddyg wrote:
| This sounds useful. But one question that comes to mind right
| away:
|
| Does Mutagen handle the case where "local tools" (running on a
| completely different architecture than the remote) still need to
| "know" about include/header/library/etc. files from the _remote_
| machine in order to provide working "intelligence" capabilities?
|
| It's one thing to efficiently sync "code", but it's another to
| make local tools fully aware of the remote system's header files,
| libraries, etc.
| xenoscopic wrote:
| On the synchronization front, Mutagen's only goal is to
| facilitate the synchronization of files (albeit with a focus on
| development-related settings and low-latency for a "real time"
| feel). It doesn't attempt to integrate with any higher-level
| tooling (except in the cases of Docker Desktop and Compose,
| which is facilitated via external projects). That sort of
| tooling, language, and framework-specific integration is a bit
| outside the project's target scope (and something that becomes
| very domain-specific).
|
| Mutagen will, however, happily operate between different
| operating systems and architectures, so things like working
| with a remote amd64-based Docker engine from your local
| arm64-based laptop are totally possible.
|
| Also, several external projects (such as DDEV[0] and Garden[1])
| do use Mutagen as a low-level component in their stack to
| provide synchronization that does "know" a bit more about the
| framework that you're using.
|
| [0]: https://ddev.com/
| [1]: https://garden.io/
___________________________________________________________________
(page generated 2022-10-16 23:00 UTC)