Title: File synchronization software
       Author: Solène
       Date: 04 May 2021
       Tags: unix
       Description: 
       
       # Introduction
       
       In this article I will introduce you to various open source file
       synchronization programs and the workflows they support.  I may
       not know them all, obviously.

       I can't give a full explanation of each of them, but I will tell
       you enough so you can tell whether one could be of any interest
       to you.
       
       # Software
       
       There are many programs out there, each with pros and cons, to
       match our file synchronization requirements.
       
       ## rsync
       
       rsync is the leader for simple file replication: it can ensure
       that the destination exactly matches the source data.  It's
       available almost everywhere, and using ssh as a transport makes
       it secure as well.

       rsync is really the reference for one-way synchronization.
       
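       As an illustration, a one-way mirror over ssh could look like
       the following command (the paths and host are made up; the
       --delete flag removes destination files that no longer exist in
       the source, so use it with care):

           rsync -avz --delete ~/documents/ user@server:backup/

       The trailing slash on the source means "copy the content of the
       directory" rather than the directory itself.
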
 (HTM) rsync website
       
       ## lsyncd
       
       lsyncd is meant for environments that need near real-time
       synchronization.  It watches the monitored directories for
       changes and replicates them to a remote system (using rsync by
       default).
       
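       lsyncd is usually driven by a Lua configuration file, but it
       also provides command line shortcuts.  If I remember the syntax
       correctly, something like this keeps a local directory
       replicated to a remote host over ssh (paths and host are
       examples):

           lsyncd -rsyncssh /home/user/documents server backup/documents
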
 (HTM) lsyncd website
       
       ## unison
       
       unison is like rsync but can synchronize in both directions,
       meaning you can keep two directories synchronized without having
       to think about which way to transfer.  Obviously, in case of
       conflict you will have to pick which version of a file you want
       to keep.  This is a well established piece of software that is
       very reliable.
       
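       unison takes two "roots" to reconcile; a local/remote pair over
       ssh could look like this (user, host and paths are examples):

           unison ~/documents ssh://user@server//home/user/documents
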
 (HTM) unison website
       
       ## rclone
       
       rclone is like rsync but supports many backends instead of
       relying on ssh to connect to a remote source.  It's mostly used
       to transfer files from or to Cloud services, acting as a glue
       layer between the rclone core and each service's API.
       
       I covered rclone in a previous article if you want more information.
       
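       As a quick sketch, once a remote has been set up interactively
       with "rclone config", a one-way synchronization looks like this
       (the remote name "myremote" is an example):

           rclone config                 # interactive remote setup
           rclone sync ~/documents myremote:documents
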
 (HTM) rclone website
       
       ## syncthing
       
       syncthing is a fantastic tool to keep directories synchronized
       between computers and phones.  It runs as a service: you define
       which directories you want to share, and on other syncthing
       instances you add those shares, which are then kept synchronized
       without further tuning.  It uses public discovery servers to
       find peers so you don't have to mess with NAT or port
       redirections, and if you want full privacy you can use direct IP
       addresses.  Data is encrypted during transfers.

       It has the advantage of working fully automatically and of
       exchanging changes in both directions on a same share, with
       multiple instances sharing the same directory.  It can also keep
       previous copies of deleted or replaced files, and it supports
       many other features.
       
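       There is nothing to script here: you start the service and do
       the rest from the web interface (by default it listens on
       127.0.0.1:8384, if I remember correctly):

           syncthing    # then browse to http://127.0.0.1:8384
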
 (HTM) syncthing website
       
       ## sparkleshare
       
       SparkleShare isn't well known but still does the job very
       efficiently.  It offers automatic synchronization of a directory
       with other peers, based on a git repository: basically, if you
       add a file or make a change, it's committed and pushed to the
       remote repository.  If someone else makes a change, you will
       receive it too.

       While it works very well, it's mostly suited for non-binary data
       because of the git backend.  You can't really delete old data,
       so a SparkleShare share will grow over time.
       
 (HTM) SparkleShare website
       
       ## nextcloud
       
       Nextcloud has file synchronization capabilities.  It's mostly
       used to upload your data to a remote server and access it
       remotely, but you can also share a file or a directory with
       other people, either read-only or read/write.  It's really a
       huge toolbox that requires a server running 24/7, but it
       provides many features for sharing files.  A not so well known
       feature is the ability to share a directory between Nextcloud
       instances.

       Nextcloud's core is written in PHP for the web access, but there
       are also phone and desktop applications.
       
       Nextcloud can encrypt stored data.
       
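       If you only need the synchronization part from a script, the
       desktop client package ships a command line client; if I recall
       the options correctly, a one-shot synchronization looks roughly
       like this (user, directory and URL are examples):

           nextcloudcmd -u user -p 'secret' \
               ~/Nextcloud https://cloud.example.org
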
 (HTM) Nextcloud website
       
       ## seafile
       
       Seafile is a centralized server to store data, like Nextcloud.
       It's more focused on file storage than Nextcloud, but it
       provides solid features and also companion apps for phones and
       desktops.
       
 (HTM) seafile website
       
       ## git-annex
       
       I kept the best for the end.  git-annex is a special beast that
       would have deserved a full article of its own, but I never found
       how to approach it.

       git-annex is a command line tool to manage a library of data; it
       delegates the actual transfers to the appropriate protocol.
       
       WHAT DOES IT MEAN? Let's try an analogy.
       
       You are in a house, and you have many things in it: movies,
       music, books, papers.  If you want to keep track of where
       something is stored, you need an inventory, in which you note
       where you put this paper, this DVD, this book, etc.  This is
       what git-annex does.
       
       git-annex lets you manage your data entirely and spread it over
       different locations (with redundancy if you want), while letting
       you access it natively (or at least telling you where to get
       it).  A real life example would be using an external hard drive
       to store big files like music or movies, but a remote server to
       back up important documents.  You may also want your documents
       to be on the external hard drive, or even on two hard drives;
       you can tell git-annex to manage that.
       
       git-annex can give you the current state of your library without
       having the files locally: the whole hierarchy is made of
       symlinks pointing to the real files when they are on your
       computer.  This means you can fetch the files when you need
       them, or simply work on that index to remove files and then tell
       git-annex to proceed with the deletion when it can (for example
       when you get internet access again or when you connect that
       external hard drive).
       
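       To make this more concrete, here is a sketch of a typical
       session; the file name and the remote name "usbdrive" are made
       up, and the remote is assumed to be already configured as a git
       remote:

           git init ~/library && cd ~/library
           git annex init "laptop"
           git annex add movies/big-movie.mkv
           git commit -m "add a movie"
           # send the content to the usbdrive remote
           git annex copy movies/big-movie.mkv --to usbdrive
           # free local space, git-annex refuses to drop the last copy
           git annex drop movies/big-movie.mkv
           # later, fetch the content back when it's needed
           git annex get movies/big-movie.mkv
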
       The drawback is that all the tracked files are symbolic links to
       potentially non-existing files, and that you need a specific
       workflow of unlocking a file in order to make changes, then
       storing it again.
       
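       For reference, that unlock workflow looks roughly like this (the
       file name is an example):

           # turn the symlink back into a regular file
           git annex unlock documents/report.odt
           # edit the file, then store the new version again
           git annex add documents/report.odt
           git commit -m "update the report"
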
       I've been using it for years for data that doesn't change much
       (administrative documents, music, pictures), but it's certainly
       not suitable for tracking logs or frequently modified files.
       
       The name contains "git", but git-annex only uses git to store
       the metadata; the data themselves are not stored in git.
       
 (HTM) git-annex website
       
       # Conclusion
       
       There are different strategies to synchronize files between
       computers: they can be one-way or two-way, allow other people to
       use them, manage data at a huge scale, work in real time, etc.
       
       From my experience, we all manage our files in very different
       ways, so I'm glad we have so many ways to synchronize them.
       
       PS: don't forget to make backups.  Replicating your data doesn't
       mean you don't need backups: it's sometimes easy to destroy all
       the copies at once with a simple mistake.