Title: Introduction to git-annex (Port Of The Week)
       Author: Solène
       Date: 12 May 2021
       Tags: git versioning openbsd
       Description: 
       
       # Introduction
       
       Now that git-annex is available as a package on OpenBSD I can use it
       again.  I relied on it a few years ago, but it was really complicated
       for me to compile and I gave up.  Since I really missed it, I'm now
       back to it and I think it's time to write about this wonderful piece
       of software.
       
       git-annex is meant to help you manage your data like you would manage
       books in a library: you have a database telling you where the books
       are, so you can find them on the shelves or at least know who
       borrowed them.  The analogy doesn't fully work because digital files
       can be copied, but it covers the common cases: you may want to put
       some of your data, but not everything, on an external hard drive, and
       you may want to keep some data on multiple devices for safety
       reasons.  git-annex automates this.
       
       It works very well for files that don't change much.  I call them
       "static files": music, videos, pictures, documents.  You don't really
       want to use git-annex for files you edit every day, because the
       unlock, edit, add and commit cycle gets tedious.
       
       git-annex may not be easy to understand at first, so I suggest you
       try it locally to grasp its purpose.
       
 (HTM) git-annex official website
 (HTM) what git-annex is not
       
       # Cheat sheet
       
       Let's create a cheat sheet first.  Most git-annex commands have a
       dedicated man page, and you can also get a shorter help text with
       "git annex help somecommand".
       
       ## Create the repository
       
       The first step is to create a regular git repository, then tell
       git-annex to initialize it too.
       
       ```command line example
       mkdir ~/MyDataLibrary && cd ~/MyDataLibrary
       git init
       git annex init "my-computer"
       ```
       
       ## Add a file
       
       When you want to register a file in git-annex, you need to use "git
       annex add" to add it and then "git commit" to make it permanent.  The
       file content is not stored in the git history: git only tracks
       metadata, here a symbolic link pointing to the content.
       
       ```command line example
       git annex add Something
       git commit -m "I added something"
       ```
       
       Example:
       
       ```command line example
       $ echo "hello there" > hello
       $ ls -l hello
       -rw-r--r--  1 solene  wheel  12 May 12 18:38 hello
       $ git annex add hello
       add hello
       ok
       (recording state in git...)
       $ ls -l hello
       lrwxr-xr-x  1 solene  wheel  180 May 12 18:38 hello -> .git/annex/objects/qj/g5/SHA256E-s12--aadc1955c030f723e9d89ed9d486b4eef5b0d1c6945be0dd6b7b340d42928ec9/SHA256E-s12--aadc1955c030f723e9d89ed9d486b4eef5b0d1c6945be0dd6b7b340d42928ec9
       $ git status hello
       On branch master
       Changes to be committed:
         (use "git restore --staged <file>..." to unstage)
               new file:   hello
       ```
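       
       The name of the symlink target is the git-annex "key" of the file.
       Here is a small sketch, in plain shell, that splits the key from the
       example above into its parts.  The variable names are mine, and I
       assume the "BACKEND-sSIZE--CHECKSUM" layout seen in this key.

       ```shell
       # Key copied from the symlink target shown above.
       key="SHA256E-s12--aadc1955c030f723e9d89ed9d486b4eef5b0d1c6945be0dd6b7b340d42928ec9"

       backend=${key%%-*}      # text before the first "-": SHA256E
       rest=${key#*-s}         # drop "SHA256E-s", leaving "12--aadc..."
       size=${rest%%--*}       # text before "--": 12
       checksum=${key##*--}    # text after the last "--": the hex digest

       # "hello there" plus a newline is 12 bytes, like the size in the key.
       actual_size=$(printf 'hello there\n' | wc -c | tr -d ' ')

       echo "backend=$backend size=$size actual_size=$actual_size"
       echo "checksum=$checksum"
       ```

       For files with an extension, the SHA256E backend also keeps the
       extension at the end of the key, so this sketch would need a small
       adjustment to isolate the checksum.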
       
       ## Make changes to a file
       
       If you want to make changes to a file, you first need to "unlock" it
       in git-annex, which means the symbolic link is replaced by the file
       itself and the file is no longer read-only.  Then, after your
       changes, you need to add it again to git-annex and commit.
       
       ```command line example
       git annex unlock file
       vi file
       git annex add file
       git commit -m "I changed something" file
       ```
       
       ## Add a remote encrypted repository
       
       If you want to store data (for duplication) on a remote server over
       ssh, you can use a special remote of type "rsync" and encrypt the
       data in several ways (GPG with "hybrid" is the best).  This allows
       you to store data on untrusted remote devices.
       
       ```command line example
       git annex initremote my-remote-server type=rsync rsyncurl=remote-server.com:/home/solene/git-annex-data keyid=my-gpg@address encryption=hybrid
       ```
       
       After this command, I can send files to my-remote-server.
       
 (HTM) git-annex website about encryption
 (HTM) git-annex website about special remotes
       
       
       ## Manage data from multiple computers (with ssh)
       
       **This is a way to have a central git repository for many computers.
       It is not the best way to store data on remote servers**.
       
       If you want to use a remote server through ssh, there are two ways:
       mounting the remote file system using sshfs, or using plain ssh.  If
       you use sshfs, the remote behaves like a standard local file system
       such as an external USB drive, but if you go through plain ssh it's
       different.
       
       You need key-based authentication for the remote ssh account, and
       you also need git-annex installed on the remote server.  It's
       important to have a bare git repo on the server.
       
       ```command line example
       cd /home/data/
       git init --bare
       git annex init "remote-server"
       ```
       
       On your computer:
       
       ```command line example
       git remote add remote-server ssh://hostname:/home/data/
       git fetch remote-server
       ```
       
       You will be able to use commands related to repositories now!
       
       ## List files and where they are stored
       
       You can use the "git annex list" command to list where your files are
       physically stored.
       
       In the following example you can see which files are on my computer
       ("here") and which are available on my remote server called
       "network".  "web" and "bittorrent" are special remotes.
       
       ```command line example
       here
       |network
       ||web
       |||bittorrent
       ||||
       X___ Documentation/Nim/Dominik Picheta - Nim in Action-Manning Publications (2017).pdf
       X___ Documentation/ada/Ada-Distilled-24-January-2011-Ada-2005-Version.pdf
       X___ Documentation/ada/courseada1.pdf
       X___ Documentation/ada/courseada2.pdf
       X___ Documentation/ada/courseada3.pdf
       X___ Documentation/scheme/artanis.pdf
       X___ Documentation/scheme/guix.pdf
       X___ Documentation/scheme/manual_guix.pdf
       X___ Documentation/skribilo/skribilo.pdf
       X___ Documentation/uck2ep1.pdf
       X___ Documentation/uck2ep2.pdf
       X___ Documentation/usingckermit3e.pdf
       XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/01 - Daftendirekt.flac
       XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/02 - Wdpk 83.7 fm.flac
       XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/03 - Revolution 909.flac
       XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/04 - Da Funk.flac
       XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/05 - Phoenix.flac
       _X__ Musique/Alan Walker/Alan Walker - Different World/01 - Alan Walker - Intro.flac
       _X__ Musique/Alan Walker/Alan Walker - Different World/02 - Alan Walker, Sorana - Lost Control.flac
       _X__ Musique/Alan Walker/Alan Walker - Different World/03 - Alan Walker, Julie Bergan - I Don_t Wanna Go.flac
       
       ```
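       
       The listing above is easy to filter with standard tools.  Here is a
       rough sketch that prints the files having an "X" in only one column,
       meaning a single known location.  It assumes the output format shown
       above, and the sample lines are a shortened copy of my listing; in
       practice you would pipe "git annex list" into the awk part.

       ```shell
       # Shortened copy of the listing above, used as sample input.
       sample=$(printf '%s\n' \
         'here' \
         '|network' \
         '||web' \
         '|||bittorrent' \
         '||||' \
         'X___ Documentation/scheme/guix.pdf' \
         'XX__ Musique/Daft Punk/01 - Albums/1997 - Homework/04 - Da Funk.flac' \
         '_X__ Musique/Alan Walker/Alan Walker - Different World/01 - Alan Walker - Intro.flac')

       # Only file lines start with a run of X and _ followed by a space.
       # gsub() returns how many X it found in the marker column.
       single_copy=$(printf '%s\n' "$sample" | awk '
         /^[X_]+ / {
           marks = $1
           if (gsub(/X/, "X", marks) == 1)
             print substr($0, length($1) + 2)
         }')

       printf '%s\n' "$single_copy"
       ```

       With this sample, the guix.pdf and Alan Walker lines are printed
       while the Daft Punk track, present in two places, is skipped.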
       
       ## List files locally available
       
       If you want to list the files whose content is available locally, you
       can use the "list" command from git-annex and restrict it with "--in
       here", where "here" stands for your local repository.
       
       ```command line example
       git annex list --in here
       ```
       
       # Work with a remote repository
       
       ## Delete a repository
       
       Simply mark it as "dead".
       
       ```command line example
       git annex dead $repo_name
       ```
       
       ## Adding a remote repository GPG encrypted
       
       ```command line example
       git annex initremote $name type=rsync rsyncurl=remote-server:/home/solene/mydirectory keyid=your@email encryption=hybrid
       ```
       
       ## Copy files to a remote
       
       If you want to duplicate files between repositories to have multiple
       copies, you can use "git annex copy".
       
       ```command line example
       git annex copy Music -t remote-server
       ```
       
       ## Move files to a remote
       
       If you want to move files from one repository to another (removing
       the content from the origin), you can use "git annex move", which
       copies to the destination and then removes from the origin.
       
       ```command line example
       git annex move Music -t remote-server
       ```
       
       ## Get a file content
       
       If you don't have a file locally, you can fetch it from a remote to get
       the content.
       
       ```command line example
       git annex get Music/Queen
       ```
       
       ## Forget a file locally
       
       If you don't want to keep a file locally, because you are short on
       disk space or simply don't want it, you can use the "drop" command.
       Note that "drop" is safe because git-annex won't let you drop a file
       unless enough other copies exist (except if you use --force of
       course).
       
       ```command line example
       git annex drop Music/Queen
       ```
       
       Real life example: I have a huge music library but my laptop SSD is
       too small, so I get the music I want and drop the files I don't want
       to listen to for a while.
       
       ## Use mincopies to enforce multi repository data duplication
       
       The numcopies and mincopies settings tell git-annex how many copies
       of each file you want: numcopies is the number of copies it should
       preserve, mincopies is the number it is required to preserve.  This
       protects you from accidental deletions and also helps uploading files
       to other repositories to match the requirements.
       
       ### Enable per directory recursively
       
       ```command line example
       echo "* annex.mincopies=2" > .gitattributes
       ```
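       
       If you only want the rule for some directories, you can write one
       pattern per directory instead of "*".  The sketch below uses a
       throwaway repository and made-up file names (payslip.pdf, song.flac),
       and checks the result with plain "git check-attr", so git-annex
       itself is not involved in the check.

       ```shell
       # Throwaway repository so the real one is left untouched.
       cd "$(mktemp -d)" && git init -q

       # Hypothetical layout: the rule applies to documents/ and pictures/.
       printf '%s\n' \
         'documents/** annex.mincopies=2' \
         'pictures/** annex.mincopies=2' > .gitattributes

       # check-attr works on path names, the files do not need to exist.
       documents_rule=$(git check-attr annex.mincopies -- documents/payslip.pdf)
       music_rule=$(git check-attr annex.mincopies -- music/song.flac)

       echo "$documents_rule"   # documents/payslip.pdf: annex.mincopies: 2
       echo "$music_rule"       # music/song.flac: annex.mincopies: unspecified
       ```

       Keep in mind that ">" overwrites an existing .gitattributes file; use
       ">>" if you want to append to it instead.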
       
       ### Only upload files not matching the num copies
       
       If you have multiple repositories and some files don't meet the
       copies requirement, you can use the following command to only push
       the files that are missing copies.
       
       ```command line example
       git annex copy --auto -t remote-server
       ```
       
       Real life example: I want my payslip PDFs to be really safe, so I ask
       for 2 copies of those and then run the command above against the
       remote server, which uploads them if there is only one copy of the
       file so far.
       
       ## Verifying integrity and requirements
       
       The "git annex fsck" command checks the integrity of every file in
       the local repository and reports whether they are sane (or not).  It
       also tells you which files don't meet the required number of copies.
       
       ```command line example
       git annex fsck
       ```
       
       # Reversibility
       
       If for some reason you want to give up git-annex, you can easily get
       all your files back as in a normal file system by running "git annex
       unlock ." in the top directory of your repository: every locally
       available file will be replaced by its physical copy instead of the
       symlink.  Reversibility is very important when you deal with your
       data because it means you are not stuck forever with a tool in case
       it's broken or if you want to switch to another process.
       
       # My workflow
       
       I have a ~/DATA/ directory in which I have sub directories
       {documents,documentation,pictures,videos,music,images}.  Documents
       are papers or legal papers, documentation is mostly PDFs, pictures
       are family pictures and images are wallpapers or stupid images I want
       to keep.
       
       I've set mincopies to 2 for documents and pictures.  My music is not
       on my computer but on a remote: I get the music files I want to
       listen to when I'm on the local network with the computer holding the
       files, and I drop them locally when I'm bored of them.
       
       # Conclusion
       
       git-annex separates content from indexing.  It can be used in many
       ways but it implies an archivist philosophy: redundancy, safety,
       immutability (sort of).  It is not meant for backup: you can back up
       a directory managed by git-annex, but that only saves the data you
       have locally, and you still have to back up your other data as well.
       
       I love that tool, it's a very nice piece of software.  It's unique, I
       haven't found any other program that achieves this.
       
       
       ## More resources
       
 (HTM) git-annex official walkthrough
 (HTM) git-annex special remotes (S3, webdav, bittorrent etc..)
 (HTM) git-annex encryption