Refactoring My (File) Hierarchy
2020-06-20 21:06 by zlg
(also published on zlg.space)

When I came back to mkbak[1] in order to give it better options for
custom backup naming, automatic date tagging, preservation of
permissions, etc., I noticed that I was writing complex and tricky
rsync filter files. I ran with this for a while -- and still have them
as I write this -- but the problem stewed in my mind. Something
bothered me about the organization of the files under my $HOME.

Eventually, I felt there was a way for me to shorten my rsync filter
files *and* reduce the stress I felt about my files. I just needed to
figure out what the core problem was.

To answer that, we need to understand what a hierarchy's purpose is, or
what its use introduces to your data. Academics can give you a more
elegant explanation, but I understand hierarchy to be the way we divide
and group objects of a type. Each split in the tree should be a
meaningful distinction between groups of data. This distinction then
facilitates easy browsing or finding of files. Of course, this is in
theory, and only up to a certain depth of nesting. If you're drilling
down 17 levels, maybe try tagging! :P

To start with, my backup file lists were getting complex. I added
dot-folders, excluded cache folders and other transient files, made
exceptions for config files kept in git or in mkbak, and mixed the
origins of the files I was managing.

That's when it hit me: I care about "I made this" versus "someone else
made this". That was the first meaningful division in my refactor. I
treat files made by me as MUCH more important than those by others,
though I still have local git copies of a lot of software I use, and
distfiles for everything else.

Suddenly I noticed there was a lot to sort through. I had books and
RFCs and specs mixed in with my articles, notes, and Ledger journals. I
had icons mixed in with Pixel Art instead of being at the top level of
Artwork.
For some directories, like my music, it wasn't clear: "Am I allowed to
have this file?"... so there was another good distinction.

Next, I remembered that some music albums, games, and even software
were bought by me, and under the Copyright Act of 1976, as I understand
it, one may keep a single archival backup of media, but only for
personal use to protect your purchase. That meant I needed a place to
put my dumped cartridges, Humble Bundle games, SoundCloud albums,
retail albums, etc.

By this point I had four "buckets":

* Mine
* Others
* Disallowed
* Purchased

I couldn't think of another distinction that was big enough to belong
at the top level, so I went with the four to start with.

---

The Mine tree covers files created or maintained directly by me. Stuff
like family albums, projects, artwork, prose, etc.

The Others tree is for files made by others that are legal to have a
personal copy of and/or are free to distribute in unmodified form. This
includes code repositories, software binaries of freeware, etc.

Naturally, Disallowed just shows me what I need to delete, or purchase
a copy of if I liked it. Good to have separate. It was much smaller
than I expected.

Lastly, Purchased is an attempt to catalog and account for my purchased
digital files, to prove their legitimacy.

---

These initial groups already gave me insight into my file collection,
but I thought about further classification or "features" I wanted in my
hierarchy. I'm still shopping for a new OS; where would I put VMs and
their scripts? What about WINE-compatible installer binaries? I also
wanted a "scratch area" that would store files that need to persist
between boots[2], but aren't yet ready to be added to the hierarchy.

This part is actually much simpler for me, because I already organized
my files by their type. Well, generally anyway. This was an opportunity
to improve it. It wasn't that bad to basically swap directories around,
rename a few things, update a few configuration files...
With each directory I sorted, I felt like the files were reaching a
meaningful location for me that I didn't need to think too hard about.

That's when it hit me: "Didn't I hear about some way to map or merge
directories, or manage symlinks?"

I did, actually. GNU Stow (https://www.gnu.org/software/stow/) is
exactly that: you manage a set of 'packages' that roughly match up to
directory trees and then merge them into wherever you want, to make it
appear as if the files from all the packages are in one place. *Exactly
what I needed.* As a bonus, you can set up a given directory as
STOW_DIR (I used /usr/local/stow) and then symlink your packages from
there, so you aren't needlessly copying files or moving dirs around.

As a test of GNU Stow, I took music that was freely available and music
I had purchased, which were in two different directories, symlinked to
them in the STOW_DIR, merged their contents with GNU Stow, then hooked
that directory up to real software to make sure it worked. Here's what
it looked like:

    /usr/local/stow/
    ├── music_others -> /home/zlg/others/music/
    └── music_purchased -> /home/zlg/me/purchases/music/

Then, I just ran a single command:

    stow -d /usr/local/stow -t ~/music -S music_others music_purchased

After that, I did an `ls ~/music` and... son of a bitch, it worked! I
double-checked by opening ncmpcpp and updating the mpd database, which
was pointed to look at ~/music. The database looked the same as it did
before. Playing files worked as if the data was never split. That's
what sold me on GNU Stow's ability to produce the merged directories
I'll need to pull off more fun stuff with this hierarchy.

The careful reader already knows there's a caveat coming:
synchronization. How does one keep them synced? The first solution that
came to me is a cronjob that runs stow with the `--restow` (-R) flag:

    */15 * * * * stow -d /usr/local/stow -t /home/zlg/music -R music_others music_purchased

You would repeat this for each stow target you want to work with.
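
Since that cron line has to be repeated per target, one way to keep it
manageable might be a small wrapper that reads the targets from a list.
This is only a sketch, not what I actually run: the function name
`restow_all`, the `STOW_CMD` override, and the list format are all made
up for illustration. Only the stow flags (-d, -t, -R) and the STOW_DIR
variable come from stow itself.

```shell
#!/bin/sh
# restow_all: run `stow -R` once for every target listed on stdin.
# Input format (an assumption of this sketch, not something stow defines):
#   <target dir> <package> [<package> ...]
# Blank lines and lines starting with '#' are skipped.
# Use absolute paths; '~' is not expanded when read from a file.

STOW_DIR="${STOW_DIR:-/usr/local/stow}"

restow_all() {
    # The `|| [ -n "$target" ]` part keeps a last line that lacks a
    # trailing newline from being dropped.
    while read -r target packages || [ -n "$target" ]; do
        case "$target" in
            ''|'#'*) continue ;;
        esac
        if [ -z "$packages" ]; then
            echo "restow_all: no packages listed for $target" >&2
            continue
        fi
        # $packages is left unquoted on purpose so that it splits into
        # one argument per package name.
        # Set STOW_CMD="echo stow" to print the commands instead of
        # running them (hypothetical dry-run knob for this sketch).
        ${STOW_CMD:-stow} -d "$STOW_DIR" -t "$target" -R $packages
    done
}
```

With a final `restow_all` line appended and the file saved as, say,
/home/zlg/bin/restow-all (a hypothetical path), a single crontab entry
could cover every target:

    */15 * * * * /home/zlg/bin/restow-all < /home/zlg/.stow-targets

Here /home/zlg/.stow-targets is an equally hypothetical list holding
lines such as `/home/zlg/music music_others music_purchased`.
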
Every fifteen minutes may be too often for your use case, though I used
a five-minute interval during testing and it didn't seem to tax my
system.

---

With these changes and automation in place, I needed to go correct a
bunch of my existing symlinks to suit the new hierarchy. It was mostly
manual work, but with a few `find` pipelines it was trivial to find the
broken links. If I were to repeat this, I would instead have a stow
package dir containing symlinks to other files, but with the filenames
that I need in $HOME for everything to hook up nicely. I'll need to
test that case a little more, because the links would need to be
relative to the target directory instead of the stow package directory.
If I pull this off, I can name it 'integration' or 'homedir' or
something, and my dotfile management will be a solved problem.

---

I'd be remiss if I didn't share the structure that I ended up with! I
did the vast majority of brainstorming in a neat piece of software
called TreeSheets[3]. It was the first time I'd used this software, but
I enjoyed it quite a bit and will definitely use it for my next big
brainstorming session. Hierarchical spreadsheets are a wonderful idea!
As such, I'll share a link to the original TreeSheets file[4], or you
can view an exported image of it[5] here in Gopherspace.

I'm quite pleased to have mostly solved my organizational stress
problem. It's a relief to know things are filed away in a sane and
predictable manner. I'm also happy to have another tool at my disposal
that will save me time.

Thanks for reading,

-z

REFERENCES

[1]: (mkbak is) A bash script that's a thin wrapper around rsync. It
     takes a single rsync filter list (to allow both including and
     excluding) and backs files up to a given location, preserving
     their permissions and mtime, etc. Maybe I will release it some
     time.
[2]: Due to my living situation, I no longer run the desktop all day,
     to save on electricity costs. Thus, I needed some spare "scratch"
     space.
[3]: http://strlen.com/treesheets
     Multi-platform hierarchical spreadsheet software.
[4]: https://files.zlg.space/s/74zo3MHxzan5ngt
[5]: gopher://zlg.space/I/misc/pfh_ts.png