DATA IMMORTALITY

Part of my plan for my "Internet Client" computer was that it would help me organise my bookmarks between different machines. For various reasons, it hasn't. Well, it has definitely helped to an extent, but again I'm thinking that the only way I'll get bookmarks to work efficiently is with a separate bookmark manager program. I've been looking at bookmark managers without satisfaction for years though, so I've finally given in and decided to come up with a solution of my own.

Feature summary:

* Bookmarks stored as individual files in a directory structure equivalent to the bookmark menu structure. Worryingly, the only other bookmark system developers who seem to have gone with this approach were those of Microsoft Internet Explorer. That's probably a bad sign, but it still seems like the most flexible solution to me. Like Maildir, but for bookmarks.

* Firefox-like add-bookmark dialogue, but run in a terminal window. Triggered by a keyboard combination, it automatically grabs the URL from the current X selection and downloads the page itself in order to grab the title (there's a rough sketch of this step a bit further down).

* Statically generated HTML interface which can be accessed either locally (file://) or from a local web server. Directory tree plus top-level bookmarks in either a small frame or table cell on the left, and directory contents in the main view.

* In the frame view, an option to browse all bookmarks in the small left frame and open links in the larger frame, emulating Firefox's Ctrl-B bookmark selector.

* List of all bookmarks on one page, usable with the browser's page search function for searching.

* Optionally save a local copy of the page being bookmarked using wget, also grabbing any file linked from that page up to a certain size limit. This goes into a separate directory tree, where I can also manually go in and grab the whole site using HTTrack if desired.

The last feature is the one I really want to discuss, and it has been whirring around in my head for a long time, ever since I read this post by Solderpunk:

gopher://zaibatsu.circumlunar.space/0/%7esolderpunk/phlog/the-individual-archivist-and-ghosts-of-gophers-past.txt

There he proposes a Gopher client (though I'd probably try to do it with a Gopher proxy myself) which archives every visited page locally. Just recently he's come up with a new approach to the problem, proposing instead that sites be hosted as Git repos:

gopher://zaibatsu.circumlunar.space/0/%7esolderpunk/phlog/low-budget-p2p-content-distribution-with-git.txt

Looking back on my earlier bookmarks, this is definitely a problem that I do need to solve. I seem to have had a remarkable knack about a decade ago for finding websites that were about to go offline within the next ten years, and that were obviously of so little interest to the world at large that the Wayback Machine often didn't bother archiving the images (which are kind-of the key point if they're talking about electrical circuits), or much of the sites at all. Even when they did get archived, the Internet Archive is just another single point of failure anyway. Archive.is, for example, got blocked by the Australian government a few years ago for archiving terrorist content (the gov. did a rubbish job of it and you could still access the site via some of their alternative domains because the block was done at the DNS level, but the fact that the people in power are idiots doesn't negate the potential of their power).

Unfortunately I don't like either of Solderpunk's solutions. That may be a little harsh on Solderpunk.
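Before getting into that, here's roughly how I picture the add-bookmark step from the feature list working. This is only a minimal sketch in Python, assuming xclip is available for reading the X selection; the ~/bookmarks location, the ".url" file format and every name in it are placeholders rather than decisions.

  #!/usr/bin/env python3
  # Rough sketch of the add-bookmark step: read the URL from the X
  # selection, fetch the page to find its <title>, then write one small
  # file into a directory tree that mirrors the bookmark menu structure.
  import os
  import re
  import subprocess
  import urllib.request

  BOOKMARK_ROOT = os.path.expanduser("~/bookmarks")  # placeholder location

  def selected_url():
      # Read the current X primary selection via xclip.
      out = subprocess.run(["xclip", "-o", "-selection", "primary"],
                           capture_output=True, text=True)
      return out.stdout.strip()

  def page_title(url):
      # Download the start of the page and pull out the <title> text,
      # falling back to the URL itself if anything goes wrong.
      try:
          with urllib.request.urlopen(url, timeout=10) as resp:
              html = resp.read(65536).decode("utf-8", errors="replace")
          match = re.search(r"<title[^>]*>(.*?)</title>", html,
                            re.IGNORECASE | re.DOTALL)
          return match.group(1).strip() if match else url
      except (OSError, ValueError):
          return url

  def save_bookmark(url, title, menu_path):
      # One file per bookmark; the directory path is the menu path.
      folder = os.path.join(BOOKMARK_ROOT, menu_path)
      os.makedirs(folder, exist_ok=True)
      name = re.sub(r"[^A-Za-z0-9 _.-]", "_", title)[:80] or "bookmark"
      with open(os.path.join(folder, name + ".url"), "w") as f:
          f.write(title + "\n" + url + "\n")

  if __name__ == "__main__":
      url = selected_url()
      title = page_title(url)
      folder = input("Bookmark folder (e.g. electronics/valves): ") or "unsorted"
      save_bookmark(url, title, folder)

Nothing fancy, just a terminal prompt that writes a file somewhere the static HTML generator can find it. Anyway, back to the archiving question.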
My objection to the client local mirroring approach is mainly just philosophical, and the related practical problems are likely solvable. For his second suggestion, I disagree with using Git, but if he proposed the same thing using Rsync (which also solves the URL problem, at the cost of losing a pre-baked changelog system) I'd be happy. The difference between us is simply whether to attribute importance to needless data storage.

For me, storing data is a commitment. You don't just need one copy of the data; the way I do things, I need at least four. One copy on the PC you're working from, two on your local backup drive (the latest backup and your previous backup in case the backup process goes haywire; granted, incremental backups are another approach, which I don't use myself), and at least one copy off-site. I try to keep all the data I can't easily cope with losing on my laptop, with its 40GB HDD. Relying on a 20-year-old HDD probably isn't all that wise, but just to focus on the 40GB: that actually translates into up to 160GB of data stored, and 120GB needing to be processed to complete a full backup cycle. Maybe that's nothing these days, but to me it's already inconvenient:

* It means the backup process takes a non-trivial amount of time, during which the laptop's performance is poor, so I leave it to run overnight only once a week. That's a waste of power, and it limits the regularity of my backup routine.

* It means my only practical medium is HDDs. DVDs, CDs, or ZIP disks might be an option otherwise. I'm not managing to pick up SSDs or sufficiently large flash drives/cards in my free-to-$5 second-hand price range.

* It means I can't use the internet in my laptop's backup strategy, because my connection is too slow and I'd have to pay a lot more than for my current 3GB/month deal. That combines with the first problem to make offsite backups more of a pain. (I've got my Internet Client computer set up on a 2GB SD card. All important files get synced with the laptop daily, including all system/user configuration files, which make up only ~30MB compressed.)

Now, back to Solderpunk's concept. You can say that Gopher content (or probably Gemini, though I don't look at that much) is small, so you might as well grab everything. But my Gopher hole currently totals 80MB. I've got about 70 sites bookmarked in the Gopher client on this PC (UMN Gopher); if I'm average (alright, I'm probably not, but I'm the only one I can run "du" on) then that's 5.6GB, enough data to fill up over 1/8th of my 40GB laptop drive right there. Including my backups, that would be 22.4GB of data sitting somewhere, regularly read and copied at the expense of time and energy.

Now, of that data, the largest share (34MB) is my archive of Firetext logs. I should purge that again actually - I do keep it all myself, and it may have some use for historical purposes, but the average Gopher user surely doesn't give a stuff. With the caching client scheme, it's not a fair assumption that the hourly log you look at one day is going to be what you want to find later either. With the Git hosting scheme, someone who just wants to read this phlog post is obliged to pull in all that Firetext data even if they've got no interest in it. In fact the Photos and History Snippets sections make up the bulk of the other data, and yet the only part that I've ever received feedback on is the phlog. So for all I know this one 700KB corner is the only bit of content that anyone actually wants to view, yet using Git they'd be storing 80MB of data in order to do so.
Should I just ditch everything but the Phlog and host that alone with Git (or with Gopher, for that matter; the rest is potentially just clogging up the Aussies.space server, which is why I already cull the Firetext archive)? For you, maybe. For me, the favourite part, the part I'd be most thrilled to find in my own browsing, is the History Snippets section (19MB), even though I've been struggling to get around to adding new entries there (by the way, if someone does actually like viewing it, letting me know would certainly help my motivation). So if I drop that, then I'm dropping my favourite content for the sake of popularity, now embodied in the sheer efficiency of data storage and transfer.

At the same time, I don't think the client caching approach is right, because everyone who drops into the History Snippets section, clicks a couple of links, decides it's just something some weirdo's needlessly put together, and leaves never to return, ends up carrying around the gophermap and photos they viewed purposelessly for as long as they can keep all their data intact. Yet the person who drops in, looks at a few entries, bookmarks it for later when they have the time (what I'd probably do), then goes away - they find that when they return after it's gone offline, all they can view is the same stuff they saw before.

As an alternative to the Git proposal, Rsync would solve the problem of fetching unwanted data. You just pick the directory with the content you're interested in and Rsync only mirrors that bit (there's a rough example of what I mean in the P.S. below). Server load may be a problem, though public Rsync sites do already exist for software downloads, so maybe it's practical. You could also just Rsync individual files for browsing around, and maybe before committing them to permanent storage.

But with my bookmark system, if I ever get around to creating it, I've got my own equivalent to the client caching system, which works with existing protocols (well, I guess most easily just with the web). It specifically grabs what I think I might want to look at. Rather than enforcing some rigid system that theoretically grabs all the data I'll ever want to find again, I'd rather just make that decision myself.

 - The Free Thinker.
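P.S. For the Rsync idea above, this is the sort of selective mirroring I mean. A minimal sketch only, assuming the site offers a public rsync module; the host and module names are made up, and you could equally just run the rsync command by hand.

  #!/usr/bin/env python3
  # Sketch of selective mirroring with rsync: fetch only the directory
  # you care about instead of pulling the whole site. The rsync:// host
  # and module below are invented examples, not a real service.
  import subprocess

  def mirror(remote, local_dir):
      # -a preserves the directory tree, -v lists what gets fetched, and
      # --delete keeps the local copy in step with the server on re-runs.
      subprocess.run(["rsync", "-av", "--delete", remote, local_dir],
                     check=True)

  if __name__ == "__main__":
      # e.g. grab just the phlog, not the other 80MB of the gopher hole:
      mirror("rsync://gopher.example.net/site/phlog/", "./phlog/")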