The individual archivist, and ghosts of Gophers past
----------------------------------------------------

Foreword: This post has been a *long* time coming. The ball was set rolling by kvothe's departure from the phlogosphere in late July. New ideas on the matter popped into my head more recently, prompting me to finish it at last. So, it's not exactly fresh with regard to its specific motivating example, but the issues are no less relevant. My standard hyper-verbosity disclaimer applies!

The Zaibatsu has had, from very early days, a policy which allows sundogs to request that their account be removed and all their content immediately and permanently deleted. This is called "claiming your civil right", which is part of the Schismatrix theme. The Orientation Guide explains:

> This promise is not a gimmick to tie into the Schismatrix theme. It
> is a recognition that the ability to delete your accounts from
> online services is an important part of self-ownership of your
> digital identity. This is genuinely an important freedom and one
> which many modern online services do not offer, or deliberately make
> very difficult to access.

I have always been, and still am, proud that the Zaibatsu offers this right so explicitly and unconditionally, and I have no plans to change it. I really think this is an important thing.

And yet, it always breaks my heart a little when somebody actually claims their right, and it's especially tough when a large amount of high-quality gopherspace content disappears with them. As several people phlogged about noticing, kvothe recently chose to leave gopherspace, taking with him his wonderful, long-running and Bongusta-aggregated phlog "The Dialtone", which he had migrated from SDF to the Zaibatsu. I loved having kvothe as part of our community, but of course fully respect his right to move on.
As I deleted his home directory, I thought to myself "Man, I wish there was an archive.org equivalent for Gopherspace, so that this great phlog wasn't lost forever". A minute later I thought "Wait... that is *totally* inconsistent with the entire civil right philosophy!". Ever since, I've been trying to reconcile these conflicting feelings and figure out what I *actually* believe.

Far from objecting to archive.org's activities on the web, I've come to think of it as a valuable public service. I suppose I tend to assume - and I have no data on how warranted this assumption is - that most of the webpages that I am grateful to find have been preserved by archive.org have disappeared from their original homes on the internet not through the deliberate will of the authors, but due to various unintentional processes of digital decay: commercial web hosts go out of business, people lose their access to webspace provided by an ISP or university, people lose interest in a website and stop paying to have it hosted without necessarily actively wanting it gone, or people die and nobody they leave behind knows how to keep the site alive, or perhaps even knows that the site exists! It seems clear to me that there is no harm in publicly archiving pages which disappear in this manner. Often the information preserved by doing so is of great practical value, or historical interest, or both.

In the case of pages which *were* deliberately removed by their author, things seem to get murkier. How does one balance the right of the author to control the lifespan of their own work against the various "greater goods" which are served by having stuff stick around forever? It's worth noting that the possibility of "unpublishing" something is a relatively recent development. There has never been a way to unpublish books, songs or films after warehouses full of physical books, tapes, discs or whatever have been manufactured.
Because of this I suspect there is an unusual lack of existing experience or careful thought about the question. Now that we *can* unpublish things, is it wrong to take away people's option to do so?

You might think that, having instituted the civil right policy at the Zaibatsu, I've taken a strong stance on this. Actually, my decision to put that policy in place was driven by my frustration at being unable to delete accounts on websites. Oftentimes, that frustration is born not of wanting to unpublish public material (which even sites with no way to delete accounts will often let you do) but of wanting to get myself out of the site's database, so my email address, private messages, login times and IP addresses, browser fingerprints, etc. aren't sitting around waiting to be sold to or stolen by marketers, spammers or other ne'er-do-wells. I've never actually given much deep thought to the question of the right to unpublish.

It seems that with regard to the web, at least, this philosophical question has more or less been bulldozed by the sheer technical possibility of something like archive.org existing - in much the same way that a lot of questions surrounding the copyright of music were, for a lot of people, bulldozed by the possibility of P2P filesharing. At least within geek circles, archive.org is so well-known that it is widely understood and generally accepted that an unavoidable part of the act of publishing something online is that it may well be around forever. Whether we like this or not, we have to live with it because there is no way to prevent it - tools like robots.txt have never been, and can never be, more than a "gentleman's agreement". As long as there are computers with hard drives connected to the internet, stuff might stick around forever, and it's naive to pretend otherwise.

Gopher is no exception here. The Zaibatsu's civil right policy is meaningful in practice only because there is no equivalent of archive.org for Gopher.
But there is no such equivalent only because nobody has yet bothered to build one. One may come, one day, and if it does we'll be powerless to stop it. We might protest against its coming mightily - I suspect, based on the things I've seen people write about questions surrounding Gopher search engines, that such a service would be pretty unpopular - but the people bringing it would likely say to us "What? Why on Earth did you ever think this wouldn't happen? How do you think the internet works?", and to some extent it would be hard to argue against this. Just because something can be done doesn't mean it should be done, but in the case of the internet (perhaps technology more widely, too!) if something can be done it almost certainly eventually will and so it's nothing more than an exercise in denial to get deeply attached to its temporary absence.

It's hard not to get attached, though, because I think many people will agree that the way Gopherspace functions right now feels really nice. Heck, there is, or was, a phlog over at SDF with a tagline of "Because Google probably doesn't index this", or something to that effect. People clearly feel the need for an online space where they can exist in the comfort of knowing that not everything they write is immediately publicly searchable and preserved forever. How can you not get attached to that?

Right now, Gopherspace is small enough, and tightly-knit enough, and ideologically-driven enough, that a culture of rejecting this kind of thing - making it taboo, if you like - could probably keep archiving at bay for a while. The cultural preferences of Gopherspace inhabitants already seem to keep at bay a lot of things which are perfectly technically possible with the protocol, like serving a lot of HTML. Even if we don't actually want to try to actively fight back against the arrival of archiving or extensive indexing to Gopherspace, I do think it's good to consciously appreciate and savour it, for the time that we can.

What if we *do* want to actively fight back? Well, as I said, there's ultimately little we can do because you just straight up can't prevent these things from being done. But as a kind of soft resistance, there might be value in adopting alternative solutions to the (real) problems that an archive.org for Gopher would solve. I think that, unlike on the web, we might *have* a viable alternative, which takes advantage of Gopher's extreme simplicity.

Archiving a website has never been entirely straightforward. You can't just save a single HTML file to disk and expect it to work like the original. This may have worked in the very earliest days of the web, but it wouldn't have been long before you had to also parse that HTML file and look for included external resources, most likely images, and download those, too (and then possibly transform the downloaded HTML to change absolute URLs for external resources to relative URLs which will work from disk). When CSS arrived, stylesheets became one more component you'd have to archive. Yes, carefully designed websites will function well enough with images and stylesheets missing, but that hasn't been true for the average website for a long time.

Today, archiving a website feels like a Herculean technical challenge. External stylesheets, fonts and images are just the beginning - modern sites completely fail without dozens of externally hosted scripts, many of which may try to pull in any of the above kinds of resource from external sources whose URLs are not even determined before the site is executed ("viewed" is far too simple a term for a modern website). It doesn't seem like it would be hard at all to build a site which was impossible in principle to meaningfully archive. Archive.org probably hates the modern web even more than us Gopher-dwelling retrogrouches!

Notably, Gopher does *not* have this problem. Most items of Gopher content consist, entirely, of a single text file.
Saved to disk, this single file, viewed offline 10 years later after the original server has vanished, is in every way equivalent to its original hosted version. We've got it better than the web, and it's actually easy to underestimate just how much better off we've got it.

Just how much better off are we? I would submit that on a computer with even vaguely modern specs, it would probably be possible to use a Gopher client which *automatically and immediately* archived every single document you visited, as you visited it, and maintained a searchable full text index of those archives, without this being unduly taxing on processor time or disk space. Imagine that!

This is quite a super power, and it enables everybody who surfs Gopherspace to act as an "individual archivist", forever preserving the things we see for our own personal reference later. If I'd been using such a client, kvothe's Dialtone phlog would still be available to me to re-read at my leisure after he claimed his civil right, whilst being unavailable to any new readers. This seems to strike quite a nice balance between the interests of content producers and consumers. It's a human-scale solution which goes a very long way toward obviating the need for anything like a public archive or search index of all of Gopherspace. Obviously it can't replace a search engine for solving the problem of finding resources you aren't already aware of, but I would say that the vast majority of the times I've wished for a full text Gopher search engine it's been because I wanted to rediscover something that I remember reading a few weeks ago but now can't recall where.

Like many people, I enjoy greatly the fact that modern Gopherspace is small and intimate. It's a place by humans and for humans, where it's still very possible to disappear and be forgotten. That's very valuable! Search indexes and archiving services threaten this feeling, and a lot of Gopherites are opposed to them for this reason.
At the same time, it's hard to deny that such "intrusions" into Gopherspace solve real problems and could be incredibly useful. Deep down I know that these things are probably inevitable, especially if Gopherspace continues to grow rapidly. When they come I'll try to accept them gracefully. But in the meantime I think that individual archiving offers a solution to the most pressing problems such services would solve, in a way which still retains the precious feeling of a Gopherspace where we are *not* watched over by machines of loving grace. Well, except for the NSA machines which presumably log all plaintext internet traffic.
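
For the curious, here is a rough sketch of what the heart of an "individual archivist" client might look like. It is only an illustration under some loud assumptions: the names `fetch_gopher`, `Archive` and `visit` are made up for this example, the store is a local SQLite file with a naive word index rather than a proper full text engine, and every item is treated as a plain text document (item type 0). A Gopher fetch really is just "send the selector, read until the server hangs up", which is why the whole thing fits on one screen.

```python
import re
import socket
import sqlite3
import time

WORD = re.compile(r"[a-z0-9']+")  # crude tokenizer, an assumption of this sketch


def fetch_gopher(host, selector, port=70, timeout=10.0):
    """Fetch one Gopher item: send the selector, read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


class Archive:
    """Personal archive: keeps the latest copy of each URL plus a word index."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.executescript(
            """
            CREATE TABLE IF NOT EXISTS docs (
                id INTEGER PRIMARY KEY, url TEXT UNIQUE, fetched REAL, body TEXT);
            CREATE TABLE IF NOT EXISTS words (word TEXT, doc_id INTEGER);
            CREATE INDEX IF NOT EXISTS words_word ON words (word);
            """
        )

    def store(self, url, body):
        """Save (or replace) the copy of `url` and re-index its words."""
        old = self.db.execute("SELECT id FROM docs WHERE url = ?", (url,)).fetchone()
        if old:
            self.db.execute("DELETE FROM words WHERE doc_id = ?", (old[0],))
            self.db.execute("DELETE FROM docs WHERE id = ?", (old[0],))
        cur = self.db.execute(
            "INSERT INTO docs (url, fetched, body) VALUES (?, ?, ?)",
            (url, time.time(), body),
        )
        doc_id = cur.lastrowid
        unique_words = set(WORD.findall(body.lower()))
        self.db.executemany(
            "INSERT INTO words (word, doc_id) VALUES (?, ?)",
            [(w, doc_id) for w in unique_words],
        )
        self.db.commit()

    def get(self, url):
        """Return the archived text for `url`, or None if never visited."""
        row = self.db.execute("SELECT body FROM docs WHERE url = ?", (url,)).fetchone()
        return row[0] if row else None

    def search(self, query):
        """Return URLs of archived documents containing every word in `query`."""
        terms = sorted(set(WORD.findall(query.lower())))
        if not terms:
            return []
        marks = ",".join("?" for _ in terms)
        rows = self.db.execute(
            "SELECT d.url FROM words w JOIN docs d ON d.id = w.doc_id "
            f"WHERE w.word IN ({marks}) GROUP BY d.id "
            "HAVING COUNT(*) = ? ORDER BY d.url",
            (*terms, len(terms)),
        )
        return [r[0] for r in rows]


def visit(archive, host, selector, port=70):
    """What the client would do on every page view: fetch, archive, return."""
    body = fetch_gopher(host, selector, port)
    archive.store(f"gopher://{host}:{port}/0{selector}", body)
    return body
```

A client built this way would call something like `visit(archive, host, selector)` whenever you follow a link, and `archive.search("some half-remembered words")` when you want to find that thing you read a few weeks ago. Replacing the stored copy on each revisit is just one possible choice; keeping every snapshot would work equally well and plain text is small enough that it would hardly matter.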