Low budget P2P content distribution with git
--------------------------------------------

In recent months I've spent a lot less time than is typical thinking 
about anything to do with computers and the internet, but there is 
one train of thought I've been repeatedly pondering.  I had hoped to 
write up a bunch of less technical stuff first (don't worry, that's 
still coming - I'm kind of disappointed in myself that I've lapsed 
into writing a massive computery, internety post so soon after coming 
back to writing here.  Bad Solderpunk!  In penance I'm not going to 
write any more for at least a month - stay tuned for 
cycling, environmentalism and manga, though), but it seems like this 
technical idea has become a little topical just recently, so 
perhaps now is actually a good time to get this idea out there.  Let 
me be very clear from the outset that I'm just idea-sketching out 
loud here.  This isn't a new project, or anything, I'm not giving the 
system I'm about to describe a name or committing to fleshing out the 
details or anything like that.  That's not to say nothing will ever 
come of this, I just want to make it clear from the outset that these 
ideas are half-baked at best and I'm absolutely not committed to 
jumping head first into wherever this train of thought leads...

Protocols like Gemini and Gopher are an effective salve against many 
of the miseries inflicted by the modern web, but by no means do they 
solve *all* the web's problems.  All three systems share the same big 
picture architecture, namely that the default pattern of usage is 
that content lives in exactly one place, a server which is online 
24/7, 365 days a year and accessible from anywhere on Earth, and that 
to consume this content you request a copy of it at the instant of 
consumption, render it to the screen and then discard it (perhaps 
after a relatively brief cache lifetime), leaving no persistent copy, 
with the understanding that if you want to read something again next 
week or month or year you'll just request a fresh copy and do all 
this again.  Because all three protocols work this way, all three of 
them share a long list of common shortcomings, mostly about losing 
access to stuff you'd like to still have access to.  Online content 
can become inaccessible to *you* in the short term if your internet 
connection goes down.  It can become inaccessible to *anybody* in the 
short term if the server goes down.  It can become inaccessible to large 
groups of people in the *long term* due to the ease with which 
authoritarian governments can block access to a single server.  It 
can become inaccessible to *everybody* *forever* if the hosting 
service disappears (think Geocities), or if the person running a 
private server dies or is incapacitated and none of their friends or 
family know which bills to pay to keep the thing up.  These problems 
can be mitigated to some extent via load sharing, content delivery 
networks, caching proxies, etc.  All these solutions involve setting 
up yet *more* computers which are switched on and connected to the 
net 24/7, which is expensive both financially and environmentally.  
On a long enough timeline, the survival rate for all websites drops 
to zero: find some mailing list archives from the late 90s or early 
00s and try visiting all the URLs people shared in it.  More than 90% 
of them won't work.  20 or 25 years is not an awfully long time span 
for this kind of decay to happen in.

None of these observations are new or exciting, and there is no 
shortage of projects attempting to address various of these 
shortcomings in various domains.  You've maybe heard of DAT[1] and
IPFS[2] and SSB[3], and those are just the Johnny-come-latelies to
this sphere.  Freenet[4] has been around for over 20 years, and I
don't doubt that it has predecessors of its own.  What all of these
projects have in common is conceptual complexity.  They're
distributed, decentralised, peer-to-peer, content-addressed,
cryptographically authenticated, and more.  This isn't intended as
a criticism.  These projects have a much higher ratio of essential
complexity to "empty complexity" than something like a modern web
browser, because they're trying to solve substantially more
difficult problems, making some conceptual complexity
unavoidable.  But all of the projects above and their associated
ideas have met with fairly limited implementation by developers and
fairly limited uptake by users, and I think the high barrier to
entry represented by a lot of conceptual complexity, even if it is
essential, is probably a large part of the reason for this (that
and a healthy serving of apathy, no doubt).  I'm not trying to 
say that the search for clever solutions to these problems is futile,
not at all.  I'm just laying out what seem to me to be the facts.

Completely solving the problems associated with an always-online, 
purely client-server web is never going to be easy.  The wait for 
something which works well enough and is user friendly enough to 
facilitate serious uptake is going to be a long (though hopefully 
worthwhile) one.  In the meantime, it's tempting to wonder whether or 
not there is some kind of "80:20" solution to these problems which 
gets at least some of us at least some of the way there - enough of 
the way to be worthwhile - without a huge learning curve.  Lately 
I've been thinking that maybe there is, and that maybe it's actually 
not even all that hard.  In fact it's so incredibly simple that I'm 
almost embarrassed to say it out loud, out of fear that if it were 
*that* simple then people would *obviously* already be doing it, so 
clearly I've missed something big due to not being smart enough.  Or 
maybe some people well off my radar *are* doing this, and that's what 
I've missed.  Anyway, are you ready for this huge idea?  Here it is.

Use git.

No, really, just use git.  Not the way you're possibly using it 
already (like I am), as a kind of deployment mechanism, where you 
write your posts locally, commit them to a repo, then push to a 
remote copy of that repo that only you have access to, triggering a 
hook which checks out a copy of your work in 
whatever directory your web/Gopher/Gemini server looks in (although, 
if you're doing that, switching to using it the way I'm talking about 
is a piece of cake).  I'm talking about using git for small internet 
content the way people use it for source code, as an actual 
distribution mechanism for ending up with a local copy of something 
on your disk that you then use offline (by compiling it, interpreting 
it, etc).  I'm talking about your text-centric online content being 
nothing more than a public git repository.  If somebody wants to read 
your posts, they clone your repo.  Then they've got your posts on 
their disk, and they can read them from there. If they go offline, it 
doesn't matter, because your stuff is on their disk.  They can read 
it today, and tomorrow, and next year.  If your server goes offline, 
it doesn't matter, because your stuff is on their disk.  If they like 
your stuff and want to read more of it, then next time both they and 
you are online, they are one `git pull` away from getting any updates 
you've made since their original clone.  There is no need for Atom, 
or RSS, or carefully formatted index pages with datestamps integrated 
into link text.  When distributing by git, visiting a site and 
subscribing to a site are one and the same thing.  No extra 
technological concessions to the notion of "subscribability" are 
needed.  Furthermore, when distributing by git, visiting a site and 
making a complete offline archive of the site are one and the same 
thing.  There's no need for slow, clumsy, error-prone and 
admin-irritating loops of repeatedly fetching and parsing files using 
tools like wget to discover the URL of every single resource in a 
site.  You just grab the whole thing at once in a single network 
transaction, no parsing required.  Git is actually better than 
Atom/RSS and recursive wget combined!  An Atom or RSS feed usually 
only has the 10 or 20 most recent updates in it, so if you're offline 
for a long time you'll miss some stuff.  Git won't, you'll get every 
commit made since your last pull.  And a recursive wget just leaves 
you with an offline copy of an entire site as it was in one point in 
time.  There's no way to get *just* the new stuff one month later - 
sure, with HTTP(S) you can use headers like If-Modified-Since to 
avoid fetching new copies of stuff that hasn't changed, but you still 
need to make a request for every single page which *could* have 
changed.  
With git you just pull and that's it.
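
To make the "you'll get every commit made since your last pull" point 
a bit more concrete, here is a rough Python sketch of how a reader-side 
tool might list what arrived in a pull.  Everything in it is an 
assumption of mine for illustration: the function names, the "@@ " 
marker, and the idea that you note down the old HEAD (e.g. with 
`git rev-parse HEAD`) before pulling.  The only real git features 
relied on are `git log --name-only` and the %ct (commit timestamp) and 
%s (subject) format placeholders.

```python
import subprocess
from datetime import datetime, timezone

# "@@ " marks a commit header line so it can be told apart from the file
# names that follow it. A file literally named "@@ ..." would confuse this.
LOG_ARGS = ["log", "--name-only", "--format=@@ %ct %s"]


def parse_log(text):
    """Turn `git log --name-only` output into a list of feed-style entries."""
    entries = []
    for line in text.split("\n"):
        if line.startswith("@@ "):
            stamp, _, subject = line[3:].partition(" ")
            entries.append({
                "when": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
                "subject": subject,
                "files": [],
            })
        elif line.strip() and entries:
            # Any other non-blank line is a path touched by the last commit.
            entries[-1]["files"].append(line.strip())
    return entries


def whats_new(repo_dir, old_head):
    """List commits reachable from HEAD but not from `old_head`."""
    cmd = ["git", "-C", repo_dir] + LOG_ARGS + [f"{old_head}..HEAD"]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_log(out)
```

Calling `whats_new("/path/to/clone", old_head)` after a pull would then 
give you newest-first entries with a timestamp, a subject line and the 
files each commit touched, which is roughly what a feed reader wants.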

I've barely scratched the surface here.  I'm going to keep going, but 
first let's really quickly think about this from a network privacy 
point of view.  Cloning or pulling a git repo involves making network 
connections to *one* server, known in advance, and has no side 
effects.  There are no cookies or anything cookie-esque to tie 
subsequent requests together at a more fine-grained or persistent 
level than the IP address.  This is much better than the web, and 
exactly on par with Gemini and Gopher.  If you want to, you can do 
git stuff over HTTPS or SSH, and that's normal and standard, so in 
this respect we're better than Gopher where plaintext is the only 
option.  But if you don't want to use crypto, or your computer can't 
handle it, or you're using some futuristic internet overlay like 
Yggdrasil so you get transport security without baking it into every 
protocol, you can do a plaintext git:// clone.  So for some folks 
this is better than Gemini, where it's TLS or bust.  But the 
git-as-distribution-tool approach gives you something that none of 
the web or Gopher or Gemini give you: it's one network transaction 
for the *whole site*, and that's it.  A git admin knows that you (or 
rather, your IP address) has cloned their repo and now has all their 
posts.  But that's it.  They don't know which posts you read, and 
which ones you don't.  They don't know which posts you read once and 
which ones you read every day and which ones you only read in the 
middle of cold, lonely nights.  There is nothing like a "click 
stream" for them to analyse.  Even the boogeyman of "traffic 
analysis", where the size and latency of opaque encrypted 
transactions are used by third parties to reconstruct your path 
through a public site, gains no traction here.  Your fine-grained 
consumption habits are entirely invisible to everybody but you.  
That's really neat!

One more brief digression: I've described everything so far in 
network terms (and will get back to that shortly and then do it for 
the rest of this post).  But keep in mind, please, that there is 
*nothing* network-centric about this idea.  We're all very used to 
doing git clones and pulls over TCP/IP, but you can clone and pull 
from the filesystem just fine.  Try it.  Git won't bat an eyelid.  
That means you can clone and pull from USB sticks and SD cards, which 
means this whole thing works just fine over sneakernet.  You don't 
have to go "all in" on sneakernet, you can mix and match it with 
networking in whatever proportion suits you, and transition slowly 
from using mainly one to mainly the other on an as-needed basis.  I 
think about sneakernet a lot these days, and I think anybody else 
who's interested in sustainable/perma-/salvage computing ought to as 
well.  I'll write more about this some other time.  Let me just say 
for now that the fact that this git-for-distribution thing works 
seamlessly via sneakernet is a big plus for me.

Okay, back to the main thrust: by visiting/archiving/subscribing to a 
site via git we get even more than Atom/RSS and recursive wget 
combined can offer, with less effort on the part of either producer 
or consumer.  Jake.  But so far we're still talking about readers 
fetching content from a single authoritative source operated by 
authors, so we still have a lot of the usual centralisation problems.  
This approach still puts a potentially heavy load on one 
authoritative server, it still requires lots of long distance data 
traffic, and if the author's server disappears forever *before* you 
got a chance to clone the repo, you're out of luck.  Getting past 
these hurdles in a web/Gopher/Gemini context isn't easy.  If I use 
recursive wget to get a complete local copy of some website, then in 
order to enable somebody else to use a recursive wget to get a 
complete copy from *me* (because my server is closer, or more 
reliable, or the original is gone) there's a lot more rigmarole 
involved.  I'd need to set up a webserver and point it at my copy, and 
there's no guarantee that alone is enough.  The site may not work 
properly without suitable URL rewriting or redirecting rules or 
similar configuration details in place on the server side.  I'd need 
to reproduce those settings exactly, and the information required to 
do so is *not* something I'd end up with as a consequence of doing 
the original recursive wget.  So the whole procedure kind of only 
works once, and can't reliably be chained, with an n-th party getting 
a fully functioning copy from a (n-1)-th party's copy.  Even if 
redirects/rewrites weren't in the picture and this chaining *was* 
possible, there'd naturally be a big question of trust, as at any 
stage along the chain the site could be modified by somebody other 
than the original author and you'd be none the wiser.  But none of 
these problems are there in the git version!  You can clone a clone 
of a clone no worries, that's normal.  Everybody who "visits a site" 
distributed by git has everything they need to *redistribute* the 
site.  And git has built-in support for signing commits with GPG, 
which can go a long way toward resolving the trust problem (public 
keys can be distributed as part of the repository itself, which works 
out alright as long as you can be confident you make your initial 
clone from the genuine origin - not foolproof, but much better than 
nothing).  All of this is just bog-standard git functionality, tried 
and tested, nothing new or exciting, 100% ready to go and documented 
in countless sources.  This stuff is exactly what makes git a 
*distributed* version control system.  The new idea here is really 
nothing more than using it to distribute writing to readers, instead 
of source code to developers or users.
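
As a hedged illustration of the signing point, here is a small Python 
sketch which flags commits in a clone that don't carry a good 
signature.  It leans on the real %H and %G? placeholders of `git log` 
(%G? prints a one-letter signature status, e.g. "G" for good, "B" for 
bad, "N" for none).  The function names and the policy of trusting 
only "G" are my own assumptions, and which keys count as good depends 
entirely on your local GPG keyring.

```python
import subprocess

# Assumed policy: accept only commits git reports as having a good signature.
TRUSTED_CODES = {"G"}


def parse_signature_log(text):
    """Parse `git log --format='%H %G?'` output into (hash, code) pairs."""
    pairs = []
    for line in text.split("\n"):
        parts = line.split()
        if len(parts) == 2:
            pairs.append((parts[0], parts[1]))
    return pairs


def unverified_commits(pairs, trusted=TRUSTED_CODES):
    """Return hashes, in log order, whose status code is not trusted."""
    return [commit for commit, code in pairs if code not in trusted]


def check_repo(repo_dir):
    """Run git in `repo_dir` and list the commits we would not trust."""
    cmd = ["git", "-C", repo_dir, "log", "--format=%H %G?"]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return unverified_commits(parse_signature_log(out))
```

An empty list from `check_repo` would mean every commit in the history 
verified against a key you already hold, no matter whose clone you 
actually fetched it from.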

It turns out we've *had* a decentralised, distributed, offline-first 
system for P2P storage and delivery of text files for 17 years now!  
It was just created for an application very different from 
blogging/phlogging/gemlogging.  By the time git became an established 
and familiar technology, the web was in the full blown grip of "web 
2.0" fever, and static, non-interactive content that was 90% text was 
consigned squarely to "the past".  This resulted, I think, in a 
missed connection, which maybe we can finally make.  There's nothing 
fundamentally wrong with interactivity, of course, nor with non-text 
media, either.  But I don't need to tell anybody who is reading this 
via Gopher or Gemini that there's a whole universe of material which 
is interesting, or informative, or useful, or amusing, or uplifting, 
or otherwise valuable even if it's "just text" and even if you read 
it days or weeks or months or years after it was originally written.  
That's not a unique property of source code.  It's true of our little 
small internet world, too!  Git is just perfect for distributing 
exactly this kind of writing.  You get delay-tolerant subscription 
for free: Atom and RSS can go to the dustbin of history.  Constant 
internet connectivity is not required, although it doesn't hurt.  You 
can pull from all your repos four times a day every day if you live 
all the time in an apartment with a permanent high-speed internet 
connection.  If you're trying to spend less time online because you 
think that's better for you in some way(s), you can connect once in 
the morning, pull from all your repos and then disconnect and read 
what you received at your leisure.  If you live on a boat and 
sometimes go without internet access for weeks at a time, that works 
just fine too.  If you are travelling without regular internet access 
and you meet somebody on the way who follows some of the same repos 
you do, whichever one of you pulled from upstream less recently can 
pull from one who did so more recently to get some updates on the 
road - and then pull later from the official source once back in 
civilisation, without this switching of sources causing any problems. 
 Stuff can continue to circulate for years after the original source 
disappears, provided enough people were interested enough in the 
first place to clone it and make their clones readable.  To be 
honest, this feels to me like it could be an even better small 
internet platform than Gopher or Gemini, at least for some kinds of 
content (for others, perhaps not - I'll return to this later).

Of course, this is nothing like a *real* solution to any of the nasty 
problems of centralised client-server distribution.  You can update 
your clone of a git repo from some source other than the original, 
official repo, and have confidence that what you get is genuine 
thanks to PGP, sure - if you know about that other source in advance. 
 But there's no magic means by which knowing only the URL of the 
original repo you can automatically find the most up-to-date third 
party copy or copies which are online now and close by to you in 
network terms and pull from them instead.  That's the kind of hard 
problem which makes real P2P systems complicated, and git does 
nothing at all to solve it.  But we can 80:20 around this to some 
extent.

I've been vague up until now about exactly how this works in a hands 
on, daily use kind of way.  I'm not proposing we literally spend our 
time doing git clones and git pulls manually by hand all the time 
(although you *could* use this system that way, and that should be 
seen as a feature, just like being able to access Gopherspace via 
telnet).  We can build tools to streamline things.  This is largely 
the reason, incidentally, for using git in particular and not 
Mercurial or Fossil or whatever else might be hot these days.  Git is 
ubiquitous and isn't likely to stop being so anytime soon.  It's been 
ported everywhere - you can use git today on Plan 9 or Minix 3 or 
whatever weird system floats your boat (are there still open source 
descendants of Solaris out there?  If there are, I bet they have 
git).  There are bindings to libgit2 in all major programming 
languages, allowing you to automate this stuff.  All this work has 
already been done, and these tools are going to be kept up to date 
and ported further and documented better by people who don't know and 
don't care about the small group of dorks using git as a plain text 
content distribution system.  It's exactly the same philosophy behind 
using TLS for Gemini and not something newer and better.  Tiny 
guerilla computing projects can't afford to ignore the opportunity to 
have the enemy manufacture our weapons for us.  So we build tools 
based on git, because a lot of us already know how to build them, and 
once they're built they'll be usable just about everywhere.  We can 
throw together something which has the look and feel of a traditional 
Atom/RSS-based feed reader, but it's powered by git under the hood, 
it just looks at timestamped commits to figure out which files were 
updated when.  And there's no reason we can't standardise on every 
repo designed to be used in this way having (or *optionally* having) 
a directory in the repo root with a well-known name which contains 
simple .ini or .json or .yaml or whatever files (no doubt getting 
everybody to agree on one of these would represent 99% of the work of 
actually bringing this idea to fruition) that provide a little bit of 
metadata in an easy-to-parse format.  These could provide some of the 
feed metadata that you'd traditionally find in Atom/RSS, like a 
repository's title, subtitle, author, contact details and license 
information.  They could provide GPG public keys.  And they could be 
used to advertise the URLs of clones of the repo, its "official 
mirrors", and maybe where these clones are in the world and at what 
times of day they are most likely to be online (ditto for the 
original).  The git-aware app could register all those URLs as 
additional "remotes" for the repo, and it could preferentially try to 
pull from the nearest one most likely to be up when the user 
hits "refresh", and if that remote was down, it could fall back to 
the second best choice, and so on.  This involves some manual 
coordination between authors and willing mirroring parties, and 
introduces a kind of dichotomy between "official mirrors" and 
"unofficial mirrors" which you'd need to learn about out of band and 
tell your client about, but I suspect we can tackle this in the usual 
grass-roots, small internet way and still end up somewhere better 
than we are right now.  It's far from perfect, but it's also far from 
awful.
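
Purely to show how little machinery the mirror idea needs, here is a 
Python sketch of what such a metadata file and the "try the best 
remote first" logic might look like.  To be loud about it: the JSON 
layout, the field names ("mirrors", "region", "online_utc"), the 
example URLs and the ranking rule are all invented for this example.  
Nothing here is a proposed standard.

```python
import json

# Hypothetical metadata file contents; every field name is made up.
EXAMPLE_META = """
{
  "title": "Example gitlog",
  "author": "A. Writer",
  "mirrors": [
    {"url": "git://origin.example/log.git",
     "region": "eu", "online_utc": [0, 24]},
    {"url": "git://mirror-a.example/log.git",
     "region": "oceania", "online_utc": [20, 8]},
    {"url": "git://mirror-b.example/log.git",
     "region": "oceania", "online_utc": [9, 17]}
  ]
}
"""


def likely_online(window, hour):
    """True if `hour` (0-23, UTC) is in [start, end), wrapping past midnight."""
    start, end = window
    if start <= end:
        return start <= hour < end
    return hour >= start or hour < end


def ordered_remotes(meta, my_region, hour):
    """Rank mirror URLs: likely-online first, then same-region first."""
    def rank(mirror):
        return (
            not likely_online(mirror["online_utc"], hour),
            mirror["region"] != my_region,
        )
    # sorted() is stable, so ties keep the order the author listed them in.
    return [m["url"] for m in sorted(meta["mirrors"], key=rank)]


meta = json.loads(EXAMPLE_META)
print(ordered_remotes(meta, "oceania", 22))
```

A client would then attempt a pull from each URL in that order and 
stop at the first one that answers.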

And we're *still* really just scratching the surface of what doing 
this would enable.  To make it explicit, we're talking about a system 
where every participant keeps a full copy of the full history of 
every site they visit on their hard drive indefinitely.  This sounds 
nuts at first.  It also sounds nuts that in this system there is no 
way to fetch just a single post - if you want to read one post that 
somebody has told you about, you have to clone the full repo 
containing said post.  That's, in some sense, woefully inefficient!  
These concerns diminish rapidly if we start thinking small.  I've 
been phlogging on Gopher for over four years now.  Anybody who has 
been following me all that time knows that I am *not* a succinct 
writer.  I am relentlessly verbose.  And yet, my phlog directory is 
1.7 megabytes.  Having to clone that whole lot to read one post 
doesn't seem so horrible knowing that.  When visiting a single blog 
post on the web today you could easily pull down a lot more than 1.7 
MB of external fonts, style sheets, surveillance Javascript, flashy 
background images and more.  Cloning my whole phlog repo to read one 
post is less efficient than using Gopher to fetch just that one post, 
but it's still more efficient than the status quo of the web.  Let's 
suppose that I continue to phlog at the same exhausting level of 
verbosity for fifty whole years in total.  That would bring me up to 
just over 21 MB, which we can round up to 25 MB to make things 
simpler.  Now, suppose you didn't want to just read *my* fifty years 
of rambling, but you wanted to read the ramblings of *one hundred 
people* who all wrote excessively for fifty years - arguably more 
output than any person really has the time to read.  This would bring 
us up to 2.5 GB.  That fits several times over on the smallest USB or 
SD storage device you can buy.  Businesses literally give that much 
storage away for free in the form of promotional key chains.  The 
above calculations could be off by a factor of ten (git itself 
obviously introduces some degree of storage overhead which I've 
completely failed to address so far and, in truth, know almost 
nothing about, but I'm pretty sure it's nothing like a factor of ten) 
and the storage burden of 25 GB would still be underwhelming, even 
for a 20 year old machine.  We really can live this way.  Text is 
*small*.
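
For anybody who wants to poke at the numbers, the back-of-envelope 
arithmetic above can be written out as a few lines of Python.  The 
only assumptions are the ones already made in the text: a constant 
writing rate, "over four years" treated as exactly four, and decimal 
units (1 GB = 1000 MB).

```python
def projected_mb(current_mb, years_so_far, total_years):
    """Linearly extrapolate archive size, assuming a constant writing rate."""
    return current_mb / years_so_far * total_years


one_author_mb = projected_mb(1.7, 4, 50)      # a little over 21 MB
rounded_mb = 25                               # rounded up, as in the text
hundred_authors_gb = 100 * rounded_mb / 1000  # one hundred such authors
pessimistic_gb = hundred_authors_gb * 10      # if we're off by a factor of ten

print(f"{one_author_mb:.2f} MB per author")
print(f"{hundred_authors_gb} GB for a hundred authors")
print(f"{pessimistic_gb} GB in the pessimistic case")
```

Swap in your own directory size and years of writing to see how far 
you are from filling even a small USB stick.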

Having full local copies of everything ever written by anybody whom 
you've ever read a single small internet post by is a game changer in 
and of itself.  Stuff like archive.org becomes at least partially 
obsolete, because you have the full history of each site locally.  
You can, to some extent, be your own search engine.  Obviously you 
can't search your own disk to find stuff you've never previously 
fetched, but you can easily find stuff you vaguely recall reading a 
year ago, and if you've only just recently started following somebody 
who has been writing for years, you can search their back catalogue.  
You can ask your computer to find other posts you have on your disk 
which are "similar to" some particular post, in terms of them both 
using similar words or phrases which are otherwise rare.  All sorts 
of machine learning, pattern recognition, recommendation engine type 
stuff could be done, if you wanted, but it's something you could do 
yourself entirely on your own machine with complete control and 
transparency and perfect privacy.  If one of those metadata files in 
a well-known location in every repo mentioned earlier was a kind of 
machine-readable "git-roll" where authors could advertise the URLs of 
other repos that they are reading, then you could even do a little 
casual repo spidering (with a configurable maximum amount of disk 
space and monthly bandwidth dedicated to this - possibly both set to 
zero if you don't care for it).  This all sounds somewhat futuristic, 
but indexing and searching and identifying fuzzy conceptual 
connections between a couple of gigabytes worth of text files is not 
exactly the computational cutting edge.  I'm starting to feel like in 
some ways we have been denying ourselves super powers for years 
simply by continuing to distribute our content in a fashion which 
makes it really impractical to grab sites wholesale, even though the 
bandwidth and disk space required to do this (for simple text files, 
anyway) has long been easy to come by.
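
To back up the claim that this isn't the computational cutting edge, 
here is a deliberately naive Python sketch of "find posts similar to 
this one", using nothing but the standard library.  It weights words 
by how rare they are across your collection (a TF-IDF style weighting) 
and compares posts by cosine similarity.  The function names and the 
crude tokeniser are my own choices for illustration; a real tool would 
read the files from your clones and probably cache the vectors.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(posts):
    """posts: dict of name -> text. Returns dict of name -> {word: weight}."""
    counts = {name: Counter(tokenize(text)) for name, text in posts.items()}
    doc_freq = Counter()
    for words in counts.values():
        doc_freq.update(words.keys())
    total = len(posts)
    vectors = {}
    for name, words in counts.items():
        # A word found in every post gets weight log(1) = 0, so only
        # otherwise-rare words contribute to similarity.
        vectors[name] = {
            w: tf * math.log(total / doc_freq[w]) for w, tf in words.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(posts, target, top=3):
    """Names of the `top` posts most similar to `target`, best first."""
    vectors = tfidf_vectors(posts)
    scores = [
        (cosine(vectors[target], vec), name)
        for name, vec in vectors.items()
        if name != target
    ]
    return [name for _, name in sorted(scores, reverse=True)[:top]]
```

Given a dict mapping file names to their text, 
`most_similar(posts, "some-post.txt")` returns the closest matches, 
all computed locally with no third party ever seeing the query.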

I've been unrelentingly positive about this whole prospect so far.  
So many benefits to content distribution via git!  Aren't there any 
problems?  Well, sure.  There are two big ones that I've identified 
so far.  One is technical, the other is, uhh, sociological?  Or 
something?  Let's deal with that one first.  The basic issue is that 
stuff on the internet can become unavailable in two different ways.  
Sometimes stuff disappears involuntarily - due to technical faults, 
censorship, business failures, financial problems, etc.  But 
sometimes stuff disappears because the author didn't want it up 
anymore and willingly took it down, which feels like a reasonable 
thing for authors to be able to do if they like.  We might, very 
roughly, think of these as "bad disappearances" and "good 
disappearances", respectively.  The problem is that it's not possible 
to solve the bad disappearance problems without making good 
disappearances impossible.  Publishing something via this git system 
is in principle permanent and irreversible.  If just one person 
clones or pulls from your repo before you take it down, other people 
can pull/clone from them and there's nothing you can do to stop this 
beyond asking nicely.  It's not just "taking stuff down" that becomes 
infeasible.  If you change your mind about something you wrote ten 
years ago and want to change it, you can do so - but everybody 
"subscribed" to your repository will be notified of this fact and 
will be able to see both the before and after versions.  This kind of 
publishing is, by necessity, radically long-lasting and radically 
transparent in a way that people aren't used to and many may not be 
ready for.

Many will say that the internet is *already* like this and you can 
never guarantee that anything you publish, via any protocol, won't be 
redistributed forever.  This is exactly right.  It's the very nature 
of a global network of general purpose computing devices, and we 
should never fool ourselves into thinking that any technology can 
prevent this.  Furthermore, this isn't a problem unique to using git 
for publication, it's going to be a problem in *any* solution to 
these problems.  Does that mean we should just forget about this 
issue?  Maybe not.  Just because something is always possible in 
principle doesn't mean that making it as quick and easy and 
convenient as possible will be without consequence.  An internet 
which never forgets is handy in a lot of ways and in a lot of fields 
of endeavour.  It's also strongly mismatched with human social 
psychology and norms.  The small internet crowd tends to place a lot 
of emphasis on "human scale" computing and on personal connections, 
so I think it is worth flagging this and encouraging people to 
think about it.  But I do also think it's possible to overstate how 
big of a deal this is.  Maybe I've already done that.  I dunno.

The other big problem, the technical one, is that of linking.  That 
whole hypertext thing.  Let's consider a "gitlog", i.e. a 
blog/phlog/gemlog-style resource which is published exclusively via a 
public git repository, and is not hosted on any of the traditional 
server-client request-per-page protocols ("gitlog" is a horrible name 
for this thing because it will cause massive confusion and search 
engine collision with the `git log` command, but I'll use it as a 
placeholder for now).  Internal links within one gitlog are 
straightforward (at least if it's in HTML or gemtext, both of which 
support relative URLs), but how does the author of a post in this log 
provide a link to an individual post in another gitlog?  An 
unambiguous pointer to an individual gitlog post necessarily has two 
parts: the URL of (any clone of) the repository, and a path relative 
to the repository root indicating the file containing the post in 
question.  I am not aware of any pre-existing URL scheme for 
unambiguously conveying both these things at once, nor of any 
pre-existing hypertext format which allows "two part" links.  It's 
not remotely hard to imagine how to cook up either one, perfectly 
straightforward in fact, but ugh, once we do that this stops being a 
super minimal "just use this existing thing to distribute your 
arbitrary existing text, with maybe a tiny bit of optional helper 
metadata sprinkled in if you want" approach and becomes a whole 
*thing* with its own unique format which you have to buy into.  I 
really like that a lot of people are basically already 100% geared up 
to distribute their smol content this way by just making the private 
repository they already use for deployment publicly readable, super 
quick and easy, no other change required.  Anything which stands in 
the way of that feels like a bad idea.  But without a standalone 
pure-gitlog linking solution, the whole system is limited to 
bihosting scenarios, where git-based distribution kind of lurks 
behind the scenes, and plain old gopher:// or gemini:// links are 
what we actually include in our posts.  This is not great, but perhaps 
something we can live with?  Maybe there's a convention where if your 
Gopher or Gemini content is also available in this way, you configure 
your Gopher/Gemini server to respond to requests for a certain 
well-known endpoint with (i) a git repo URL and (ii) a regex or 
some-such for transforming your gopher:// or gemini:// URLs into 
paths relative to your repository root?  That would work, I think, as 
a kind of easy machine-readable "gateway" from the Gopher/Gemini view 
of things to the git view of things.  Maybe there are even better 
ways?  I don't mean to suggest we can't somehow make this linking 
thing at least roughly work, I'm just highlighting that this is the 
most substantial issue I've thought of without an obvious and easy 
solution.  I still think there's something worth pursuing inside all 
this.  Hell, even if we give up on external linking, that's not the 
end of the world.  The short, whimsical urban fantasy of Joneworlds[5] 
is a genuine gem of modern Gopher/Geminispace, and the vast majority 
of it is entirely self-contained and very little would be lost by 
distributing it without any links at all.
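
To make the "well-known endpoint" convention floated above a bit more 
concrete, here is a minimal sketch in Python.  Everything specific in 
it is an assumption made up for illustration: the GATEWAY dictionary 
stands in for whatever the endpoint would actually return, the 
example.org URLs and the "alice" user are hypothetical, and no 
existing standard defines any of this.  It only shows that a repo URL 
plus one regex is enough to turn a gopher:// or gemini:// link into a 
path relative to the repository root.

```python
import re
from urllib.parse import urlsplit

# Hypothetical payload of the well-known endpoint: a git repo URL plus
# a regex/replacement pair.  All names and values here are invented.
GATEWAY = {
    "repo": "https://example.org/git/alice.git",
    # Optional one-character Gopher item type, then the user's directory.
    "pattern": r"^(?:/[0-9a-zA-Z])?/~alice/(.*)$",
    "replacement": r"\1",
}


def url_to_repo_path(url, gateway):
    """Map a gopher:// or gemini:// URL to (repo URL, path in repo).

    Returns None when the URL's path does not match the pattern, i.e.
    when the linked content is not covered by this repository.
    """
    path = urlsplit(url).path
    match = re.match(gateway["pattern"], path)
    if match is None:
        return None
    return gateway["repo"], match.expand(gateway["replacement"])
```

A client which had cloned the repo could then resolve a link like 
gopher://example.org:70/0/~alice/phlog/post.txt to phlog/post.txt in 
its local copy, and fall back to an ordinary network request whenever 
the function returns None.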

I think at last that this is all I have to say about this for now.  I 
mean, there's more I could say, all sorts of little details, but I 
think this is enough for now, to get the idea out there.  I am happy 
to release this idea into the electronic wild and see what kind of 
life, if any, it may take up in the minds of the denizens of the 
small internet.  I look forward to hearing people's thoughts.

I've written all this up over the past week or so (took longer than I 
thought it would!), but the core of the idea has been brewing for a 
few months, and I've been influenced in various ways by some stuff 
that I wrote and also some stuff I read over the past year or two.  
I'm going to try to dump links to all of these influential things 
below, but this probably won't be exhaustive, sorry.

I got started thinking about using git for content distribution 
during the formation of Circumlunar Space's newish zine project, 
Circumlunar Transmissions[6].  The zines are hosted via Gemini and 
Gopher, but you can also clone a git repo, the idea being that people 
can use this to easily host Gemini or Gopher mirrors.  At some point 
I started wondering about the possibility of just skipping that last 
part and distributing it entirely via git.  A zine is a *perfect* use 
case for this.  No sane person expects a zine to be editable or 
deletable after release.

I was motivated to write this stuff up *now*, not in another week or 
a month, by a recent post by ploum[7] which asked "Could we imagine a 
decentralised and delay-tolerant network simple enough so you could 
implement it in a day?", and envisaged a system where folders of 
PGP-signed Markdown documents are copied to the local disk and 
browsed from there.  There are some extra ideas in there about using 
the system for something like email, too.  If you ignore that part 
and just focus on the Markdown distribution, well, I think git 
basically already does this.

Not long at all after I read that post and decided I had better start 
writing, I came across a short article[8], inspired by the off-grid 
working habits of the 100 rabbits crew[9] (who produce code and art 
while living on a 10 meter yacht, occasionally taking breaks to make 
awe-inspiring and death-defying ocean crossings), which asserted that 
"saving pages with wget is like low-budget p2p" (inspiring the title 
of this post), and asked a bunch of provocative questions:

* What if the browser was local-first?
* What if websites showed up as files and folders on my computer?
* What if the browser saved a copy of everything I bookmarked?
* What if I had my own personal wayback machine?
* What if I had a little local Google that could search the full text 
of everything I’ve ever saved?
* What if I could copy those website files and remix them? Add links. 
Mark them up with highlights. Write margin notes.
* What if the whole web was built around copying/remixing/sharing?

Using git for distribution straightforwardly opens up all of these 
possibilities.

In writing this up, I tapped into some older ideas.  The first is one 
of my own: I made a phlog post about 2 years ago wherein I claimed 
that "on a computer with even vaguely modern specs, it would probably 
be possible to use a Gopher client which *automatically and 
immediately* archived every single document you visited, as you 
visited it, and maintained a searchable full text index of those 
archives, without this being unduly taxing on processor time or disk 
space"[10].  I discussed this in the context of the value of being able 
to easily disappear, which is something we lose with git distribution.

I take the question of when, if ever, radically permanent online 
writing makes sense as seriously as I do because of an even earlier 
post made by Alex Schroeder[11], concerning the Secure Scuttlebutt 
protocol.  Alex says "I don’t like systems where I cannot delete 
things. I don’t need non-repudiation since I’m talking to people, 
not signing contracts. Basically a “unforgeable append-only” 
system is similar to a legal set of contracts and not at all like 
conversations in real life".  He's well aware that it's impossible to 
guarantee all participants in a distributed system will comply with a 
request to delete something, but still thinks it's better to build 
systems which at least *try* to let us undo.  I'm still not 100% sure 
I agree, but I totally understand and respect the perspective.  
Systems with a hard "no take backs" property shouldn't be designed or 
used lightly - especially not in light-hearted social contexts, where 
they seem an especially bad match.

Drew DeVault wrote about a year ago[12] about better (open-source, 
non-commercial, pro-privacy) approaches to search engines, in which 
he floats the idea of search engines not crawling the entire web, but 
limiting themselves to a list of "tier 1" domains which are 
"authoritative or high-quality sources for their respective 
specializations", as well as pages which link to tier 1 domains.  My 
idea of doing just a little bit of exploratory spidering and indexing 
of the git repos which are advertised in those repos you subscribe 
to, ending up with the ability to search a small, socially-defined 
corner of "gitspace", was inspired by this.  Search is a useful 
feature even if you can't search *all* the things.
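
The "just a little bit of exploratory spidering" idea amounts to a 
breadth-first walk with a small hop limit.  Here is a rough sketch in 
Python, under loud assumptions: the `advertised` dictionary is a 
stand-in for however repos would really advertise one another, and 
the function name and the `max_hops` parameter are invented for this 
example.

```python
from collections import deque


def repos_to_index(subscriptions, advertised, max_hops=1):
    """Breadth-first walk outward from the repos you subscribe to.

    `advertised` maps a repo URL to the list of repo URLs it
    advertises (a hypothetical stand-in for real discovery).  Returns
    a dict mapping every repo within `max_hops` of a subscription to
    its distance in hops.
    """
    hops = {repo: 0 for repo in subscriptions}
    queue = deque(subscriptions)
    while queue:
        repo = queue.popleft()
        if hops[repo] >= max_hops:
            continue  # at the edge of our corner of gitspace
        for neighbour in advertised.get(repo, []):
            if neighbour not in hops:
                hops[neighbour] = hops[repo] + 1
                queue.append(neighbour)
    return hops
```

With max_hops=1 you would index only your subscriptions and the repos 
they advertise directly; raising the limit widens the searchable 
corner at the cost of more cloning and indexing.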

[1] https://www.datprotocol.com/
[2] https://ipfs.io/
[3] https://scuttlebot.io/more/protocols/secure-scuttlebutt.html
[4] https://freenetproject.org/
[5] gopher://republic.circumlunar.space:70/1/~joneworlds/
[6] gopher://republic.circumlunar.space:70/1/zine
[7] gemini://rawtext.club/~ploum/2021-10-10.gmi
[8] https://subconscious.substack.com/p/saving-copies-of-everything-is-like
[9] gemini://gemini.circumlunar.space/users/hundredrabbits/
[10] gopher://zaibatsu.circumlunar.space:70/0/~solderpunk/phlog/the-individual-archivist-and-ghosts-of-gophers-past.txt
[11] gopher://alexschroeder.ch:70/0page/2018-06-29_No_Take_Back
[12] gemini://drewdevault.com/2020/11/17/Better-than-DuckDuckGo.gmi