This document is an extract from:

Mark P. McCahill and Farhad X. Anklesaria, (1995),
"Evolution of Internet Gopher", Journal of Universal
Computer Science, volume 1, issue 4, pages 235-246.

You can find the full version here:
http://www.jucs.org/jucs_1_4/evolution_of_internet_gopher

___________________________________________________________

EVOLUTION OF INTERNET GOPHER

Mark P. McCahill
(University of Minnesota, USA
mpm@boombox.micro.umn.edu)

Farhad X. Anklesaria
(University of Minnesota, USA
fxa@boombox.micro.umn.edu)

ABSTRACT:
How the Internet Gopher system has evolved since its first
release in 1991 and how Internet Gopher relates to other
popular Internet information systems.  Current problems and
future directions for the Internet Gopher system.

KEYWORDS:
Distributed Information Systems, Internet Gopher, Gopher+

CATEGORY: H.5.1

1 INTRODUCTION

This paper considers how the Internet Gopher system has
developed since its initial release in the spring of 1991,
and some of the problems that are driving further evolution
of Gopher and other popular Internet information systems.
Although two of the most popular new Internet information
systems (Gopher [McCahill 1992] and World Wide Web
[Berners-Lee 1992]) have become quite widely used, they both
have some basic architectural deficiencies that have become
apparent as the systems were deployed on a large scale.
These deficiencies limit the long term stability of the
information stored in these systems. For instance, both
Gopher and World Wide Web (WWW) refer to information by
location, and so both systems are plagued by references to
information that either is stale, has moved, or no longer
exists. Beyond the architectural problems, the volume of
information being published using Gopher and WWW requires
continued evolution of user interfaces and
categorization/searching technologies. The imminent arrival
of more Internet-aware page description languages must also
be accommodated by these systems, and will have a
significant impact on how client applications are written.

2 THE ORIGINAL INTERNET GOPHER

Internet Gopher was originally designed to be a simple
campus wide information system [Lindner 1994] to address
local needs at one institution (the University of
Minnesota). The design philosophy of Gopher was to make it
possible for departments and other groups at the University
of Minnesota to publish information on their own desktop
systems and to hide the distributed nature of the servers
from the user using the system [Alberti 1992].

2.1 A USER'S VIEW OF GOPHER

From the user's perspective, Gopher combines browsing a
hierarchy of menus, viewing documents selected from menus,
and submitting queries to full-text search engines (which
return menus of matches for the queries). The structure of
the menu hierarchy that the user sees depends on which
Gopher server the user first contacts, so there is no notion
of central control of the Gopher menu hierarchy. Gopher
server administrators run their systems autonomously and are
free to organize both the information on their server and
references to information on other servers to meet their
needs. This architecture makes it possible for the Gopher
system to scale up well since there is no central (or top
level) server to get overloaded. On the other hand, because
servers are autonomous, it is the responsibility of each
server administrator to make sure that references to items
on other servers are not pointing to stale or nonexistent
information. Unfortunately, not all server administrators
are vigilant in maintaining the quality of their links to
other servers. This is analogous to the problem in WWW of
documents containing dangling and stale URLs.

Because of Gopher's distributed architecture, there are many
different organizations of the information in the Gopher
information space (Gopherspace). It is expected that users
will naturally prefer to start their Gopher clients by
pointing them to their favorite or home Gopher servers. This
distributed architecture encourages the formation of
communities of interest which grow up around well-run Gopher
servers. If there is no central server and users are
expected to naturally gravitate to favorite home servers,
how does the client software know where to start?

Each user's Gopher client software can be configured with
the address and port number of the first gopher server to
contact when the client software is launched. When the
client software contacts this server, the server returns a
list of items (a menu) to the client. Items described in the
menu either reside on the server or are references to items
residing on other servers. These references to items on
other Gopher servers tell the Gopher client the domain name
of the server, the port number of the Gopher process, the
type of the item (document, directory, search engine, etc.),
and the Gopher selector string to be used to retrieve the
item. Given the minimal information required to describe the
location of an item on a Gopher server, it is easy for
server administrators to add references to other servers,
and many servers' main value is as subject-matter specific
collections of references. Gopher client software also
generally provides a facility for saving references to items
selected by the user.
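
As an illustration of how little information a menu item
carries, the following is a minimal sketch of a menu parser.
It assumes the tab-separated line layout of the original
Gopher protocol (RFC 1436): a one-character item type
followed by the display string, then the selector, the host
name, and the port. The names GopherItem, parse_menu_line,
and parse_menu, and the example.edu and example.org hosts,
are hypothetical and are not taken from any Gopher client.

```python
from typing import List, NamedTuple


class GopherItem(NamedTuple):
    """One menu entry: everything a client needs to fetch the item."""
    item_type: str  # single character, e.g. "0" document, "1" directory
    display: str    # text shown to the user
    selector: str   # string sent to the server to retrieve the item
    host: str       # domain name of the server holding the item
    port: int       # port number of the Gopher process


def parse_menu_line(line: str) -> GopherItem:
    """Split one tab-separated menu line into its descriptor fields."""
    line = line.rstrip("\r\n")
    head, selector, host, port = line.split("\t")[:4]
    return GopherItem(head[:1], head[1:], selector, host, int(port))


def parse_menu(text: str) -> List[GopherItem]:
    """Parse a whole menu; a line holding a lone period ends the list."""
    items = []
    for line in text.splitlines():
        if line == ".":
            break
        if line:
            items.append(parse_menu_line(line))
    return items


sample = (
    "0About this server\t/about.txt\tgopher.example.edu\t70\r\n"
    "1Other servers\t/others\tgopher.example.org\t70\r\n"
    ".\r\n"
)
for item in parse_menu(sample):
    print(item.item_type, item.display, item.host, item.port)
```

Because each entry names its own host and port, the second
item in the sample can point to a different server without
the user noticing the change.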

While the references to items on other servers would seem to
be Gopher-specific, this is not strictly true. It is
possible to map enough information into a Gopher descriptor
to make it possible to describe how to access most popular
Internet services, and Gopher descriptors are used to
reference items on HTTP servers, finger servers, telnet
sessions, CSO/Ph phone books, and, via gateways, X.500,
WAIS, Archie, and anonymous ftp servers.
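
One way to see that a Gopher descriptor is not strictly
Gopher-specific is to rewrite it as a URL. The sketch below
is illustrative only. It assumes the gopher URL layout
(host, port, then the item type followed by the selector)
and assumes that item type "8" denotes a telnet session
whose selector, if present, is a login name. The function
name descriptor_to_url and the example.edu hosts are
hypothetical, and other services are deliberately left out.

```python
from urllib.parse import quote


def descriptor_to_url(item_type: str, selector: str,
                      host: str, port: int) -> str:
    """Map the four descriptor fields to a URL string."""
    if item_type == "8":
        # Assumption: a telnet item's selector is a login name.
        user = f"{quote(selector, safe='')}@" if selector else ""
        return f"telnet://{user}{host}:{port}/"
    # Every other type is kept as a gopher URL; the selector is
    # percent-encoded so spaces and similar characters survive.
    return f"gopher://{host}:{port}/{item_type}{quote(selector)}"


print(descriptor_to_url("0", "/about.txt", "gopher.example.edu", 70))
print(descriptor_to_url("8", "guest", "library.example.edu", 23))
```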

2.2 GROWTH OF THE GOPHER SYSTEM

Although originally designed as a campus wide information
system, Gopher was rapidly adopted by a variety of
institutions and as of November 1994 it is estimated that
there are over 8000 servers on the Internet. Part of the
reason for Gopher's widespread adoption as a platform for
publishing on the Internet is the ease of setting up and
running a server. Server administrators typically run a
gopher process and publish a part of their local file system
as a gopher hierarchy. By default, the names of the items in
the Gopher hierarchy are the filenames of the documents and
directories published by the server. Because Gopher
accommodates a variety of page description languages there
is no need to reformat or mark up documents to be published.
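
The idea of publishing part of a file system as a Gopher
hierarchy can be sketched in a few lines. This is not the
code of any actual Gopher server. It assumes the
tab-separated menu layout of RFC 1436, treats every file as
a plain document (type "0") and every subdirectory as a
menu (type "1"), and uses the path relative to the
published root as the selector. The function name
menu_for_directory and the host gopher.example.edu are
hypothetical.

```python
import os


def menu_for_directory(root: str, rel_dir: str,
                       host: str, port: int = 70) -> str:
    """Build a menu whose item names are the file names in rel_dir."""
    lines = []
    directory = os.path.join(root, rel_dir)
    for entry in sorted(os.scandir(directory), key=lambda e: e.name):
        # Simplification: no attempt is made to detect file formats.
        item_type = "1" if entry.is_dir() else "0"
        rel_path = os.path.join(rel_dir, entry.name).replace(os.sep, "/")
        selector = "/" + rel_path.lstrip("/")
        lines.append(f"{item_type}{entry.name}\t{selector}\t{host}\t{port}\r\n")
    # A line holding a lone period marks the end of the menu.
    return "".join(lines) + ".\r\n"
```

Calling menu_for_directory("/srv/gopher", "", "gopher.example.edu")
would list the top level of a published tree, and passing a
subdirectory name as rel_dir would list that subdirectory.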

Another reason for Gopher's popularity is that server
administrators can add links to other Gopher servers and
give their users access to other collections of information.
Since users are not necessarily aware where the information
they are accessing resides, server administrators can easily
take advantage of each other's efforts.

2.3 LOCATING INFORMATION WITH GOPHER

Part of Gopher's popularity is due to the explicit
organization of information in the Gopher system which lends
itself to quickly finding information. While hypertext
documents are certainly useful as one type of information
content, it is not clear that a system consisting of nothing
but hypertext documents solves all the problems in an
information system. In fact, it is difficult to quickly make
sense of a complex, visually rich document. If one goal is
to allow users to quickly navigate, it is clear why browsers
for file systems always have provisions for a menu-like
listing of items: a predictable, consistent user interface
(such as a menu) is easy to make sense of quickly, and this
is necessary for rapidly traversing an information space.
Menu-based systems also have advantages in that they can be
mapped into a variety of user interface metaphors and
browsers and can be compactly represented. Since one of the
design points for the Gopher system is to run quickly even
over low bandwidth links, a compact representation of a
collection (menu) allows users with slow machines to quickly
traverse Gopherspace to locate documents of interest.

A good metaphor for current information systems on the
Internet is a book. Books generally are
structured into three functional sections: a table of
contents, an index, and the pages in the middle of the book
(the content). Using this metaphor, Gopher acts something
like a book's table of contents since Gopher presents a
structured overview of the information and makes it possible
to jump directly to the section for a specific topic. Search
engines (such as WAIS, Archie, and VERONICA) are similar to
the index in the back of the book because they allow users
to jump into the content based on one or more words of
interest. To complete the metaphor, the pages in the middle
of the book are represented in a variety of formats on the
Internet; content may be in the form of text, graphics, or
page description and markup languages such as PostScript,
PDF, and HTML. The Gopher protocol can of course provide
access to search engines and to document content in
different formats.

When an explicit hierarchical organization of information is
combined with a searchable index of titles of objects in
Gopher (such as the VERONICA index of all Gopher items or a
Jughead searchable index of items on a single server) users
have the choice of either browsing menus or submitting
queries to locate items. The scope of the queries can be
either global (VERONICA) or local to a specific server
(Jughead), and users can select the scope of their search by
traversing the hierarchy to locate an appropriate search
engine. Gopher has always been designed to be a hierarchical
framework to organize collections of documents and search
engines rather than a monomorphic Internet-aware page
description language.

2.4 ARCHITECTURAL LIMITATIONS

The ease of Gopher server setup goes hand-in-hand with an
architectural limitation of Gopher. Running a Gopher server
requires neither setting up any sort of replication system
nor a formal agreement with other sites to replicate or
mirror information. The lack of a formal system for
propagating redundant copies of information stored on Gopher
servers means that users may be going to the other side of
the world (or across a congested network) to fetch a copy of
a document that is also stored locally. Moreover, popular
servers are heavily loaded and there is no systematic way of
spreading the load to other sites (although ad hoc schemes
such as informal mirroring agreements are used to partially
address this problem). Note that this architectural
limitation is also inherent in WWW. Referring to items on
other servers by location rather than by name makes it easy
to add new servers since there is no registration process
required as items are added to (or removed from) servers.

Referring to items by location rather than by name also
means that there is no name to location mapping service to
be maintained, and clients do not have to go through a name
to location resolution before attempting to fetch an item.
Unless the name to location mapping service is extremely
fast, it has the potential for being a significant
performance bottleneck. Given the scale of the Internet, it
is clear that any name resolution service will have to be
replicated and distributed to scale up properly, but this
means that the client will have to first locate the
appropriate name mapping server (potentially a slow process)
before it can proceed with a name to location lookup.

Clearly, there are significant engineering tradeoffs between
a system based on name-to-location resolution and a system
that only refers to information by absolute location. The
direction the Gopher system is taking to address these
architectural problems is to accommodate both reference by
location and reference by name. The expectation is that
clients will first attempt to resolve the reference by
location and if that fails (or the client finds that the
server responds slowly), the client will attempt a name to
location lookup to find other locations that hold the item
of interest. To accommodate multiple references to items by
both Uniform Resource Locators (URLs) and Uniform Resource
Names (URNs) requires a place to store the references as
some sort of meta-information. The original Gopher protocol
had no provisions for meta-information, but as we will see
later in this paper, Gopher+ solves this problem.
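
The location-first strategy described above can be sketched
as follows. This is a minimal illustration under stated
assumptions, not part of Gopher or Gopher+: the callables
fetch and resolve stand in for a real network fetch and a
real name-to-location service, the function name
fetch_with_fallback is hypothetical, and a slow server is
assumed to surface as a timeout raised by fetch.

```python
from typing import Callable, Iterable, Optional


def fetch_with_fallback(
    url: str,
    urn: Optional[str],
    fetch: Callable[[str], bytes],
    resolve: Callable[[str], Iterable[str]],
) -> bytes:
    """Try the known location first; consult the name service on failure."""
    try:
        return fetch(url)
    except OSError:
        # Timeouts are OSError subclasses, so a slow server lands here too.
        if urn is None:
            raise  # no name to resolve, so the failure is final
    last_error = None
    for alternate in resolve(urn):
        if alternate == url:
            continue  # this location has already failed
        try:
            return fetch(alternate)
        except OSError as exc:
            last_error = exc
    raise OSError(f"no reachable location for {urn}") from last_error
```

The name lookup is paid for only when the direct fetch
fails, so in the common case the client avoids the
resolution cost discussed above.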

2.5 GOPHER GATEWAYS TO OTHER INFORMATION SYSTEMS

Soon after the initial release of Gopher, a concerted effort
was made to develop software gateways to give Gopher clients
access to information on servers such as anonymous ftp,
X.500, NNTP, and WAIS. These software gateways made
information on other systems visible to Gopher users and
handled the translation between Gopher requests and the
protocol on the target system (for instance: ftp).

The decision to write software gateways was driven by the
desire to keep Gopher clients small and simple. Rather than
building support for several protocols into the Gopher
clients, clients that only understand Gopher protocol can be
small, simple, easily written, and run on personal computers
that are relatively slow and have limited memory. Gateways
for Gopher clients greatly expanded the information Gopher
users could access. Not surprisingly, the gateways became
quite popular. Unfortunately, excessively popular machines
on the Internet tend to be either slow or the machines must
throttle back the demand for their services by refusing
requests when they are busy.

Since the original Gopher protocol had no provision for
expressing meta-information about an item, there was no
place in the protocol to tell clever clients (which might
speak several protocols) that an item could be fetched
directly (for instance via an ftp session), so even clients
that had the capability to go direct to a non-Gopher
service such as ftp had no choice but to go through a Gopher
software gateway. Clearly, references to information
available via a different protocol ought to enumerate the
possible paths to the information (both through the gateway
and directly via the other protocol). To take advantage of
the work being done to develop URL-aware clients, the
enumerated references should be made as URLs. Not
surprisingly, the architectural problems and the lack of
meta-information in the original Gopher protocol are
addressed by the Gopher+ protocol.
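
To make the idea of enumerated paths concrete, the sketch
below shows how a client might choose among several URLs
for the same item. It is an illustration only and does not
show how Gopher+ stores or transmits such references. It
assumes the list is ordered by preference (direct access
first, gateway last), and the function name choose_path,
the hosts, and the gateway selector are hypothetical.

```python
from typing import Iterable, Optional, Set
from urllib.parse import urlsplit


def choose_path(urls: Iterable[str],
                supported_schemes: Set[str]) -> Optional[str]:
    """Return the first URL whose scheme the client can speak."""
    for url in urls:
        if urlsplit(url).scheme in supported_schemes:
            return url
    return None  # the client cannot use any of the listed paths


paths = [
    "ftp://ftp.example.edu/pub/readme.txt",                # direct
    "gopher://gateway.example.edu:70/0/ftp/pub/readme.txt",  # gateway
]
print(choose_path(paths, {"gopher"}))         # simple client
print(choose_path(paths, {"gopher", "ftp"}))  # multi-protocol client
```

A client that only understands Gopher still reaches the
item through the gateway, while a client that also speaks
ftp goes direct and takes load off the gateway.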

...

> More: http://www.jucs.org/jucs_1_4/evolution_of_internet_gopher