This document is an extract from: Mark P. McCahill and Farhad X. Anklesaria, (1995), "Evolution of Internet Gopher", Journal of Universal Computer Science, volume 1, issue 4, pages 235-246. You can find the full version here: http://www.jucs.org/jucs_1_4/evolution_of_internet_gopher
___________________________________________________________

EVOLUTION OF INTERNET GOPHER

Mark P. McCahill
(University of Minnesota, USA
mpm@boombox.micro.umn.edu)

Farhad X. Anklesaria
(University of Minnesota, USA
fxa@boombox.micro.umn.edu)

ABSTRACT: How the Internet Gopher system has evolved since its first release in 1991 and how Internet Gopher relates to other popular Internet information systems. Current problems and future directions for the Internet Gopher system.

KEYWORDS: Distributed Information Systems, Internet Gopher, Gopher+

CATEGORY: H.5.1

1 INTRODUCTION

This paper considers how the Internet Gopher system has developed since its initial release in the spring of 1991, and some of the problems that are driving further evolution of Gopher and other popular Internet information systems. Although two of the most popular new Internet information systems (Gopher [McCahill 1992] and World Wide Web [Berners-Lee 1992]) have become quite widely used, they both have some basic architectural deficiencies that have become apparent as the systems were deployed on a large scale. These deficiencies limit the long-term stability of the information stored in these systems. For instance, both Gopher and World Wide Web (WWW) refer to information by location, and so both systems are plagued by references to information that is stale, has moved, or no longer exists.

Beyond the architectural problems, the volume of information being published using Gopher and WWW requires continued evolution of user interfaces and categorization/searching technologies.
The imminent arrival of more Internet-aware page description languages must also be accommodated by these systems, and will have a significant impact on how client applications are written.

2 THE ORIGINAL INTERNET GOPHER

Internet Gopher was originally designed to be a simple campus-wide information system [Lindner 1994] to address local needs at one institution (the University of Minnesota). The design philosophy of Gopher was to make it possible for departments and other groups at the University of Minnesota to publish information on their own desktop systems and to hide the distributed nature of the servers from the user [Alberti 1992].

2.1 A USER'S VIEW OF GOPHER

From the user's perspective, Gopher combines browsing a hierarchy of menus, viewing documents selected from menus, and submitting queries to full-text search engines (which return menus of matches for the queries). The structure of the menu hierarchy that the user sees depends on which Gopher server the user first contacts, so there is no notion of central control of the Gopher menu hierarchy. Gopher server administrators run their systems autonomously and are free to organize both the information on their server and references to information on other servers to meet their needs.

This architecture makes it possible for the Gopher system to scale up well since there is no central (or top level) server to get overloaded. On the other hand, because servers are autonomous, it is the responsibility of each server administrator to make sure that references to items on other servers are not pointing to stale or nonexistent information. Unfortunately, not all server administrators are vigilant in maintaining the quality of their links to other servers. This is analogous to the problem in WWW of documents containing dangling and stale URLs.

Because of Gopher's distributed architecture, there are many different organizations of the information in the Gopher information space (Gopherspace).
It is expected that users will naturally prefer to start their Gopher clients by pointing them to their favorite or home Gopher servers. This distributed architecture encourages the formation of communities of interest which grow up around well-run Gopher servers.

If there is no central server and users are expected to naturally gravitate to favorite home servers, how does the client software know where to start? Each user's Gopher client software can be configured with the address and port number of the first Gopher server to contact when the client software is launched. When the client software contacts this server, the server returns a list of items (a menu) to the client. Items described in the menu either reside on the server or are references to items residing on other servers. These references to items on other Gopher servers tell the Gopher client the domain name of the server, the port number of the Gopher process, the type of the item (document, directory, search engine, etc.), and the Gopher selector string to be used to retrieve the item.

Given the minimal information required to describe the location of an item on a Gopher server, it is easy for server administrators to add references to other servers, and many servers' main value is as subject-matter specific collections of references. Gopher client software also generally provides a facility for saving references to items selected by the user.

While the references to items on other servers would seem to be Gopher-specific, this is not strictly true. It is possible to map enough information into a Gopher descriptor to describe how to access most popular Internet services. Gopher descriptors are used to reference items on HTTP servers, finger servers, telnet sessions, CSO/Ph phone books, and (via gateways) X.500, WAIS, Archie, and anonymous ftp servers.
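
The menu references described above can be illustrated with a small sketch. It assumes the menu line layout of the original Gopher protocol as specified in RFC 1436 (a one-character item type, then the display string, selector, host, and port separated by tabs, with a lone "." ending the menu); that layout comes from the protocol specification, not from this paper. The names MenuItem, parse_menu_line, parse_menu, fetch, and the example.* hosts are hypothetical and chosen only for illustration.

```python
import socket
from typing import NamedTuple


class MenuItem(NamedTuple):
    item_type: str   # e.g. "0" text document, "1" directory, "7" search engine
    display: str     # string shown to the user in the menu
    selector: str    # opaque string sent to the server to retrieve the item
    host: str        # domain name of the server holding the item
    port: int        # port number of the Gopher process


def parse_menu_line(line: str) -> MenuItem:
    """Split one menu line into the fields a client needs to follow it."""
    line = line.rstrip("\r\n")
    item_type, rest = line[0], line[1:]
    # Keep the first four tab-separated fields; ignore any trailing extras.
    display, selector, host, port = rest.split("\t")[:4]
    return MenuItem(item_type, display, selector, host, int(port))


def parse_menu(text: str) -> list:
    """Parse a whole menu, stopping at the terminating '.' line."""
    items = []
    for line in text.splitlines():
        if line == ".":
            break
        if line:
            items.append(parse_menu_line(line))
    return items


def fetch(host: str, port: int, selector: str = "") -> str:
    """Send a selector to a server and return the raw reply (not called here)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(selector.encode("latin-1") + b"\r\n")
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode("latin-1")


# A made-up menu mixing a local document with references to other servers.
SAMPLE_MENU = (
    "0About this server\t/about.txt\tgopher.example.edu\t70\r\n"
    "1Other Gopher servers\t/others\tgopher.example.org\t70\r\n"
    "7Search titles\t/search\tsearch.example.net\t7070\r\n"
    ".\r\n"
)

for item in parse_menu(SAMPLE_MENU):
    print(item.item_type, item.display, "->", item.host, item.port)
```

A client configured with a home server would call something like fetch(home_host, home_port) with an empty selector to obtain the top-level menu, then follow whichever item the user selects using that item's own host, port, and selector. Because each item carries its own location, the user need not notice when a selection crosses from one server to another.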

2.2 GROWTH OF THE GOPHER SYSTEM

Although originally designed as a campus-wide information system, Gopher was rapidly adopted by a variety of institutions, and as of November 1994 it is estimated that there are over 8000 servers on the Internet. Part of the reason for Gopher's widespread adoption as a platform for publishing on the Internet is the ease of setting up and running a server. Server administrators typically run a Gopher process and publish a part of their local file system as a Gopher hierarchy. By default, the names of the items in the Gopher hierarchy are the filenames of the documents and directories published by the server. Because Gopher accommodates a variety of page description languages, there is no need to reformat or mark up documents to be published.

Another reason for Gopher's popularity is that server administrators can add links to other Gopher servers and give their users access to other collections of information. Since users are not necessarily aware where the information they are accessing resides, server administrators can easily take advantage of each other's efforts.

2.3 LOCATING INFORMATION WITH GOPHER

Part of Gopher's popularity is due to the explicit organization of information in the Gopher system, which lends itself to quickly finding information. While hypertext documents are certainly useful as one type of information content, it is not clear that a system consisting of nothing but hypertext documents solves all the problems in an information system. In fact, it is difficult to quickly make sense of a complex, visually rich document. If one goal is to allow users to quickly navigate, it is clear why browsers for file systems always have provisions for a menu-like listing of items: a predictable, consistent user interface (such as a menu) is easy to make sense of quickly, and this is necessary for rapidly traversing an information space.
Menu-based systems also have advantages in that they can be mapped into a variety of user interface metaphors and browsers and can be compactly represented. Since one of the design points for the Gopher system is to run quickly even over low bandwidth links, a compact representation of a collection (menu) allows users with slow machines to quickly traverse Gopherspace to locate documents of interest.

A good way to understand current information systems on the Internet is to compare them to a book. Books generally are structured into three functional sections: a table of contents, an index, and the pages in the middle of the book (the content). Using this metaphor, Gopher acts something like a book's table of contents since Gopher presents a structured overview of the information and makes it possible to jump directly to the section for a specific topic. Search engines (such as WAIS, Archie, and VERONICA) are similar to the index in the back of the book because they allow users to jump into the content based on one or more words of interest. To complete the metaphor, the pages in the middle of the book are represented in a variety of formats on the Internet; content may be in the form of text, graphics, or page description and markup languages such as PostScript, PDF, and HTML. The Gopher protocol can of course provide access to search engines and to document content in different formats.

When an explicit hierarchical organization of information is combined with a searchable index of titles of objects in Gopher (such as the VERONICA index of all Gopher items or a Jughead searchable index of items on a single server), users have the choice of either browsing menus or submitting queries to locate items. The scope of the queries can be either global (VERONICA) or local to a specific server (Jughead), and users can select the scope of their search by traversing the hierarchy to locate an appropriate search engine.
Gopher has always been designed to be a hierarchical framework to organize collections of documents and search engines rather than a monomorphic Internet-aware page description language.

2.4 ARCHITECTURAL LIMITATIONS

The ease of Gopher server setup goes hand in hand with an architectural limitation of Gopher. Running a Gopher server requires neither setting up any sort of replication system nor a formal agreement with other sites to replicate or mirror information. The lack of a formal system for propagating redundant copies of information stored on Gopher servers means that users may be going to the other side of the world (or across a congested network) to fetch a copy of a document that is also stored locally. Moreover, popular servers are heavily loaded and there is no systematic way of spreading the load to other sites (although ad hoc schemes such as informal mirroring agreements are used to partially address this problem). Note that this architectural limitation is also inherent in WWW.

Referring to items on other servers by location rather than by name makes it easy to add new servers since there is no registration process required as items are added to (or removed from) servers. Referring to items by location rather than by name also means that there is no name-to-location mapping service to be maintained, and clients do not have to go through a name-to-location resolution before attempting to fetch an item. Unless the name-to-location mapping service is extremely fast, it has the potential for being a significant performance bottleneck. Given the scale of the Internet, it is clear that any name resolution service will have to be replicated and distributed to scale up properly, but this means that the client will have to first locate the appropriate name mapping server (potentially a slow process) before it can proceed with a name-to-location lookup.
Clearly, there are significant engineering tradeoffs between a system based on name-to-location resolution and a system that only refers to information by absolute location.

The direction the Gopher system is taking to address these architectural problems is to accommodate both reference by location and reference by name. The expectation is that clients will first attempt to resolve the reference by location and, if that fails (or the client finds that the server responds slowly), the client will attempt a name-to-location lookup to find other locations that hold the item of interest. Accommodating multiple references to items by both Uniform Resource Locators (URLs) and Uniform Resource Names (URNs) requires a place to store the references as some sort of meta-information. The original Gopher protocol had no provisions for meta-information, but as we will see later in this paper, Gopher+ solves this problem.

2.5 GOPHER GATEWAYS TO OTHER INFORMATION SYSTEMS

Soon after the initial release of Gopher, a concerted effort was made to develop software gateways to give Gopher clients access to information on servers such as anonymous ftp, X.500, NNTP, and WAIS. These software gateways made information on other systems visible to Gopher users and handled the translation between Gopher requests and the protocol on the target system (for instance: ftp). The decision to write software gateways was driven by the desire to keep Gopher clients small and simple. Rather than building support for several protocols into the Gopher clients, clients that only understand Gopher protocol can be small, simple, easily written, and run on personal computers that are relatively slow and have limited memory.

Gateways for Gopher clients greatly expanded the information Gopher users could access. Not surprisingly, the gateways became quite popular.
Unfortunately, excessively popular machines on the Internet tend either to be slow or to throttle back the demand for their services by refusing requests when they are busy. Since the original Gopher protocol had no provision for expressing meta-information about an item, there was no place in the protocol to tell clever clients (which might speak several protocols) that an item could be fetched directly, for instance via an ftp session. As a result, even clients that had the capability to go directly to a non-Gopher service such as ftp had no choice but to go through a Gopher software gateway.

Clearly, references to information available via a different protocol ought to enumerate the possible paths to the information (both through the gateway and directly via the other protocol). To take advantage of the work being done to develop URL-aware clients, the enumerated references should be made as URLs. Not surprisingly, the architectural problems and the lack of meta-information in the original Gopher protocol are addressed by the Gopher+ protocol.
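
As a rough illustration of enumerating paths as URLs, the sketch below shows how a client might pick among several references to the same item according to the protocols it speaks. It is only a sketch under stated assumptions: the function choose_path, the example.org hosts, and the idea of listing the direct path ahead of the gateway path are all hypothetical, and nothing here reflects how Gopher+ actually stores or orders such references.

```python
from urllib.parse import urlsplit


def choose_path(urls, supported_schemes):
    """Return the first URL whose scheme the client can speak, else None."""
    for url in urls:
        if urlsplit(url).scheme.lower() in supported_schemes:
            return url
    return None


# Hypothetical enumerated references to one item: the direct ftp path
# is listed first, followed by the path through a Gopher gateway.
paths = [
    "ftp://ftp.example.org/pub/readme.txt",
    "gopher://gateway.example.org:70/0/ftp/ftp.example.org/pub/readme.txt",
]

simple_client = {"gopher"}          # speaks only the Gopher protocol
clever_client = {"gopher", "ftp"}   # can also open an ftp session itself

print(choose_path(paths, simple_client))
print(choose_path(paths, clever_client))
```

With references laid out this way, a small Gopher-only client still reaches the item through the gateway, while a multi-protocol client bypasses the gateway and relieves it of load.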