Using Gopher with the World-Wide-Web

Paul Lindner
lindner@boombox.micro.umn.edu
GopherCon 95

Hypertext Markup Language (HTML) is now a common document format, viewable with a World Wide Web (WWW) browser. Gopher hierarchies are fast to navigate and let you associate abstracts, alternate views, and other meta-information with any type of document. This paper looks at combining the strengths of Gopher and WWW to create a better, more integrated information system. We'll look at how to publish HTML documents from Gopher servers, point Gopher links at documents on WWW servers, and customize the appearance of your Gopher server using HTML.

INTRODUCTION

In the past year we've seen explosive growth in the number of World Wide Web (WWW) software clients that use the HyperText Markup Language (HTML). Most of these WWW client programs have rudimentary Gopher support, but most do not support the Gopher+ protocol. To remedy this, the University of Minnesota Gopher Team made many modifications to the Unix Gopher server to support direct HTML and HyperText Transfer Protocol (HTTP) access. Many of these modifications were refined over time with the help of the Internet community.

The modifications fall into three broad areas:

· Storage of HTML documents.
· Internal changes to support Uniform Resource Locators.
· Automatic HTML generation for Gopher directories.

THE WORLD WIDE WEB

A review of the basic syntax and terms used in the World Wide Web (WWW) will help in understanding the changes to the Gopher software. The three terms you need to know are URLs, HTML, and HTTP.

2.1 Uniform Resource Locators

The "Uniform Resource Locator" (URL) was first used by WWW software to describe, in a uniform way, the servers it could connect to. It has since become popular outside WWW software as a simple, standard way to locate resources. URLs are now Internet standards: the Internet Engineering Task Force (IETF) has codified the URL standard in RFC 1630.
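
As a rough illustration of what "uniform" means here, the sketch below splits URLs from two different schemes into the same set of parts. It uses Python's standard urllib.parse module, which is not part of the Gopher or WWW software described in this paper. The function name describe is hypothetical, and the gopher URL is an assumed example built from a host name that appears in this paper.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL into the parts that every scheme shares."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,   # e.g. "http" or "gopher"
        "host": parts.hostname,   # server to connect to
        "port": parts.port,       # None when the URL omits the port
        "path": parts.path,       # what to ask the server for
    }


# Example URLs used only for illustration; the gopher one is an assumption.
EXAMPLES = [
    "http://www.w3.org/hypertext/WWW/Addressing/Addressing.html",
    "gopher://boombox.micro.umn.edu:70/",
]

if __name__ == "__main__":
    for url in EXAMPLES:
        print(describe(url))
```

The same four fields come back regardless of scheme, which is what lets a single client program address both kinds of server.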

A Gopher item can be represented in URL form as

gopher://<host>:<port>/<type><path>

If the port is the Gopher default of 70, the URL can optionally be shortened to:

gopher://<host>/<type><path>

The "root level" of a Gopher server has always been defined as a menu of type 1 with no path required, and it can be represented as:

gopher://<host>:<port>/

(or without the :<port> if the port is 70).

The popular Unix Gopher server software from the University of Minnesota uses path strings with the type as the first character, which results in a URL in which the type character appears to be doubled.

To make URLs more transportable, spaces and other special characters in paths are "escaped" into the form "%XX", where XX is the hexadecimal ASCII/ISO-8859-1 character code.

For more details, see the URL definitions on the Web at http://www.w3.org/hypertext/WWW/Addressing/Addressing.html or in Internet RFC 1630.

HYPERTEXT MARKUP LANGUAGE

HyperText Markup Language (HTML) is a data format used to create hypertext documents that are portable from one platform to another. HTML documents are SGML documents with general semantics that are appropriate for representing information from a wide variety of domains. Here's an example HTML file:

FIGURE 1. Sample HTML File

My Home Page

Click Here for more information.

HTML documents can be written by hand or generated automatically using a variety of utility programs.

HYPERTEXT TRANSFER PROTOCOL

The HyperText Transfer Protocol (HTTP) is the method by which most WWW clients retrieve information from servers. HTTP uses many of the same concepts as the Gopher protocol: both use TCP/IP streams to transmit data, and both have a command-response method of operation.

There are currently two flavors of HTTP in active use, HTTP/0.9 and HTTP/1.0. HTTP/0.9 is slowly being phased out, although the two are mostly upward compatible.

ADDING HTML DOCUMENTS TO YOUR SERVER

SIMPLE HTML FILES

Adding HTML documents is as simple as copying them to your server's data directory. You just need to make sure that the file name ends with the extension .html.

For example, if you had an HTML document in the file moo.html, you could give access to it by copying it to the Gopher server's data directory. Gopher clients will see this as an HTML document. Gopher clients cannot view these files directly; they must spawn a separate viewer. World Wide Web clients, however, can view these files directly.

USING MULTIPLE VIEWS

The Unix Gopher+ server supports multiple representations of a document. We call this concept `multiple views' of the same document. Thus we could have a picture available in a number of different graphics formats.

We can use this feature for HTML documents as well. Since most Gopher clients cannot interpret HTML, we can make a text version of our HTML documents available by adding another file to our server. To make this text version available, we create a file with the same name but without the .html extension, and put the plain text version in it. For instance, the HTML in Figure 1 above could be represented as:

MY HOME PAGE

Click Here for more information.

Comments?
Send Email to gopher@boombox.micro.umn.edu

Note that a reader of the text version cannot follow the hyperlinks in your HTML document.

We did this conversion by hand. What if we had many large documents to convert? Doing that by hand is much too tedious. Instead we can use the lynx WWW browser software to convert our HTML files to text automatically. You can get lynx by accessing the following URL:

Here's an example of using lynx to convert an HTML file to text:

lynx -dump moo.html >moo

If you expect your HTML file to change often, you can make the `moo' file a shell script. The following figure provides an example of using a shell script to