Network Working Group                                 Murali R. Krishnan
INTERNET-DRAFT                                           Microsoft Corp.
                                                             James Casey
                                                           Harlequin Inc
Expires six months from Dec 4, 1996

                          A Gopher URL Format

Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   To learn the current status of any Internet-Draft, please check the "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), ds.internic.net (US East Coast), nic.nordu.net (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific Rim).

   Distribution of this document is unlimited. Please send comments to the proposed URI working group at uri@bunyip.com. General discussion of URLs and the applications which use them should also take place on uri@bunyip.com.

Abstract

   This document defines the format of Uniform Resource Locators (URL) for the Gopher and Gopher+ protocols using the general URL syntax defined in RFC xxxx, "Uniform Resource Locators (URL)". It is one of a suite of documents which replace RFC 1738, "Uniform Resource Locators", and RFC 1808, "Relative Uniform Resource Locators".

Table of Contents

   1. Introduction
   2. Gopher URL syntax
      2.1 Specification
      2.2 Basic Retrieval
      2.3 Gopher Search Retrieval
      2.4 Gopher+ Items
      2.5 Gopher+ data representation
      2.6 Gopher+ item attribute collections
      2.7 References to specific Gopher+ attributes
      2.8 Gopher+ alternate views
      2.9 Relative Gopher URLs
   3. Issues
      3.1 Gopher+ electronic forms
      3.2 Gopher+ items with electronic forms
   4. Security Considerations
   5. Acknowledgements
   6. References
   7. Authors' Addresses
   A. Changes from RFC 1738 and RFC 1808

1. Introduction

   The Gopher URL scheme specifies the format of URLs used for distributed document search and retrieval using the Internet Gopher Protocol. The base Gopher protocol is described in RFC 1436 [1] and supports items and collections of items (directories). Each item is identified with a Gopher type and a user-visible name.

   The Gopher+ protocol [2] is a set of upward compatible extensions to the base protocol. It supports associating an arbitrary number of attributes and alternate data representations with a Gopher item. Gopher+ also supports querying a subset of item attributes and selective retrieval.

   The Gopher URLs as proposed in RFC 1738 [3] support both Gopher and Gopher+ items and attributes. This document updates that specification. It uses the BNF and basic rules as laid out in section 4.3 of [RFC-URL-SYNTAX].

2. Gopher URL syntax

2.1 Specification

   A Gopher URL follows the common internet scheme syntax as defined in section 4.3 of [RFC-URL-SYNTAX]:

      gopher://<host>[:<port>]/<gopher-path>

   where

      <gopher-path>    := <gopher-type><selector> |
                          <gopher-type><selector>%09<search> |
                          <gopher-type><selector>%09<search>%09<gopher+_string>

      <gopher-type>    := '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' |
                          '8' | '9' | '+' | 'I' | 'g' | 'T'

      <selector>       := *pchar     Refer to RFC 1808 [4]
      <search>         := *pchar
      <gopher+_string> := *uchar     Refer to RFC 1738 [3]

   If the optional port is omitted, the port defaults to 70.

   <gopher-type> is a single character field that denotes the Gopher type of the resource to which the URL refers.

   <selector> is an opaque string supplied by the Gopher server for retrieving an item. A selector string is a sequence of octets which may contain any octets other than hexadecimal 09 (US-ASCII HT or tab), hexadecimal 0A (US-ASCII LF or line-feed), and hexadecimal 0D (US-ASCII CR or carriage-return). Gopher uses the CR LF combination to terminate a line in the request.

   Gopher clients send the selector string to the Gopher server to retrieve an item. The gopher-type field is used by clients to interpret the response type expected.

   Note that some <selector> strings begin with a copy of the <gopher-type> character, in which case that character will occur twice consecutively in the URL.
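   As an illustration only, the following Python sketch splits an encoded <gopher-path> into its fields according to the syntax above. The function name split_gopher_path and the choice to return decoded octets are assumptions of this sketch, not part of the specification. An empty path is mapped to type "1" with an empty selector, which is the default this section specifies.

```python
from urllib.parse import unquote_to_bytes

# Gopher types listed in the <gopher-type> production of section 2.1.
GOPHER_TYPES = set("0123456789+IgT")


def split_gopher_path(path):
    """Split an encoded <gopher-path> (hypothetical helper).

    Returns (gopher_type, selector, search, gopher_plus). The last two
    are None when the corresponding %09-delimited field is absent.
    """
    if path == "":
        # An empty path defaults to type "1" and an empty selector.
        return "1", b"", None, None

    gopher_type, rest = path[0], path[1:]
    if gopher_type not in GOPHER_TYPES:
        raise ValueError("unknown Gopher type: %r" % gopher_type)

    # Split on the encoded tab BEFORE decoding. At most two splits are
    # made, so any %09 inside the Gopher+ string is left untouched.
    parts = rest.split("%09", 2)
    selector = unquote_to_bytes(parts[0])
    search = unquote_to_bytes(parts[1]) if len(parts) > 1 else None
    gopher_plus = unquote_to_bytes(parts[2]) if len(parts) > 2 else None
    return gopher_type, selector, search, gopher_plus
```

   With this sketch, the path "00/docs" yields type "0" and selector "0/docs", showing the case where the selector begins with a copy of the type character.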
   Within the <gopher-path>, no characters are reserved. Clients are not permitted to make any interpretation of the selector string.

   The entire <gopher-path> may be empty, in which case the delimiting "/" is also optional and the <gopher-type> defaults to "1". The selector string may be an empty string; this is how Gopher clients refer to the top-level directory on the Gopher server.

2.2 Basic Retrieval

   The response may be interpreted by a client or a proxy, and URLs will be constructed for any embedded Gopher items in the response. The specification of Gopher URL construction from a response depends on extracting the selector, search, and Gopher+ portions from it and is beyond the scope of this document.

2.3 Gopher Search Retrieval

   If the URL refers to a search to be submitted to a Gopher search engine (i.e., the Gopher type is '7'), then the selector is followed by an encoded tab (%09) and the search string. To submit the search, the client should send the decoded <selector> string, a tab (unencoded), the search string, and a terminating CR LF to the Gopher server. The search string can contain any character except the tab, CR, or LF characters.

2.4 Gopher+ Items

   The URLs for Gopher+ items have a second encoded tab (%09) and a Gopher+ string. The <gopher+_string> represents information required for retrieval of Gopher+ items. Gopher+ items may have arbitrary sets of attributes (including +ABSTRACT, +VIEWS, etc.), alternate views, and electronic forms.

   To retrieve the data associated with a Gopher+ URL, a client will connect to the Gopher server and send the Gopher selector, a tab (unencoded), the <search> string, another tab (unencoded), and the <gopher+_string>. The client should also be prepared to receive a Gopher+ response back from the server.

   Even in cases where the <search> string is empty, the URL should include two encoded tabs (%09%09) between the <selector> and the <gopher+_string>, with the tabs occurring consecutively.
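   The three request forms described in sections 2.1, 2.3, and 2.4 can be sketched as follows. This is a minimal illustration in Python, assuming the fields have already been percent-decoded into octets. The name build_request is hypothetical, and the sketch does not special-case electronic forms, whose Gopher+ string carries its own line terminators.

```python
TAB = b"\t"
CRLF = b"\r\n"
FORBIDDEN = (b"\t", b"\r", b"\n")  # octets excluded from selector and search


def build_request(selector, search=None, gopher_plus=None):
    """Build the octets a client sends for a decoded Gopher URL.

    Hypothetical helper: all arguments are bytes that have already been
    percent-decoded. A None field means it was absent from the URL.
    """
    for name, value in (("selector", selector), ("search", search)):
        if value is not None and any(c in value for c in FORBIDDEN):
            raise ValueError(name + " contains tab, CR, or LF")

    if gopher_plus is not None:
        # Gopher+ item: the search field is always sent, even when empty,
        # so an empty search produces two consecutive tabs.
        return selector + TAB + (search or b"") + TAB + gopher_plus + CRLF
    if search is not None:
        # Search retrieval: selector, tab, search string, CR LF.
        return selector + TAB + search + CRLF
    # Basic retrieval: the selector alone, terminated by CR LF.
    return selector + CRLF
```

   For instance, build_request(b"/a", gopher_plus=b"+") produces the selector, two consecutive tabs, "+", and CR LF, mirroring the "%09%09+" form of the URL.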
2.5 Gopher+ data representation

   The Gopher+ server tags the entries in a directory listing response with an appropriate Gopher+ string to indicate the availability of Gopher+ items. An item is tagged with a "+" to denote that it is a Gopher+ item, or with a "?" to indicate that the item has a +ASK form associated with it.

   A Gopher URL with a <gopher+_string> consisting of only a "+" refers to the default view (data representation) of the item. A <gopher+_string> with additional attributes can be used to request alternate views of the item. A <gopher+_string> containing only a "?" refers to an item with a Gopher electronic form associated with it.

2.6 Gopher+ item attribute collections

   The Gopher URL's <gopher+_string> consists of "!" or "$" to refer to the attributes of Gopher+ items. A "!" refers to all the attributes of the particular Gopher+ item (think of it as "i" inverted, where "i" means information). A "$" refers to all the attributes of all items in a Gopher directory. If "$" is used with a non-directory item, it is equivalent to a "!".

2.7 References to specific Gopher+ attributes

      <gopher+_string> := ! | $ | !<attribute_list> | $<attribute_list>
      <attribute_list> := <attribute_name> *[ "%20" <attribute_name> ]

   A Gopher+ item may have multiple attributes. The URL's <gopher+_string> can contain "!<attribute_name>" to refer to specific attributes. For example, to refer to the attribute containing the abstract of an item, the <gopher+_string> would be "!+ABSTRACT". To refer to the attribute containing the views of all items in a directory, the Gopher+ string would be "$+VIEWS".

   A subset of attributes may be specified using a sequence of attribute names separated by hex-coded spaces. For example, "!+ABSTRACT%20+ADMIN" refers to the +ABSTRACT and +ADMIN attributes of an item.

   To retrieve a particular set of attributes, a client includes the appropriate <gopher+_string> in the request, in the form specified in section 2.4 above.

2.8 Gopher+ alternate views

   Gopher+ supports query and retrieval of alternate data representations (alternate views) of items.
   Alternate views are referred to using the following <gopher+_string>:

      <gopher+_string> := +<view_name>%20<language_name>

   where <view_name> is a MIME content type, and <language_name> consists of the ISO-639 language code and the ISO-3166 country code joined together with a "_".

   For example, a <gopher+_string> of "+text/html%20Es_ES" refers to the Spanish language HTML alternate view of the Gopher+ item.

   To retrieve a Gopher+ alternate view, a Gopher+ client sends the appropriate view and language name as the Gopher+ string. All available alternate views can be queried using the +VIEWS attribute of the item.

2.9 Relative Gopher URLs

   Since the Gopher URL syntax matches the generic URL syntax, it supports all forms of relative URLs defined in [RFC-URL-SYNTAX].

3. Issues

3.1 Gopher+ electronic forms

   Gopher+ electronic forms are cumbersome to use and are limited in scope. The number of Gopher servers on the Internet is itself very low. Only a limited number of servers support the Gopher+ protocol, and among them only a minority support electronic forms.

   It is proposed that the specification of ASK forms be removed from this revision of Gopher URLs. If it is required (for compatibility or similar reasons), section 3.2 outlines the encoding of electronic forms in Gopher URLs.

3.2 Gopher+ items with electronic forms

   Gopher+ items tagged with a "?" have an electronic form associated with them. Such items require the client to fetch the item's +ASK attribute to get the form definition, ask the user to fill out the form, and then return the user's responses along with the selector string to retrieve the item. Gopher+ clients know how to do this, but do so only when the Gopher+ item carries the "?" tag.

   The <gopher+_string> for a URL that refers to the response generated for such an electronic form (using +ASK) is of the form:

      := +%09.
   where

      := '1'
      := "%0D"
      := "%0A"
      := "+-1" *[]

   For example, a form with two values "New York" and "USA" will appear as

      +%091%0D%0A+-1%0D%0ANew%20York%0D%0AUSA%0D%0A.%0D%0A

   Note that the space in "New York" is encoded when used inside the URL's <gopher+_string>.

   To retrieve such an item, the Gopher+ client sends:

      +1
      +-1
      New York
      USA
      .

4. Security Considerations

   The security considerations specified in [RFC-URL-SYNTAX] apply here as well.

   Gopher does not support user logon. Hence there is neither a need nor a way to specify a username or password. This reduces the security implications of incorrect usage.

5. Acknowledgements

   This document is derived from RFC 1738 and RFC 1808. The acknowledgements from those specifications still apply.

6. References

   [1] Anklesaria, F., McCahill, M., Lindner, P., Johnson, D., Torrey, D., and B. Alberti, "The Internet Gopher Protocol (a distributed document search and retrieval protocol)", RFC 1436, University of Minnesota, March 1993.

   [2] Anklesaria, F., Lindner, P., McCahill, M., Torrey, D., Johnson, D., and B. Alberti, "Gopher+: Upward compatible enhancements to the Internet Gopher Protocol", University of Minnesota, July 1993.

   [3] Berners-Lee, T., Masinter, L., and M. McCahill, Editors, "Uniform Resource Locators (URL)", RFC 1738, CERN, Xerox Corporation, University of Minnesota, December 1994.

   [4] Fielding, R., "Relative Uniform Resource Locators", RFC 1808, University of California, Irvine, June 1995.

   [RFC-URL-SYNTAX] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Locators (URL)", MIT/LCS, U.C. Irvine, Xerox Corporation, October 1996.

7. Authors' Addresses

   Murali R. Krishnan
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052, USA
   Phone: (206)-703-0229
   Fax: (206)-936-7329
   Email: muralik@microsoft.com

   James Casey
   Harlequin Inc
   1010 El Camino Real
   Suite 310
   Menlo Park, CA 94025
   Phone: (415)-833-4023
   Email: jamesc@harlequin.com

Note to RFC Editor

   The [RFC-URL-SYNTAX] reference should change when the status of the generic syntax draft changes.
Appendix A. Changes from RFC 1738 and RFC 1808

   Section 3.4 of RFC 1738 and parts of sections 2.2 and 2.3 of RFC 1808 were used as the basis for this draft.

   Section 2.9 has been created to address the use of relative URLs in the Gopher scheme, based on RFC 1808.

   Section 2.1 specifies the list of all valid Gopher types and overrides the general specification found in section 3.4.1 of RFC 1738.

   Section 2.8 specifies the use of the +VIEWS attribute to query Gopher+ alternate views. RFC 1738 referred to this attribute as +VIEW. This draft overrides RFC 1738.

   Section 3.1 proposes dropping support for +ASK electronic forms from Gopher URLs. +ASK forms are the only way to have interactive forms in Gopher. Given that they are used in very few places, they are of limited importance.

   [RFC-URL-SYNTAX] rationalizes the two BNFs provided in RFC 1738 and RFC 1808, which means the set of allowable characters in the selector or search parameters of a Gopher URL is slightly different from that allowed by RFC 1738. This is documented fully in appendix E, 'Summary of non-editorial changes', of [RFC-URL-SYNTAX].