.\" Process with "pic description.me | troff -me | whatever" or more simply .\" "groff -p -me description.me | lpr -Pps" .\" @(#)$Id: description.me,v 1.2 1992/08/17 13:34:11 paul Exp $ .if n .na .if n .ll 6.5i .(l C .sz 18 .ft B The CCSO Nameserver \- A Description .sp .sz .ft by Steven Dorner\ \ \ s\-dorner@uiuc.edu Computer and Communications Services Office University of Illinois at Urbana .sp July 26, 1989 .sp 2 updated by Paul Pomes\ \ \ paul\-pomes@uiuc.edu Computer and Communications Services Office University of Illinois at Urbana .sp August 2, 1992 .(f Converted to portable n/troff format using the -me macros from funky Next WriteNow format (icch). .)f .)l .eh '%''The CCSO Nameserver \- A Description' .oh 'The CCSO Nameserver \- A Description''%' .sp 3 .uh Introduction .lp This document provides an overview of the CCSO Nameserver. It should give the reader a good idea of the capabilities, implementation and performance of the Nameserver. .uh Overview .lp The CCSO Nameserver is a computer resident .q "phone book" . It can keep a relatively small amount of information about a relatively large number of people or things, and provide fast access to that information over the Internet.\** .(f \** The collection of local, regional, and national networks using the TCP/IP protocols. .)f Here at the University of Illinois, we keep the contents of the .q "white pages" of our .i "Student/Staff Directory" as well as other selected information, in the Nameserver. .lp Unlike a printed directory, the information in the CCSO Nameserver is dynamic. It can be updated at any time, from any computer on the Internet capable of running the .q client program, .i ph .\** .(f \** At present this means 4.[23] BSD UNIX, VMS, VM/CMS, DOS, or Macintosh. .)f The Nameserver can also be taught to keep new .b types of information, such as electronic mail addresses or office hours, without recompilation or change to the existing database. .lp The remainder of this document will examine in somewhat further depth three aspects of the Nameserver; what it does (\fBCapabilities\fP), how it does them (\fBImplementation\fP), and how well it does them (\fBPerformance\fP). There are in\-depth documents describing some of these aspects of the Nameserver; the interested reader may refer to the .b References section for the titles of these other documents. .uh "Capabilities \(em The Database" .lp The CCSO Nameserver manages a database that consists of many individual .i entries . Each entry contains one or more .i fields , each field consisting of a one or more printable .sm ASCII characters (including tab and newline). Each field is associated with a particular .i "field description" that is used to specify the behavior of the field. A field description includes a name, a maximum length for the fields it describes, and certain .i properties that determine how the field is used. .lp There are essentially no intrinsic limits on the size of the database, in number of entries, numbers of field descriptions, numbers of fields per entry, or sizes of fields.\** .(f \** Actually there are limits imposed by the 32-bit pointers used throughout the system. Before those limits could be reached, however, the database would likely be too large to manage. .)f .lp Certain fields\** .(f \** Those whose field descriptions in the .i .cnf file contain the property .q Indexed . .)f in the database are indexed. Words from these fields can be used as keys to select entries in the database. Words from any field may be used to refine the selection made by the key fields. The indexing scheme used is .q double\-hashing , and results in very fast lookups for key fields. The hash table is also indexed to facilitate pattern matching on the hash table (and hence the database). .uh "Capabilities \(em The Server" .lp The database resides entirely on one computer and is managed by a server program, .i qi (query interpreter). Multiple instances of .i qi may be executing at any one time; access to the database is controlled by advisory locks. Any number of processes may read the database, unless a process is writing the database, in which case all processes must wait for that process to complete its work before beginning their own. .lp .i Qi uses a command\-reply scheme like that used by FTP.\** .(f \** See RFC-959, .i "File Transfer Protocol (FTP)" , J. Postel and J. Reynolds. .)f It accepts commands from its standard input, and writes replies on its standard output. Both commands and replies are couched in .q netascii ; lines consisting of printable .sm ASCII characters terminated with a newline (\c .sm ASCII 10) or carriage\-return newline (\c .sm ASCII 13 .sm ASCII 10) pair. Additionally, the backslash .q \e is used to .q escape certain characters, as in the C programming language.\** .(f \** The set of such escapes is much more limited than in C; only .q \en for newline, .q \et for tab, .q \e" for double-quote, and .q \e\e for backslash are allowed. .)f .lp Commands consist of a keyword optionally followed by one or more arguments or keywords. Commands include: .b query for querying the database; .b change for changing fields in entries; .b add for adding new entries. Replies consist of a numerical code ranging from \-599 to 599, and additional text. The numerical codes may indicate an operation in progress (100\-199), success (200\-299), a request for further information (300\-399), temporary failure (400\-499), or permanent failure (500\-599). Replies in the range from \-599 to \-100 indicate that further replies are to be expected for the current command; they otherwise have the same meanings as their positive counterparts. .lp The behavior of .i qi may be modified by use of certain .i options , accessed by the .b set command. The number of available options is small; the most important options are .i echo , which causes .i qi to print commands on its output before executing them, and .i limit , which allows the user to specify a maximum number of entries to which a command may be applied. .lp .i Qi operates in three different modes; anonymous, login, and hero. Each mode is more liberal than the previous in the operations it allows, and consequently more difficult to access. Anonymous mode is used to make queries of public information\** .(f \** I.e., to view fields whose field description contain the property .q Public . .)f and for a few other innocuous purposes. In anonymous mode, there is a maximum number of entries that can be viewed with one command; the purpose of this limitation is to discourage the use of the Nameserver for the preparation of mailing lists. Anonymous mode is used for most queries of the Nameserver. .lp To enter login mode, a user must identify himself as the owner of a particular Nameserver entry by giving an .i alias (login name) and a password.\** .(f \** Actually the user is asked to encrypt a string using his password, and .i qi compares the result returned with the result it obtained by encrypting the same string with the user's stored password. This is to provide additional security when running .i qi over networks; the user's password is never sent .q "in the clear" over a potentially insecure network. .)f In addition to the capabilities of anonymous mode, login mode allows the logged in user to change fields from his or her own entry in the Nameserver.\** .(f \** Actually a user may change only those fields whose field description contain the property .q Change . .)f .lp Hero mode is entered either by entering login mode as a Nameserver .q hero (superuser) or by running .i qi directly from a terminal, rather than over a network. In this mode, all artificial limits are removed; the hero may change any field in any entry in the database, as well as view as many entries as he wishes. Hero mode is used mostly for administrative purposes. .uh "Capabilities \(em Queries" .lp Since most of what the Nameserver does is answer queries, it is fitting to describe queries more fully here. A nameserver query consists of five elements; the .q query keyword, values for one or more indexed fields, values for zero or more non\-indexed fields, optionally the .q return keyword, and optionally a list of fields to print from the selected entries. A couple of examples will clarify. First, a plain query; the arguments are interpreted as requests for words from the name or nickname fields, both of them indexed fields: .(b \f(CRqi>\fP \f(CBquery steven dorner\fP .ft CR \-200:1: alias: s\-dorner \-200:1: name: dorner steven c. \-200:1: email: dorner@garcon.cso.uiuc.edu \-200:1: phone: (w) 244\-1765 \-200:1: address: 181 DCL, MC 256 \-200:1: : 1201 W. Washington, C, 61821 \-200:1: title: res programmer \-200:1: nickname: Steve \-200:1: hours: 8\-4 weekdays 200:Ok. .ft .)b .lp Here is an example that uses all five elements. The .q department field is not indexed. .(b \f(CRqi>\fP \f(CBquery dorner department=computing return name email department\fP .ft CR \-200:1: name: dorner steven c. \-200:1: email: dorner@garcon.cso.uiuc.edu \-200:1: department: computing services office 200:Ok. .ft .)b .uh "Capabilities \(em The Client" .lp Usually, the Nameserver is accessed via the .q client program .i ph . This program makes a connection to a copy of .i qi on the machine that keeps the Nameserver database. It then provides assistance to the user of the Nameserver; it formulates queries, formats Nameserver responses, and provides other conveniences. .lp .i Ph operates in two modes; command-line and interactive. In command-line mode, .i ph forms a Nameserver query from the arguments given it, sends it to .i qi , prints the result, and exits. In interactive mode, .i ph reads commands from the user, relays them to .i qi , and prints .i qi 's responses. The responses are automatically sent through a paging program. Some commands given to ph are expanded into more than one qi command. For example, the .i ph .q edit command first asks .i qi for the value of the desired field, puts that value in an editor where the user edits it as s/he pleases, and then issues a .q change command to change the field to its desired new value. .uh "Implementation \(em The Source" .lp The Nameserver is written in C (a small parser is written in lex\**), .(f \** See .i "Lex\-A Lexical Analyzer Generator" , M.E. Lesk and E. Schmidt. .)f and runs on UNIX systems. The client, .i ph , may be run on 4.[23]BSD derived systems. A version of .i ph exists for VMS, DOS, Mac, and a limited version exists for VM/CMS systems. .lp There were at last count 320,000 bytes of C and lex source code; some 6,000 statements in 63 files. This source is divided into several distinct categories; .i qi (230,000 bytes, 28 files, 3500 statements), .i ph (46,000 bytes, 3 files, 700 statements), utilities (89,000 bytes, 21 files, 1700 statements), and libraries (19,500 bytes, 11 files, 300 statements). .lp The database and .i qi reside on a Digital Equipment Corporation VAXServer 3500 running Ultrix. .uh "Implementation \(em The Database" .lp The database is kept in six files with the extensions .i .dir , .i .dov , .i .idx , .i .iov , .i .seq , and .i .bdx . The .i .dir and .i .dov files contain the actual data. The .i .idx and .i .iov files contain the hash table, with pointers into the data files. The .i .seq file contains all the words from the hash table, sorted alphabetically, along with pointers into the hash table; it is used for pattern-matching on the hash table. The .i .bdx file contains a tree of four-letter nodes, each node pointing to where entries with those four letters begin in the .i .seq file; the .i .bdx file speeds search of the .i .seq file. .(z .ie n \{\ .(c Nroff version of Figure 1 not available. .)c .\} .el .ie !"\*(.T"" \ \{\ .PS boxht = .3; boxwid = .6 define blk { $1: [ down A: box $2 B: box C: box D: box E: box ht 2*boxht "." "." ] } right blk(Bdx, ".bdx") move blk(Seq, ".seq") move blk(Idx, ".idx") move blk(Iov, ".iov") move blk(Dir, ".dir") move blk(Dov, ".dov") arrow from Bdx.B to Seq.C.w arrow from Bdx.C to Seq.E.w arrow from Seq.C to Idx.B.w arrow from Seq.C to Idx.D.w arrow from Seq.D to Idx.E.w arrow from Idx.B to Iov.B.w arrow from Idx.D to Iov.C.w arrow from Iov.B to Dir.C.w arrow from Iov.B to Dir.D.w arrow from Iov.D to Dir.C.w arrow from Iov.E to Dir.E.w arrow from Dir.C to Dov.E.w arrow from Dir.E to Dov.B.w .PE .\} .el .sp 2i .ce .b "Figure 1. Database Organization" .)z .lp The .i .dir file consists of a header and one fixed-length record for each entry in the database. If there is too much data for one record, the remainder is placed in the .i .dov file. The .i .dov file also consists of fixed-length records, and if one is not enough, the remainder can be placed in more .i .dov records. Thus, an entry is really a linked list of fixed-length records, and is not limited in size. It is relatively easy to play with the sizes of the .i .dir and .i .dov records (before compilation and installation of the database) for optimum performance. We use a fairly small record size in the .i .dir file, to minimize space wastage,\** .(f \** Not entirely successfully \- see .b "Performance \(em Database Size" below. .)f and a fairly large record size in the .i .dov file, to minimize linking. Most entries are wholly contained in the .i .dir file; most of the rest require only one .i .dov record. .lp Each entry begins with some fixed-length information, followed by the fields that make up its data. Each field is a null-terminated .sm ASCII string. A field begins with an .sm ASCII string that is the id of the field description for that field, and a colon. The field's data follows, and then the null terminator (\c .sm ASCII 0). Tagging each field with its description number means that the database is not sensitive to the presence, absence or order of the fields. This in turn means that field descriptions can be added to the Nameserver at will, and the newly-defined fields used, without recompilation or rebuilding of the database (see .b "Implementation \(em Field Descriptions" below). .lp The .i .idx file is made up of a fixed number of fixed-length records. Each record that is in use contains a word from an indexed field, and a set of pointers to the .i .dir records that contain the word in an indexed field. Overflow in the .i .idx file is handled like overflow in the .i .dir file; the excess pointers are put in one or more fixed-length records in the .i .iov file. Words are indexed by computing a hash function. If the selected location is not empty but does not contain the desired word, the hash function is iterated, until a limit is reached (implying the failure of the index) or the word or an empty spot is found. If the spot is empty, the word and a pointer to the entry in which it occurs is placed in the record. If the spot is not empty, a pointer to the entry is appended to the list of pointers for that word. .lp The .i .seq file uses fixed-length records (called .i leaves ) to keep a sorted list of all the words in the hash table (\c .i .idx and .i .iov files). Each leaf contains up to four words, and a pointer to the next leaf in alphabetical order. With each word is stored a pointer into the hash table where that word is found. .lp The .i .bdx file has records (called .i nodes ) that contain one four-byte key, and two pointers; one to the previous node in alphabetical order, and one to the following node in alphabetical order. If a particular four-byte key happens to begin a leaf, that key's node will contain a pointer to that leaf instead of a pointer to another node. .uh "Implementation \(em Queries" .lp An incoming query is first broken down into its component parts. Then, the selection arguments of the query are checked for indexed arguments. The longest indexed arguments\** .(f \** Actually, the longest indexed arguments free of pattern-matching metacharacters. Pattern matches take much longer than normal index lookups since the .i .bdx and .i .seq files must be searched, and since such searches frequently result in a large number of matches being selected. .)f are looked up one by one in the hash table (or, if they contain pattern-matching characters, a search is made through the .i .bdx and .i .seq files for each pattern). The index entries are .q anded together to select only those entries that contain all of the indexed words. .lp Next, the selected entries are fetched one by one, and matched against the argument list. This is done for two reasons. First, the fact that an entry appears in the index for a word says nothing about which .b field the word is to be found in; it merely notes that the word does appear. Therefore, it is necessary to recheck indexed fields, and make sure the words in question appear in the proper fields. Second, the non-indexed words must be checked, to see that they appear in the proper fields in the entry. .lp If the entry passes the checks, the selected fields (or a set of default fields) are printed. .uh "Implementation \(em Field Descriptions" .lp Field descriptions are kept in a file that .i qi reads each time it is run. This file consists of lines describing each field, in .sm ASCII , with colons separating the elements in a line. First comes the id number of the field, then the name of the field and its maximum length. Finally, there is a colon-separated list of properties for the field. .lp Since this file is read each time .i qi starts up, lines can be added to define new fields at will. All subsequent invocations of .i qi will be able to recognize and use the fields. .lp The major properties fields may have are Indexed, Public, Default, Lookup, and Change. Fields marked Indexed are kept track of in the database's index. At least one such field .b must be included in every query. Fields marked Public may be viewed by anyone using .i qi in anonymous or login mode. Fields not marked Public may only be viewed by the entry's owner in login mode, or by someone using .i qi in hero mode. Default fields are printed if no .q return clause is given in a query. Lookup fields may be used in the selection part of a query; a field not marked Lookup cannot be used to select entries.\** .(f \** You might decide, for example, that no one should be allowed to be found by his or her phone number. You could mark the phone number field as Public (so it could be viewed) but not Lookup (so no one could use it in searches). .)f Finally, a user in login mode in .i qi may change any of his or her fields that are marked Change. .uh "Performance \(em Database Size" .lp Our database contains 80,140 entries, totalling 16 megabytes of information. The .i .dir and .i .dov files together are 33 megabytes; nearly half the space is wasted. This percentage could be reduced by reducing the record size of the .i .dir file. .lp The hash table, which has room for 450,001 words, actually contains 157,324 words and 270,784 pointers, for a total of 1.3 megabytes of hash table. The .i .idx and .i .iov files are 19.5 megabytes in size; even allowing for a large number of empty hash table slots (necessary for performance), most of the space is wasted. As with the .i .dir file, reducing the record size in the .i .idx file would help the situation. .lp Rounding out the database is 7.2 megabytes in the .i .bdx and .i .seq files. .uh "Performance \(em Speed" .lp To test speed, we took 300 words from different parts of the index, and looked each one up using .i qi . .i Qi found 396 entries in 78 seconds; that is about \(14 second per lookup. Using four letter keys and wildcarding the rest, .i qi found 9213 entries in 460 seconds, for about 1\(12 seconds per lookup. .lp In actual use over a network, response is slower, since the client program must establish a connection with the host that has the database. Looking up 100 indexed words in separate invocations of .i ph took 109 seconds, or 1 second per lookup; 118 entries were found. .uh "Performance \(em Usage" .lp In a recent week, typical of most weeks, we had 3100 uses from over 70 campus machines.\** .(f \** It is impossible to get an exact count of the number of machines, since there are some machines that use another computer as a relay; these machines do not show up in the count. .)f By far most of the commands given were queries (3643). There were also 175 logins, 264 changes, and a few hundred other commands issued. Of the commands: 58% were successful; 26% were queries that found no entries; 8% were queries that found too many entries; 4% were other errors; 3% were rejected because they required login mode, but were being given in anonymous mode; and 1% failed due to command syntax errors. .uh "Further Directions" .lp Overall, we are fairly satisfied with what has been done to date. Ongoing efforts will be centered around making the Nameserver convenient to use in a distributed environment. This will primarily involve allowing users to specify a server, although some peripheral issues are also in need of resolution. .lp Additionally, we will make some attempts to remove wasted space from the database and its associated index; this is not a high priority since the database, for all its wasted space, is still not unmanageably large. .uh Distribution .lp The CCSO Nameserver is Copyright \(co 1988 by the University of Illinois Board of Trustees. Portions of the software are Copyright \(co 1985 by CSNet. It is distributed free of charge, and is available for anonymous ftp from uxc.cso.uiuc.edu, in the net/qi subdirectory as well as the pub/qi.tar.Z file. The client software for UNIX and VMS is available on the same computer, in the net/ph subdirectory and in the file pub/ph.tar.Z. No support will be provided by the University, and the University is not liable for anything bad that happens as a result of its use. The software may not be redistributed without permission from CSNet. .uh References .lp .i "UNIX Manual Pages" . Manual pages are available on .i ph (1) and .i qi (8). .lp .i "The CCSO Nameserver, An Introduction" by Steven Dorner. A brief introduction geared at a new user of .i ph . .lp .i "The CCSO Nameserver, Why" by Steven Dorner. A recap of the design decisions that made our Nameserver what it is, including evaluations of some similar systems available when our Nameserver was designed. .lp .i "The CCSO Nameserver, Server\-Client Protocol" by Steven Dorner. Full documentation of the language used between the Nameserver server program, .i qi , and the outside world. .lp .i "The CCSO Nameserver, Guide to Installation" by Steven Dorner. How to install the programs that make up the CCSO Nameserver. .lp .i "The CCSO Nameserver, A Programmer's Guide" by Steven Dorner. In depth documentation for anyone maintaining or wishing to completely understand the CCSO Nameserver. .lp .i "Rebuilding a Nameserver Database In 24 Easy Steps" by Steven Dorner. Describes how we build a database, beginning with raw data we receive from our Administrative branch. .uh Acknowledgement .lp Our Nameserver is very similar in function and philosophy the CSNet nameserver. In fact, the database management code from that nameserver, with some modification, is used in our Nameserver. We are grateful to CSNet that their program was made available to us.