.\" Process with "pic description.me | troff -me | whatever" or more simply
.\"		 "groff -p -me description.me | lpr -Pps"
.\" @(#)$Id: description.me,v 1.2 1992/08/17 13:34:11 paul Exp $
.if n .na
.if n .ll 6.5i
.(l C
.sz 18
.ft B
The CCSO Nameserver \- A Description
.sp
.sz
.ft
by
Steven Dorner\ \ \ s\-dorner@uiuc.edu
Computer and Communications Services Office
University of Illinois at Urbana
.sp
July 26, 1989
.sp 2
updated by
Paul Pomes\ \ \ paul\-pomes@uiuc.edu
Computer and Communications Services Office
University of Illinois at Urbana
.sp
August 2, 1992
.(f
Converted to portable n/troff format using the -me macros from funky
Next WriteNow format (icch).
.)f
.)l
.eh '%''The CCSO Nameserver \- A Description'
.oh 'The CCSO Nameserver \- A Description''%'
.sp 3
.uh Introduction
.lp
This document provides an overview of the CCSO Nameserver.
It should give the reader a good idea of the capabilities,
implementation and performance of the Nameserver.
.uh Overview
.lp
The CCSO Nameserver is a computer resident
.q "phone book" .
It can keep a relatively small amount of information about a
relatively large number of people or things,
and provide fast access to that information over the Internet.\**
.(f
\** The collection of local, regional, and national networks using the
TCP/IP protocols.
.)f
Here at the University of Illinois,
we keep the contents of the
.q "white pages"
of our
.i "Student/Staff Directory"
as well as other selected information,
in the Nameserver.
.lp
Unlike a printed directory,
the information in the CCSO Nameserver is dynamic.
It can be updated at any time,
from any computer on the Internet capable of running the
.q client
program,
.i ph .\**
.(f
\** At present this means 4.[23] BSD UNIX, VMS, VM/CMS, DOS, or Macintosh.
.)f
The Nameserver can also be taught to keep new
.b types
of information, such as electronic mail addresses or office hours,
without recompilation or change to the existing database.
.lp
The remainder of this document will examine in somewhat further
depth three aspects of the Nameserver;
what it does (\fBCapabilities\fP), how it does them (\fBImplementation\fP),
and how well it does them (\fBPerformance\fP).
There are in\-depth documents describing some of these
aspects of the Nameserver;
the interested reader may refer to the
.b References
section for the titles of these other documents.
.uh "Capabilities \(em The Database"
.lp
The CCSO Nameserver manages a database that consists of many individual
.i entries .
Each entry contains one or more
.i fields ,
each field consisting of a one or more printable
.sm ASCII
characters (including tab and newline).
Each field is associated with a particular
.i "field description"
that is used to specify the behavior of the field.
A field description includes a name,
a maximum length for the fields it describes, and certain
.i properties
that determine how the field is used.
.lp
There are essentially no intrinsic limits on the size of the database,
in number of entries,
numbers of field descriptions,
numbers of fields per entry,
or sizes of fields.\**
.(f
\** Actually there are limits imposed by the 32-bit pointers used
throughout the system.
Before those limits could be reached, however, the database would likely
be too large to manage.
.)f
.lp
Certain fields\**
.(f
\** Those whose field descriptions in the
.i .cnf
file contain the property
.q Indexed .
.)f
in the database are indexed.
Words from these fields can be used as keys to select entries in the database.
Words from any field may be used to refine the selection made by the key fields.
The indexing scheme used is
.q double\-hashing ,
and results in very fast lookups for key fields.
The hash table is also indexed to facilitate pattern matching on the hash table
(and hence the database).
.uh "Capabilities \(em The Server"
.lp
The database resides entirely on one computer and
is managed by a server program,
.i qi
(query interpreter).
Multiple instances of
.i qi
may be executing at any one time;
access to the database is controlled by advisory locks.
Any number of processes may read the database,
unless a process is writing the database,
in which case all processes must wait for that process
to complete its work before beginning their own.
.lp
.i Qi
uses a command\-reply scheme like that used by FTP.\**
.(f
\** See RFC-959,
.i "File Transfer Protocol (FTP)" ,
J. Postel and J. Reynolds.
.)f
It accepts commands from its standard input,
and writes replies on its standard output.
Both commands and replies are couched in
.q netascii ;
lines consisting of printable
.sm ASCII
characters terminated with a newline (\c
.sm ASCII
10) or carriage\-return newline (\c
.sm ASCII
13
.sm ASCII
10) pair.
Additionally, the backslash
.q \e
is used to
.q escape
certain characters, as in the C programming language.\**
.(f
\** The set of such escapes is much more limited than in C; only
.q \en
for newline,
.q \et
for tab,
.q \e"
for double-quote,
and
.q \e\e
for backslash are allowed.
.)f
.lp
Commands consist of a keyword optionally followed by one
or more arguments or keywords.
Commands include:
.b query
for querying the database;
.b change
for changing fields in entries;
.b add
for adding new entries.
Replies consist of a numerical code ranging from \-599 to 599,
and additional text.
The numerical codes may indicate an operation in progress (100\-199),
success (200\-299),
a request for further information (300\-399),
temporary failure (400\-499),
or permanent failure (500\-599).
Replies in the range from \-599 to \-100 indicate that further
replies are to be expected for the current command;
they otherwise have the same meanings as their positive counterparts.
.lp
The behavior of
.i qi
may be modified by use of certain
.i options ,
accessed by the
.b set
command.
The number of available options is small;
the most important options are
.i echo ,
which causes
.i qi
to print commands on its output before executing them, and
.i limit ,
which allows the user to specify a maximum number of entries
to which a command may be applied.
.lp
.i Qi
operates in three different modes;
anonymous, login, and hero.
Each mode is more liberal than the previous in the operations it allows,
and consequently more difficult to access.
Anonymous mode is used to make queries of public information\**
.(f
\** I.e., to view fields whose field description contain the property
.q Public .
.)f
and for a few other innocuous purposes.
In anonymous mode,
there is a maximum number of entries that can be viewed with one command;
the purpose of this limitation is to discourage the use of the Nameserver
for the preparation of mailing lists.
Anonymous mode is used for most queries of the Nameserver.
.lp
To enter login mode,
a user must identify himself as the owner of a particular Nameserver entry
by giving an
.i alias
(login name) and a password.\**
.(f
\** Actually the user is asked to encrypt a string using his password, and
.i qi
compares the result returned with the result it obtained by encrypting the
same string with the user's stored password.
This is to provide additional security when running
.i qi
over networks;
the user's password is never sent
.q "in the clear"
over a potentially insecure network.
.)f
In addition to the capabilities of anonymous mode,
login mode allows the logged in user to change fields from his
or her own entry in the Nameserver.\**
.(f
\** Actually a user may change only those fields whose field description
contain the property
.q Change .
.)f
.lp
Hero mode is entered either by entering login mode as a Nameserver
.q hero
(superuser) or by running
.i qi
directly from a terminal, rather than over a network.
In this mode, all artificial limits are removed;
the hero may change any field in any entry in the database,
as well as view as many entries as he wishes.
Hero mode is used mostly for administrative purposes.
.uh "Capabilities \(em Queries"
.lp
Since most of what the Nameserver does is answer queries,
it is fitting to describe queries more fully here.
A nameserver query consists of five elements;
the
.q query
keyword, values for one or more indexed fields,
values for zero or more non\-indexed fields,
optionally the
.q return
keyword, and optionally a list of fields to print from the selected entries.
A couple of examples will clarify.
First, a plain query;
the arguments are interpreted as requests for words
from the name or nickname fields,
both of them indexed fields:
.(b
\f(CRqi>\fP \f(CBquery steven dorner\fP
.ft CR
\-200:1:         alias: s\-dorner
\-200:1:          name: dorner steven c.
\-200:1:         email: dorner@garcon.cso.uiuc.edu
\-200:1:         phone: (w) 244\-1765
\-200:1:       address: 181 DCL, MC 256
\-200:1:              : 1201 W. Washington, C, 61821
\-200:1:         title: res programmer
\-200:1:      nickname: Steve
\-200:1:         hours: 8\-4 weekdays
200:Ok.
.ft
.)b
.lp
Here is an example that uses all five elements.
The
.q department
field is not indexed.
.(b
\f(CRqi>\fP \f(CBquery dorner department=computing return name email department\fP
.ft CR
\-200:1:          name: dorner steven c.
\-200:1:         email: dorner@garcon.cso.uiuc.edu
\-200:1:    department: computing services office
200:Ok.
.ft
.)b
.uh "Capabilities \(em The Client"
.lp
Usually, the Nameserver is accessed via the
.q client
program
.i ph .
This program makes a connection to a copy of
.i qi
on the machine that keeps the Nameserver database.
It then provides assistance to the user of the Nameserver;
it formulates queries,
formats Nameserver responses,
and provides other conveniences.
.lp
.i Ph
operates in two modes;
command-line and interactive.
In command-line mode,
.i ph
forms a Nameserver query from the arguments given it,
sends it to
.i qi ,
prints the result, and exits.
In interactive mode,
.i ph
reads commands from the user, relays them to
.i qi ,
and prints
.i qi 's
responses.
The responses are automatically sent through a paging program.
Some commands given to ph are expanded into more than one qi command.
For example, the
.i ph
.q edit
command first asks
.i qi
for the value of the desired field,
puts that value in an editor where the user edits it as s/he pleases,
and then issues a
.q change
command to change the field to its desired new value.
.uh "Implementation \(em The Source"
.lp
The Nameserver is written in C (a small parser is written in lex\**),
.(f
\** See
.i "Lex\-A Lexical Analyzer Generator" ,
M.E. Lesk and E. Schmidt.
.)f
and runs on UNIX systems.
The client,
.i ph ,
may be run on 4.[23]BSD derived systems.
A version of
.i ph
exists for VMS, DOS, Mac, and a limited version exists for VM/CMS systems.
.lp
There were at last count 320,000 bytes of C and lex source code;
some 6,000 statements in 63 files.
This source is divided into several distinct categories;
.i qi
(230,000 bytes, 28 files, 3500 statements),
.i ph
(46,000 bytes, 3 files, 700 statements),
utilities (89,000 bytes, 21 files, 1700 statements),
and libraries (19,500 bytes, 11 files, 300 statements).
.lp
The database and
.i qi
reside on a Digital Equipment Corporation VAXServer 3500 running Ultrix.
.uh "Implementation \(em The Database"
.lp
The database is kept in six files with the extensions
.i .dir ,
.i .dov ,
.i .idx ,
.i .iov ,
.i .seq ,
and
.i .bdx .
The
.i .dir
and
.i .dov
files contain the actual data.
The
.i .idx
and
.i .iov
files contain the hash table, with pointers into the data files.
The
.i .seq
file contains all the words from the hash table, sorted alphabetically,
along with pointers into the hash table;
it is used for pattern-matching on the hash table.
The
.i .bdx
file contains a tree of four-letter nodes,
each node pointing to where entries with those four letters begin in the
.i .seq
file;
the
.i .bdx
file speeds search of the
.i .seq
file.
.(z
.ie n \{\
.(c
Nroff version of Figure 1 not available.
.)c

.\}
.el .ie !"\*(.T"" \
\{\
.PS
boxht = .3; boxwid = .6
define blk {
	$1: [
		down
		A: box $2
		B: box
		C: box
		D: box
		E: box ht 2*boxht "." "."
	]
}
right
blk(Bdx, ".bdx")
move
blk(Seq, ".seq")
move
blk(Idx, ".idx")
move
blk(Iov, ".iov")
move
blk(Dir, ".dir")
move
blk(Dov, ".dov")
arrow from Bdx.B to Seq.C.w
arrow from Bdx.C to Seq.E.w
arrow from Seq.C to Idx.B.w
arrow from Seq.C to Idx.D.w
arrow from Seq.D to Idx.E.w
arrow from Idx.B to Iov.B.w
arrow from Idx.D to Iov.C.w
arrow from Iov.B to Dir.C.w
arrow from Iov.B to Dir.D.w
arrow from Iov.D to Dir.C.w
arrow from Iov.E to Dir.E.w
arrow from Dir.C to Dov.E.w
arrow from Dir.E to Dov.B.w
.PE
.\}
.el .sp 2i
.ce
.b "Figure 1.  Database Organization"
.)z
.lp
The
.i .dir
file consists of a header and one fixed-length record for each
entry in the database.
If there is too much data for one record,
the remainder is placed in the
.i .dov
file.
The
.i .dov
file also consists of fixed-length records,
and if one is not enough,
the remainder can be placed in more
.i .dov
records.
Thus, an entry is really a linked list of fixed-length records,
and is not limited in size.
It is relatively easy to play with the sizes of the
.i .dir
and
.i .dov
records (before compilation and installation of the database)
for optimum performance.
We use a fairly small record size in the
.i .dir
file, to minimize space wastage,\**
.(f
\** Not entirely successfully \- see
.b "Performance \(em Database Size"
below.
.)f
and a fairly large record size in the
.i .dov
file, to minimize linking.
Most entries are wholly contained in the
.i .dir
file; most of the rest require only one
.i .dov
record.
.lp
Each entry begins with some fixed-length information,
followed by the fields that make up its data.
Each field is a null-terminated
.sm ASCII
string.
A field begins with an
.sm ASCII
string that is the id of the
field description for that field, and a colon.
The field's data follows,
and then the null terminator (\c
.sm ASCII
0).
Tagging each field with its description number means that
the database is not sensitive to the presence,
absence or order of the fields.
This in turn means that field
descriptions can be added to the Nameserver at will,
and the newly-defined fields used,
without recompilation or rebuilding of the database (see
.b "Implementation \(em Field Descriptions"
below).
.lp
The
.i .idx
file is made up of a fixed number of fixed-length records.
Each record that is in use contains a word from an indexed field,
and a set of pointers to the
.i .dir
records that contain the word in an indexed field.
Overflow in the
.i .idx
file is handled like overflow in the
.i .dir
file; the excess pointers are put in one or more fixed-length records in the
.i .iov
file.
Words are indexed by computing a hash function.
If the selected location is not empty but does not contain the desired word,
the hash function is iterated,
until a limit is reached (implying the failure of the index)
or the word or an empty spot is found.
If the spot is empty,
the word and a pointer to the entry in which it occurs is placed in the record.
If the spot is not empty,
a pointer to the entry is appended to the list of pointers for that word.
.lp
The
.i .seq
file uses fixed-length records (called
.i leaves )
to keep a sorted list of all the words in the hash table (\c
.i .idx
and
.i .iov
files).
Each leaf contains up to four words,
and a pointer to the next leaf in alphabetical order.
With each word is stored a pointer into the hash table where that word is found.
.lp
The
.i .bdx
file has records (called
.i nodes )
that contain one four-byte key, and two pointers;
one to the previous node in alphabetical order,
and one to the following node in alphabetical order.
If a particular four-byte key happens to begin a leaf,
that key's node will contain a pointer to that leaf
instead of a pointer to another node.
.uh "Implementation \(em Queries"
.lp
An incoming query is first broken down into its component parts.
Then, the selection arguments of the query are checked for indexed arguments.
The longest indexed arguments\**
.(f
\** Actually, the longest indexed arguments free of pattern-matching
metacharacters.
Pattern matches take much longer than normal index lookups since the
.i .bdx
and
.i .seq
files must be searched, and since such searches frequently result in
a large number of matches being selected.
.)f
are looked up one by one in the hash table
(or, if they contain pattern-matching characters,
a search is made through the
.i .bdx
and
.i .seq
files for each pattern).
The index entries are
.q anded
together to select only those entries that contain all of the indexed words.
.lp
Next, the selected entries are fetched one by one,
and matched against the argument list.
This is done for two reasons.
First, the fact that an entry appears in the index for a word says nothing
about which
.b field
the word is to be found in; it merely notes that the word does appear.
Therefore, it is necessary to recheck indexed fields,
and make sure the words in question appear in the proper fields.
Second, the non-indexed words must be checked,
to see that they appear in the proper fields in the entry.
.lp
If the entry passes the checks,
the selected fields (or a set of default fields) are printed.
.uh "Implementation \(em Field Descriptions"
.lp
Field descriptions are kept in a file that
.i qi
reads each time it is run.
This file consists of lines describing each field, in
.sm ASCII ,
with colons separating the elements in a line.
First comes the id number of the field,
then the name of the field and its maximum length.
Finally, there is a colon-separated list of properties for the field.
.lp
Since this file is read each time
.i qi
starts up, lines can be added to define new fields at will.
All subsequent invocations of
.i qi
will be able to recognize and use the fields.
.lp
The major properties fields may have are Indexed, Public, Default,
Lookup, and Change.
Fields marked Indexed are kept track of in the database's index.
At least one such field
.b must
be included in every query.
Fields marked Public may be viewed by anyone using
.i qi
in anonymous or login mode.
Fields not marked Public may only be viewed by the entry's owner in login mode,
or by someone using
.i qi
in hero mode.
Default fields are printed if no
.q return
clause is given in a query.
Lookup fields may be used in the selection part of a query;
a field not marked Lookup cannot be used to select entries.\**
.(f
\** You might decide, for example, that no one should be allowed to be found
by his or her phone number.
You could mark the phone number field as Public (so it could be viewed)
but not Lookup (so no one could use it in searches).
.)f
Finally, a user in login mode in
.i qi
may change any of his or her fields that are marked Change.
.uh "Performance \(em Database Size"
.lp
Our database contains 80,140 entries,
totalling 16 megabytes of information.
The
.i .dir
and
.i .dov
files together are 33 megabytes;
nearly half the space is wasted.
This percentage could be reduced by reducing the record size of the
.i .dir
file.
.lp
The hash table,
which has room for 450,001 words,
actually contains 157,324 words and 270,784 pointers,
for a total of 1.3 megabytes of hash table.
The
.i .idx
and
.i .iov
files are 19.5 megabytes in size;
even allowing for a large number of empty hash table slots
(necessary for performance),
most of the space is wasted.
As with the
.i .dir
file, reducing the record size in the
.i .idx
file would help the situation.
.lp
Rounding out the database is 7.2 megabytes in the
.i .bdx
and
.i .seq
files.
.uh "Performance \(em Speed"
.lp
To test speed, we took 300 words from different parts of the index,
and looked each one up using
.i qi .
.i Qi
found 396 entries in 78 seconds; that is about \(14 second per lookup.
Using four letter keys and wildcarding the rest,
.i qi
found 9213 entries in 460 seconds,
for about 1\(12 seconds per lookup.
.lp
In actual use over a network, response is slower,
since the client program must establish a connection
with the host that has the database.
Looking up 100 indexed words in separate invocations of
.i ph
took 109 seconds, or 1 second per lookup; 118 entries were found.
.uh "Performance \(em Usage"
.lp
In a recent week, typical of most weeks,
we had 3100 uses from over 70 campus machines.\**
.(f
\** It is impossible to get an exact count of the number of machines,
since there are some machines that use another computer as a relay;
these machines do not show up in the count.
.)f
By far most of the commands given were queries (3643).
There were also 175 logins, 264 changes,
and a few hundred other commands issued.
Of the commands: 58% were successful;
26% were queries that found no entries;
8% were queries that found too many entries;
4% were other errors;
3% were rejected because they required login mode,
but were being given in anonymous mode;
and 1% failed due to command syntax errors.
.uh "Further Directions"
.lp
Overall, we are fairly satisfied with what has been done to date.
Ongoing efforts will be centered around making the Nameserver
convenient to use in a distributed environment.
This will primarily involve allowing users to specify a server,
although some peripheral issues are also in need of resolution.
.lp
Additionally, we will make some attempts to remove wasted space
from the database and its associated index;
this is not a high priority since the database,
for all its wasted space, is still not unmanageably large.
.uh Distribution
.lp
The CCSO Nameserver is Copyright \(co 1988 by the University of
Illinois Board of Trustees.
Portions of the software are Copyright \(co 1985 by CSNet.
It is distributed free of charge,
and is available for anonymous ftp from uxc.cso.uiuc.edu,
in the net/qi subdirectory as well as the pub/qi.tar.Z file.
The client software for UNIX and VMS is available on the same computer,
in the net/ph subdirectory and in the file pub/ph.tar.Z.
No support will be provided by the University,
and the University is not liable for anything bad that happens as a result
of its use.
The software may not be redistributed without permission from CSNet.
.uh References
.lp
.i "UNIX Manual Pages" .
Manual pages are available on
.i ph (1)
and
.i qi (8).
.lp
.i "The CCSO Nameserver, An Introduction"
by Steven Dorner.
A brief introduction geared at a new user of
.i ph .
.lp
.i "The CCSO Nameserver, Why"
by Steven Dorner.
A recap of the design decisions that made our Nameserver what it is,
including evaluations of some similar systems available when our Nameserver
was designed.
.lp
.i "The CCSO Nameserver, Server\-Client Protocol"
by Steven Dorner.
Full documentation of the language used between the Nameserver server program,
.i qi ,
and the outside world.
.lp
.i "The CCSO Nameserver, Guide to Installation"
by Steven Dorner.
How to install the programs that make up the CCSO Nameserver.
.lp
.i "The CCSO Nameserver, A Programmer's Guide"
by Steven Dorner.
In depth documentation for anyone maintaining or wishing to
completely understand the CCSO Nameserver.
.lp
.i "Rebuilding a Nameserver Database In 24 Easy Steps"
by Steven Dorner.
Describes how we build a database,
beginning with raw data we receive from our Administrative branch.
.uh Acknowledgement
.lp
Our Nameserver is very similar in function and philosophy the CSNet nameserver.
In fact, the database management code from that nameserver,
with some modification, is used in our Nameserver.
We are grateful to CSNet that their program was made available to us.