The Hyper-G Information System

Klaus Schmaranz
Institute for Information Processing and Computer Supported New Media (IICM), Graz University of Technology, Austria (kschmar@iicm.tu-graz.ac.at)

Abstract

This paper gives a brief description of Hyper-G, the first second-generation hypermedia system. The first section identifies some problems with first-generation hypermedia systems. The following sections outline the new concepts implemented in Hyper-G. These concepts, such as a world-wide distributed network database, a separate link database and bidirectional links, allow highly sophisticated navigation and hyperlinks in all native document types: hypertext, images, PostScript documents and even movies, sound and 3D scenes. Hyper-G also implements powerful search mechanisms such as boolean search on titles, keywords and even full text, with a user-defined scope ranging from one collection on one server to all servers worldwide. Last but not least, Hyper-G is fully compatible with first-generation systems like Gopher and WWW.

Introduction

Currently, the most popular Internet information systems use distributed menus and searching (Gopher, see [Alberti et al 1992]) or hypertext documents (WWW, see [Berners-Lee et al 1994]) to represent information spaces. In both cases information is stored in a simply structured fashion on servers and is accessed via clients, which are available for most major hardware platforms. Both systems provide an easy-to-use interface and are easy to install for service providers as well as for readers, but both also have some well-known weaknesses:

- In both systems, knowledge of specific server addresses is required for browsing.
- In WWW, hyperlinks as the only structuring tool (Robert Cailliau: "the world wide spaghetti bowl") lead to the "lost in hyperspace" syndrome: after following a number of links it is nearly impossible to find the way back to an interesting page the next time a reader looks for it. Furthermore, there is no way to judge how much of the information has already been seen because a hierarchical structure is missing.
- Hyperlinks are built from URLs (Uniform Resource Locators), which point to the location of a document. If a document moves or disappears, the hyperlinks point to nowhere. It is desirable to build hyperlinks from URNs (Uniform Resource Names), which identify the document itself instead of its location.
- Both Gopher and WWW servers become difficult to administer as the number of documents grows because there is no underlying database.
- Search mechanisms, depending on the search engine, are mostly limited to a simple title search.
- Both Gopher and WWW lack user identification and billing mechanisms.

Hyper-G (see [Maurer 1994], [Andrews et al 1994]), as the first second-generation hypermedia system, tries to overcome the weaknesses mentioned above using a completely new underlying concept (see [Kappe et al 1993]). One of the major design goals of Hyper-G was to be fully compatible with Gopher and WWW and therefore to fit into the existing Internet landscape. The modern design of Hyper-G makes it possible to add many new sophisticated navigation, editing and authoring features, as described in the following sections. At the cost of some Hyper-G-specific functionality, Hyper-G servers can be accessed with all well-known Gopher and WWW clients.

The Hyper-G Database

Hyper-G is based on a client-server architecture with servers available for all major Unix platforms and clients available for Unix and MS-Windows platforms. A native Macintosh client is under development.
Gopher and WWW clients can also be used to access a Hyper-G server. The client-server protocol of Hyper-G is connection-oriented, in contrast to the connectionless protocols used by Gopher and WWW. This makes it possible to implement user access rights and billing mechanisms in Hyper-G.

The Hyper-G server itself is based on a distributed network database architecture with separate database engines for documents and hyperlinks. With this concept the whole world of Hyper-G servers presents itself to the user as one large database, with no need to know server addresses in order to reach individual servers. The server structure is completely hidden from the user.

Each document in Hyper-G has an additional object description in which all the important attributes of the document are stored. Attributes include title, keywords, type, creation time and last modification time, as well as more specialized attributes such as an optional expiry date. Expiry dates are very important for quality assurance: who has not seen a call for papers for a conference that took place months ago? In Hyper-G a document is no longer accessible after its expiry date, and all links pointing to it become invisible. Object attributes are searchable, allowing sophisticated searches like "find all documents that were written by a particular author and are not older than half a year" or "find all text or PostScript documents that have the term multimedia in their title or keywords". All objects in Hyper-G have a world-wide unique object ID that allows them to be located even across server boundaries.

Because the link database is separate, all links in Hyper-G are bidirectional. When visiting a document, a reader can not only follow all the links pointing from this document to the outside world but can also inspect all links pointing to this document from outside.
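
As a rough illustration of why a separate link database makes links bidirectional, the following Python sketch stores each link once, outside the documents, and indexes it under both endpoints. It is not Hyper-G's actual implementation: the class `LinkDatabase`, its methods and the object IDs are hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source_id: str       # object ID of the document holding the source anchor
    destination_id: str  # object ID of the document the link points to


class LinkDatabase:
    """Toy link store kept apart from the documents (hypothetical design)."""

    def __init__(self):
        self._outgoing = defaultdict(set)  # object ID -> links leaving it
        self._incoming = defaultdict(set)  # object ID -> links arriving at it

    def add(self, source_id, destination_id):
        link = Link(source_id, destination_id)
        # Index the same link under both endpoints so that either
        # direction can be answered with a single lookup.
        self._outgoing[source_id].add(link)
        self._incoming[destination_id].add(link)
        return link

    def links_from(self, object_id):
        """Object IDs that the given document points to."""
        return sorted(l.destination_id for l in self._outgoing.get(object_id, ()))

    def links_to(self, object_id):
        """Object IDs of documents that point to the given document."""
        return sorted(l.source_id for l in self._incoming.get(object_id, ()))


db = LinkDatabase()
db.add("paper-1", "image-7")
db.add("paper-2", "image-7")
db.add("image-7", "paper-3")

print(db.links_from("image-7"))  # outgoing links of image-7
print(db.links_to("image-7"))    # incoming links of image-7
```

Both queries are answered from the link store alone, without opening any document.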
The second and much more important feature of bidirectional links is that they allow the link structure to be kept consistent. If a document is moved from one location to another, even across server boundaries, its links point to the document at its new location. Hyperlinks in Hyper-G are based on the concept of URNs (Uniform Resource Names) instead of URLs (Uniform Resource Locators).

Another very important feature of the separate link database is the possibility of adding links to all supported document types: hyperlinks to and from images, PostScript documents, movies, 3D scenes and sound are all supported. Since the links are stored outside the document, there is no need to change the specifications of standard formats to make them hyperlinkable.

User Accounts and Billing

User accounts in Hyper-G are handled much as in Unix: each user can belong to one or more groups, and each user has a personal home collection in which to store personal documents or references to existing documents. Access to documents is then controlled by read, write and unlink permissions.

Billing mechanisms in Hyper-G currently follow two different paradigms.

The first method is to give the user a certain amount of "money" and to give charged documents a certain price. Each time a charged document is downloaded, its price is subtracted from the user's current funds. In this case the user pays in advance.

The second method is based on the library paradigm and is mostly used for electronic publishing. The library that operates the server subscribes to a certain number of copies of a journal, e.g. 3 copies. Users have access to the journals on the library server depending on their access rights; in the case of libraries, anonymous users normally have access as well. But for e.g.
3 copies, one document is accessible to at most 3 simultaneous users, and after a user has "borrowed" the document, that copy is not available for a given time, e.g. 1 hour. If more than the maximum allowed number of users request access, the additional users get a "not available at the moment" message.

Structuring of Data

Documents in Hyper-G are structured using a collection hierarchy very similar to Gopher's file hierarchy. Additional structuring tools are hyperlinks as known from WWW and a special clustering mechanism. Documents can be members of arbitrarily many collections, even across server boundaries; in this case they are represented as references in the collection tree. Collections can hold not only native Hyper-G objects but also references to Gopher and WWW documents, telnet objects and many more. Even completely exotic document types can be referenced through a special generic object that holds the object description and additional information, e.g. which viewer to start for this type of object.

Clusters in Hyper-G are sets of documents that are grouped together for logical reasons. One example of a logical cluster would be an image and a text describing it, or a movie and an accompanying soundtrack. When clicking on a cluster the reader gets all the documents it contains; in the movie and sound example, the movie player is started and the sound is played simultaneously.

An even more important feature of clustering is its support for multilinguality. Several documents in different languages can be stored in a cluster, even together with other arbitrary documents. Hyper-G users can choose their preferred language and, in the case of a multilingual cluster, will get the document in that language if it exists.
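
The language selection within a cluster can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the `Document` fields, the function `resolve_cluster` and the sample titles are hypothetical, and the fallback used when no version exists in the preferred language is a guess, since the text only covers the case where such a version exists.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    title: str
    doc_type: str                   # e.g. "image" or "text"
    language: Optional[str] = None  # None marks a language-independent document


def resolve_cluster(members: List[Document], preferred: str) -> List[Document]:
    """Return the documents a reader with the given preferred language would get."""
    neutral = [d for d in members if d.language is None]
    localized = [d for d in members if d.language is not None]
    matching = [d for d in localized if d.language == preferred]
    # Assumption: if no version exists in the preferred language, fall back
    # to all language versions. The paper does not specify this case.
    return neutral + (matching if matching else localized)


# Hypothetical cluster: one image plus two language versions of its description.
cluster = [
    Document("Campus map", "image"),
    Document("Description", "text", "en"),
    Document("Beschreibung", "text", "de"),
]

print([d.title for d in resolve_cluster(cluster, "de")])
```

Language-independent members such as the image are always delivered; only the language-tagged members are filtered.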
One could, for example, store an image together with an English and a German description in a cluster; users who have chosen English as their preferred language would then get the image with the English description.

Caching and Replication

Due to the exponential growth of the Internet and of the information stored on it, scalability was an important design goal of Hyper-G. During rush hours data transfer over the Internet is unacceptably slow, especially over overseas connections. For this reason caching and mirroring mechanisms had to be implemented.

Caching in Hyper-G is implemented using well-known principles, but there is another, specialized mechanism for mirroring data between servers: replication. Documents that already exist on one server can be exported from the database and mirrored to other servers. In this case they should be given a special replica attribute that tells the server they are mirrored objects from another server. How the server evaluates the replica attribute for remote objects is best shown by the following scenario.

A user in the USA starts his Hyper-G client and connects to the nearest Hyper-G server. Since the Hyper-G database seamlessly hides server boundaries, this user browses through hyperspace until he finds an interesting document in Austria and decides to retrieve it. Normally all the data would be transferred from Austria to the USA. Suppose, however, that the document is a paper in an electronic journal and that the journal is completely mirrored on the server in the USA, without the user knowing. In this case the Hyper-G server in the USA automatically detects that it holds a replica of the Austrian document and sends the user the local replica instead of unnecessarily fetching the document from Austria.

Native Document Types

Hyper-G supports a number of native document types as well as external documents such as references to Gopher and WWW documents and generic objects.
The native document types are currently:

- Images in many different formats
- MPEG movies
- PostScript documents
- Sound files
- 3D scenes

All of the above formats are fully hyperlinkable. In MPEG movies the clickable areas of hyperlinks can even move and change their size and shape over the course of a hyperlinked film.

PostScript documents are often used for archive material that is not available in hypertext format, and very often for specialized text, e.g. mathematical papers containing many formulae that simple hypertext formats cannot handle well. Full support for hyperlinks in PostScript adds functionality normally known only from hypertext. To complete the hypertext functionality, a method for full-text search in PostScript files is under development and will be available soon. An advanced PDF viewer similar to Adobe Acrobat is also under development.

Hyperlinkable, however, does not only mean that one can define clickable areas in documents, so-called source anchors. One can also define destination anchors, such as a particular sequence in a sound file. For example, a text document could contain the lyrics of a song, and clicking on a line of text would make the sound player jump to the matching part and play the music that belongs to this line.

Of particular interest for the future, when 3D hardware gets cheaper and computing power increases even on low-end computers, is the 3D support. Hyper-G currently supports 3D scenes in WaveFront format, again with full hyperlink support. Full VRML support is under development together with the University of Minnesota and NCSA; the first prototype of this new generation of 3D viewers is already implemented in Harmony, the Unix Hyper-G client.

Navigation

As already mentioned, Hyper-G presents itself to the user as a single structured network database. In Hyper-G clients this network database is normally represented as a directory-tree-like structure of collections.
Harmony, the Unix Hyper-G client, also implements a second representation of the collection hierarchy: a 3D information landscape. Collections and documents are represented as buildings in a landscape through which the user can fly.

Navigation also includes advanced features such as the local map. Based on the bidirectional nature of links, a local map can be created on the fly showing arbitrary levels of incoming and outgoing links. This is particularly useful once an interesting document has been found: in most cases other interesting documents on the same topic can be found by following the links to and from it.

Searching is very important for navigation in hyperspace. Hyper-G supports boolean title, keyword and full-text search with a user-definable scope ranging from a single collection on a single server to all collections on all servers worldwide. But what if the search finds several hundred documents? For this case the clients offer a special feature, the so-called "location feedback": a single click on a document in the search result window opens the path to the document, so the user can decide whether the document is of interest.

Take, for example, a simple search for the term "grep". The user wants to know the exact syntax and therefore searches for a manpage on that topic. The search might return 2 hits, one in the hacker's jargon and the other in the manpages. A single click on the first object immediately shows that this hit is in the collection "hacker's jargon" and is therefore not of interest, without forcing the user to download the document first.

Editing and Authoring with Hyper-G

As already mentioned, Hyper-G is not a read-only system in which users can only view documents: Hyper-G databases allow read and write access according to the user's access rights.
Documents can easily be copied or moved from one collection to another, even across server boundaries, and documents can of course also be edited. Hyperlinks in documents are created simply by using the mouse. To control the presentation of collections in Hyper-G, the author can add attributes such as presentation hints, sort order and sequence number to documents. Also available for Hyper-G are tools for handling mass data, such as inserting, deleting, changing or replacing huge numbers of documents in a single batch job, as well as tools for mirroring data on other servers.

Conclusion

Hyper-G is designed to manage huge amounts of information, to provide mechanisms that make it easy for information providers to administer a server, and to make the Internet surfer's life easy. Integrated, sophisticated search mechanisms and new navigation facilities such as a 3D landscape view of the database greatly ease data retrieval. Hyper-G is a first implementation of the second generation of hypermedia systems, whose features nobody will want to miss in the future.

References

[Alberti et al 1992] Alberti, B., Anklesaria, F., Lindner, P., McCahill, M., Torrey, D.; Internet Gopher Protocol: A Distributed Document Search and Retrieval Protocol; FTP from boombox.micro.umn.edu, in directory pub/gopher/gopher_protocol.
[Andrews et al 1994] Andrews, K., Kappe, F.; Soaring Through Hyperspace: A Snapshot of Hyper-G and its Harmony Client; Proc. of Eurographics Symposium on Multimedia/Hypermedia in Open Distributed Environments, Graz (1994).
[Berners-Lee et al 1994] Berners-Lee, T., Cailliau, R., Luotonen, A., Nielsen, H., Secret, A.; The World-Wide Web; Communications of the ACM 37,8 (1994), 76-82.
[Kappe et al 1993] Kappe, F., Maurer, H., Scherbakov, N.; Hyper-G - A Universal Hypermedia System; Journal of Educational Multimedia and Hypermedia 2,1 (1993), 39-66.
[Maurer 1994] Maurer, H.; Advancing the ideas of World-Wide Web; Proc. Conf.
on Open Hypermedia Systems, Honolulu (1994), 201-203.