by Lars Aronsson in October 1997
Be warned that this page is under construction!
Comments are already welcome.
Project Runeberg publishes free electronic editions of Nordic literature. Some say we publish e-texts or electronic books. Some say we are a digital library. This text describes how the shelves are organized in this library. How the library staff is organized is described elsewhere.
If you have any questions or comments, you should write to the editors of Project Runeberg.
On this page: Web Page Structure - Linking More Information - Stimulating Activity
The structure of Project Runeberg has inherited some features of the Internet and the World Wide Web. The Web is one world-wide collaborative hypertext with nodes and links, where the unit of information is a document in HTML format, or a web page. The web page is the unit both for transfer and for addressing. Even though it is possible to address positions within a document, using the HTML "<a name=...>" anchor, this method is more complicated and less used than ordinary links between full pages. The address of a web page is normally referred to as a Uniform Resource Locator, or URL.
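As a small illustration of the difference between addressing a whole page and addressing a position within it, the following Python sketch splits a URL into its page part and its fragment part using the standard library. The fragment name "chapter2" is a hypothetical example, not an anchor known to exist on the project's pages.

```python
from urllib.parse import urldefrag

# Hypothetical URL: the fragment "chapter2" is assumed for illustration only.
url = "http://www.lysator.liu.se/runeberg/#chapter2"

# urldefrag separates the address of the page from the position within it.
page, fragment = urldefrag(url)

print(page)      # the web page: the unit of transfer and addressing
print(fragment)  # the named position inside that page
```

Only the page part is sent to the server. The fragment is resolved by the browser after the whole document has been transferred, which is one reason links between full pages are the simpler case.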
The Web is a collaborative hypertext, because anybody can add pages and make links to any other web page, belonging to themselves or to someone else. Authors cannot control who sets up links to their web pages, or why. Many links are set up by search engines that present result lists when a reader has specified a search for keywords. There is nothing in the Web per se that helps the inexperienced reader understand who has published a particular page, and there is nothing in the Web per se that helps web page authors, webmasters, fit all of their own documents into a structure. Such structural metadata must be added by the individual webmaster, and webmasters tend to do this each in their own way.
This situation has a striking parallel in the world of printed matter. The technology of printing text on paper is separate from the technology of binding papers together into books. Medieval manuscripts were not quarto or octavo, but always folio. Johann Gutenberg is credited with having invented modern book printing, but who invented the modern book binding formats? Also, the printing technology does not force an author to provide a table of contents, a title page, or information about who the author is. Laws state that an issuer or printer must be named, but only conventions of the trade make all books look the same.
If you find an open book on a table, you can turn it around and read the title on the spine. You can open the title page and find the title, the name of the author, the publisher, the printer, and the year of printing. You might even find information about the library or person to which the book belongs. Finding a web page is much more like finding a single page torn out of a book. This is a problem with the Web, and each webmaster has to find a solution that works for them and their audience.
Fitting all information in one large HTML document is not a solution, because the HTML document is the unit of transfer, and many readers use slow dial-up modems (33 kbit/s) with which a large document would take hours to transfer. Therefore, small or moderately sized documents are required.
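To make the transfer-time argument concrete, here is a rough calculation in Python. It assumes an ideal 33 kbit/s link with no protocol overhead and no compression, and the document sizes are hypothetical examples rather than measurements of any Project Runeberg edition.

```python
def transfer_seconds(size_bytes: int, kbit_per_s: float = 33.0) -> float:
    """Ideal transfer time in seconds, ignoring overhead and compression."""
    bits = size_bytes * 8
    return bits / (kbit_per_s * 1000)

# Hypothetical document sizes, chosen only for illustration.
examples = [
    ("50 kB page", 50_000),
    ("1 MB edition", 1_000_000),
    ("15 MB collection", 15_000_000),
]

for label, size in examples:
    minutes = transfer_seconds(size) / 60
    print(f"{label}: {minutes:.1f} minutes")
```

Under these assumptions, one hour of connection time moves just under 15 MB, so a page of a few tens of kilobytes arrives in seconds while a single document holding a whole collection would tie up the line for an hour or more.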
Using HTML frames has been proposed as a solution, having the table of contents and information corresponding to the title page of a book in one frame, and each member page in another frame. However, this has many drawbacks. One is that it removes the ability for external webmasters to link directly to the member pages, thereby making the Web less collaborative. If they still manage to do this, the structural metadata will be left out. Also, frames are a new feature of the Web, and not all browser software supports them.
The solution chosen by Project Runeberg is to include structural metadata in every web page. The requirements on this information have already been mentioned, but deserve to be reiterated:
Some more requirements are added in later sections of this text.
As readers can arrive at Project Runeberg's web pages from just anywhere, the first structural metadata that is found on each page is a link back to the start page of Project Runeberg. The project's start page is just another web page, but it contains introductory information about the project, and provides links to good starting points and indexes. The URL of the start page is announced as the address of the project.
http://www.lysator.liu.se/runeberg/
Project Runeberg itself is a project within LYSATOR, a well-defined set of activities performed by some members of the computer society. But each electronic text published by Project Runeberg also constitutes a well-defined set of activities, a project within the project. We call this an edition, typically corresponding to one printed book that we republish in electronic format. The editions are Project Runeberg's unit of publication.
Nothing would be more natural than to make a single web page of each electronic edition, were it not for the fact that some editions are too large to be transferred over a slow modem in a reasonable time. Therefore, some editions have to be split up into several web pages, and these must be "bound" together, much like a book is bound together by its spine.
The structure among the web pages within an electronic edition could theoretically be anything at all: linear, circular, hierarchical (tree-structured), or free web. The free web structure does not guarantee a single connected graph, but the other structures do. The hierarchical structure seems natural to the untrained eye, because many printed books are organized in parts, chapters, sections, and subsections. This could serve as a role model, right? But other printed books, such as encyclopediae, are organized as a free web, a structure that has just been ruled out for good reasons. The key to solving this dilemma is to separate the structure of the printed matter from the structure of its contents. The contents might have a hierarchical or a free web structure, but I have yet to find a printed book that is not a linear structure. This is the reasoning behind the choice of a linear structuring of the web pages within an electronic edition within Project Runeberg.
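What a linear structure means for a set of web pages can be sketched in a few lines of Python. This is only an illustration of the general idea, not Project Runeberg's actual implementation: the file names, the `PageLinks` type, and the `linear_links` function are all hypothetical.

```python
from typing import List, NamedTuple, Optional

class PageLinks(NamedTuple):
    """Neighbours of one page in a linear sequence (hypothetical type)."""
    page: str
    previous: Optional[str]
    following: Optional[str]

def linear_links(pages: List[str]) -> List[PageLinks]:
    """Give each page a link to its predecessor and successor, if any."""
    links = []
    for i, page in enumerate(pages):
        previous = pages[i - 1] if i > 0 else None
        following = pages[i + 1] if i + 1 < len(pages) else None
        links.append(PageLinks(page, previous, following))
    return links

# Hypothetical file names for the pages of one edition.
edition_pages = ["index.html", "0001.html", "0002.html", "0003.html"]

for entry in linear_links(edition_pages):
    print(entry)
```

Because every page has at most one predecessor and one successor, a reader who arrives at any page can reach every other page of the edition by following links in one direction or the other, just as with the leaves of a bound book.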
The linear structure chosen to organize the web pages within an edition is represented by structural metadata in the following way:
When the structure of the pages within the edition is settled, the issue remains of how to organize the editions within the project. Conventional libraries have organized books according to their language, their subject matter, their size, and alphabetically by the author's name, title, or--in the case of biographies--by the name of the person described. (Some private collectors have organized their books according to the color of the spine, so as to make the shelves in their homes look better.) Both subject matters and languages are often organized in hierarchical structures (Germanic: Scandinavian: Swedish). The same structures are used for finding books in the catalog as for storing books on shelves. Sometimes, this leads to conflicts, as when science fiction books about computers can be found under either Technology: Computers, or under Fiction: Science Fiction.
A different approach was taken when the International Standard Book Number (ISBN) was introduced. ISBNs define a hierarchical structure by country, publisher, and book number, but all information traditionally used for organizing books in a library was left out. This is the same key to a solution as when we found that a linear structure was sufficient for the web pages of an edition. ISBNs define a structure for the editions of the world, while library catalogs structure the contents of the books. The contents structure between editions is most likely different from the edition structure, but it could also be ambiguous, as in the case of the science fiction book. Better then to allow its ambiguity, and use the simpler serial number approach, once more a linear structure, for organizing the editions within Project Runeberg.
Project Runeberg is situated in Sweden, and ISBN numbers in Sweden are administered by the Royal Library in Stockholm. Project Runeberg has turned to the Royal Library and applied for a series of ISBN numbers, which would make us equal to other publishers under the 91 prefix allocated to Sweden. The response was, however, that ISBN numbers in Sweden are only allocated for publication on physical media such as print or CD, but not for publishing within online databases such as web servers.
For each edition published by Project Runeberg, a short code name is allocated, made up of up to eight lower-case letters of the English alphabet and/or decimal digits, that uniquely identifies the edition within Project Runeberg. This is logically equivalent to a serial number, but easier to remember if it is chosen to resemble the title of the work, yet as unique in spelling as a telephone or ISBN number, and suitable for direct use in computer file names etc. This code name is added to the address of Project Runeberg's start page to get the URL of the start page for the edition, for example:
http://www.lysator.liu.se/runeberg/nilsholg/
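The code name rule and the way it extends the project address can be expressed as a short Python sketch. The function name `edition_url` and the constant names are hypothetical, and the sketch checks only the form of a code name, not whether it is actually unique or in use.

```python
import re
from urllib.parse import urljoin

# Address of the project's start page, as given in the text.
PROJECT_START = "http://www.lysator.liu.se/runeberg/"

# One to eight lower-case English letters and/or decimal digits.
CODE_NAME = re.compile(r"[a-z0-9]{1,8}")

def edition_url(code_name: str) -> str:
    """Return the start page URL for an edition (hypothetical helper)."""
    if not CODE_NAME.fullmatch(code_name):
        raise ValueError(f"not a valid code name: {code_name!r}")
    # The trailing slash makes the edition its own directory-like address.
    return urljoin(PROJECT_START, code_name + "/")

print(edition_url("nilsholg"))
```

Restricting code names to this small character set is what makes them safe to use directly as file and directory names, as the text notes.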
...
The Web, being a world-wide collaborative hypertext, has a larger potential than printed matter. Project Runeberg wants to illustrate this by adding structural metadata to its electronic books that can take the reader to information that can otherwise be hard to locate.
Nordic Authors...
Tema...
Project Runeberg was born in a society of students who shared a common interest in computers and technology. More than just the technology, there was a message in the social context. More people should come together and cooperate for their common goals instead of each doing their own hobby project, or--God forbid--passively watching television. Project Runeberg wants to pass on this message to those who share an interest in Nordic literature.
e-mail links...
mailing lists...
encouraging volunteers...