Electronic Facsimile Edition

This text is also available in Swedish.

This is a description of how you can help Project Runeberg proofread interesting pieces of Nordic literature to make them more useful. First, some background.


Project Runeberg has been publishing Nordic literature on the Internet since 1992. The simplest but also the most cumbersome way to publish a book on the Net is to type the text on a keyboard. A better way is to use a scanner together with OCR (optical character recognition) software. Either way, the result is a plain text file, which becomes a web page when saved in HTML format. When this page is displayed, all typographic detail from the book is gone. Another disadvantage is that nothing guarantees the text is free of typing errors, no matter how careful you are. To really know whether the text is correct, you have to visit the library and look up the real book.

During the fall of 1998, Project Runeberg started to work with electronic facsimile editions. This means saving the scanned image of each book page. This way, all typographic detail is kept intact and the risk of typing errors is eliminated.

Of course, errors can still occur in image scanning because of poor print quality, stains on the paper, or flyspecks. But it would be extremely rare for this to lead to one letter being mistaken for another, letters changing places, or entire lines of text being left out. These are the kinds of errors one wants to avoid, and facsimile images provide the best protection against them.

At the same time, facsimile images have drawbacks. The image takes longer to download over a modem. You cannot cut and paste text from an image into your word processor. You cannot search for words or phrases in an image. And the blind or visually impaired cannot read images in braille. This is because an image, to the computer, is just made up of black and white dots, not of letters and words.

To get letters and words from an image, one has to use OCR software. These programs do good work, but they still leave a few errors behind. The digit 1 gets mixed up with the letter l, and the letters e and c are sometimes confused. OCR programs also make mistakes in headings, italics, and paragraph breaks. The output from the OCR process is called raw text, as it needs further editing to be useful.
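As a rough aid for spotting the digit/letter confusions described above, a proofreader could run a small script over the raw text before comparing it with the image. The following Python sketch is an illustration only, not a tool used by Project Runeberg; the function name suspicious_tokens is hypothetical. It flags words that mix digits and letters (such as "l998"), which are likely places where 1 and l were confused. It cannot detect e/c confusions, and it will also flag legitimate mixed tokens, so every hit still has to be checked against the facsimile image.

```python
import re

# A word containing at least one digit AND at least one letter,
# e.g. "l998" or "18l2". Heuristic only: it also matches things like "2nd".
MIXED = re.compile(r"\b(?=\w*\d)(?=\w*[^\W\d_])\w+\b")


def suspicious_tokens(raw_text):
    """Return (line_number, token) pairs worth checking against the image."""
    hits = []
    for line_number, line in enumerate(raw_text.splitlines(), start=1):
        for match in MIXED.finditer(line):
            hits.append((line_number, match.group()))
    return hits


if __name__ == "__main__":
    sample = "Born in l998.\nDied 1812."
    for line_number, token in suspicious_tokens(sample):
        print(f"line {line_number}: check {token!r}")
```

Pure numbers such as correctly recognized years are not flagged, so they still need a careful manual check.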

When Project Runeberg produces an electronic facsimile edition, each book page is made into a web page of its own. At the top is the page header with the logotype and the book's title. Then comes the scanned image of the book page. Below this follows the raw text, and at the bottom is the page footer.

By contrast, in ordinary text editions a web page is made for each chapter of the book. Chapters can be very different in size. Novels can have chapters that span tens of book pages. Poetry books have poems that are one or two pages long. Dictionaries can have many entries on each page.

How to Proceed

To create and proofread a text version, starting from a facsimile edition, perform the following steps:

  1. Work on one chapter at a time, since text editions have one web page per chapter, as outlined above.
  2. Find out how much of the text belongs to a chapter. This is the number of pages listed in one line of the Table of Contents. Click on the book's title to find the table of contents.
  3. Visit the book pages that are part of the chapter, one at a time.
  4. Mark and copy the raw text from the book page. Scroll down to find the raw text below the scanned image.
  5. Paste the raw text into your favorite word processor or text editor.
  6. Once you have the text of the whole chapter, begin proofreading as described in the next two steps.
  7. Check that the text is correct against the facsimile images. Be extra careful with numbers, such as years.
  8. Check that all headings, italics and paragraph breaks are correct.
  9. Save the text in HTML format. Most word processors have this feature. The file name should end in .HTM or .HTML.
  10. Send an e-mail to runeberg@lysator.liu.se and include the saved file as an attachment. Don't forget to tell us which book and chapter it is.

Then all you have to do is wait for the editors of Project Runeberg to acknowledge that the text was received and installed.
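If you proofread in a plain text editor that cannot save HTML, the step of saving the text in HTML format can be done with a short script. The following Python sketch is one possible way, under stated assumptions: paragraphs in the proofread text are separated by blank lines, line breaks inside a paragraph are not significant (which is not true for poetry), and the function name text_to_html and the page layout are hypothetical, not a format prescribed by Project Runeberg.

```python
import html
import re


def text_to_html(title, text):
    """Wrap proofread plain text in a minimal HTML page, one <p> per paragraph."""
    # Assumption: a blank line separates paragraphs.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    body = "\n".join(
        # Join the lines of a paragraph and escape &, < and > for HTML.
        "<p>" + html.escape(" ".join(p.split())) + "</p>"
        for p in paragraphs
    )
    safe_title = html.escape(title)
    return (
        "<html>\n<head><title>" + safe_title + "</title></head>\n"
        "<body>\n<h1>" + safe_title + "</h1>\n" + body + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    page = text_to_html("Chapter 1", "First line\nsecond line.\n\nNext paragraph.")
    # The file name ends in .html, as the instructions ask.
    with open("chapter1.html", "w", encoding="utf-8") as out:
        out.write(page)
```

Italics and sub-headings are not handled here; they would have to be marked up by hand after checking them against the facsimile images.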

If you have any questions, just write to the editors at the address above. Their intention is to answer each letter individually. You can write in English, Swedish, Danish, or Norwegian.

