This text is also available in Swedish.
This is a description of how you can help Project Runeberg proofread interesting pieces of Nordic literature and so make them more useful. First, some background.
Project Runeberg has published Nordic literature on the Internet since 1992. The simplest but also the most cumbersome way to publish a book on the Net is to type the text on a keyboard. A better way is to use a scanner together with OCR (optical character recognition) software. Either way, the result is a plain text file, which becomes a web page when saved in HTML format. When this page is displayed, all typographic detail from the book is gone. Another disadvantage is that nothing guarantees the text is free of typing errors, however careful you are. To really know whether the text is correct, you have to visit the library and look up the real book.
During the fall of 1998, Project Runeberg started to work with electronic facsimile editions. This means saving the scanned image of each book page. This way, all typographic detail is kept intact and the risk of typing errors is eliminated.
Of course, errors can still occur in image scanning, owing to poor print quality, stains on the paper, or fly specks. But it would be extremely rare for this to lead to one letter being mistaken for another, letters changing places, or entire lines of text being left out. These are the kinds of errors one wants to avoid, and facsimile images provide the best protection against them.
At the same time, facsimile images have drawbacks. An image takes longer to download over a modem. You cannot cut and paste text from an image into your word processor. You cannot search for words or phrases in an image. And blind or visually impaired readers cannot read images in braille. This is because an image, to the computer, is just made up of black and white dots, not of letters and words.
To get letters and words from an image, one has to use OCR software. These programs do good work, but they still leave a few errors behind. The digit 1 gets mixed up with the letter l, and the letters e and c are sometimes mistaken for each other. OCR programs also make mistakes on headings, italics, and paragraph breaks. The output from the OCR process is called raw text, as it needs further editing to be useful.
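As an illustration of the 1/l confusion described above, here is a minimal Python sketch that flags words in raw text where the digit 1 appears together with letters. It is not part of Project Runeberg's tooling; the function name suspicious_tokens is a hypothetical choice, and a real check would need tuning for the book at hand. Confusions between e and c cannot be found this way, since both are ordinary letters; catching those would require comparing against a word list or the page image.

```python
import re

# Matches runs of letters, digits, and underscores (Unicode-aware in Python 3).
TOKEN = re.compile(r"\w+")

def suspicious_tokens(raw_text):
    """Return tokens that contain the digit 1 together with at least one letter.

    Hypothetical helper: such tokens are candidates for the OCR
    confusion between the digit 1 and the letter l.
    """
    flagged = []
    for token in TOKEN.findall(raw_text):
        has_letter = any(ch.isalpha() for ch in token)
        if has_letter and "1" in token:
            flagged.append(token)
    return flagged

if __name__ == "__main__":
    sample = "He11o world in 1998, a11 we1l"
    # Pure numbers such as 1998 are not flagged, only mixed tokens.
    print(suspicious_tokens(sample))
```

A proofreader would still compare each flagged word against the scanned image, since a token such as a street number with a letter suffix can legitimately mix digits and letters.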
When Project Runeberg produces an electronic facsimile edition, each book page is made into a web page of its own. At the top is the page header with the logotype and the book's title. Then comes the scanned image of the book page. Below this follows the raw text, and at the bottom is the page footer.
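The page layout just described (header, scanned image, raw text, footer) can be sketched in Python as follows. This is only an assumed illustration, not the markup Project Runeberg actually generates: the function name facsimile_page, the choice of HTML elements, and the placeholder footer text are all hypothetical, and the logotype is left out.

```python
from html import escape

def facsimile_page(book_title, image_url, raw_text):
    """Assemble one web page for one book page (hypothetical layout)."""
    parts = [
        "<html><head><title>%s</title></head><body>" % escape(book_title),
        # Page header with the book's title (logotype omitted in this sketch).
        "<h1>%s</h1>" % escape(book_title),
        # The scanned image of the book page.
        '<img src="%s" alt="Scanned book page">' % escape(image_url),
        # The raw OCR text, kept below the image with its line breaks intact.
        "<pre>%s</pre>" % escape(raw_text),
        # Placeholder page footer.
        "<hr><p>Page footer</p>",
        "</body></html>",
    ]
    return "\n".join(parts)

if __name__ == "__main__":
    print(facsimile_page("Example Book", "0001.gif", "Raw text of page 1"))
```

Escaping the title and the raw text keeps characters such as & and < in the OCR output from being read as markup.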
By contrast, for ordinary text editions, a web page is made for each chapter of the book. Chapters can be very different in size. Novels can have chapters that span tens of book pages. Poetry books have poems that are one or two pages long. Dictionaries can have many articles on each page.
To create and proofread a text version, starting from a facsimile edition, perform the following steps:
Then all you have to do is wait for the editors of Project Runeberg to acknowledge that the text was received and installed.
If you have any questions, just write to the editors (the address
above). Their intention is to answer each letter individually. You
can write in English, Swedish, Danish, or Norwegian.