Welcome to Project Runeberg
Lysator Linköping University
Project Runeberg (runeberg.org) is a volunteer effort to create free electronic editions of classic Nordic (Scandinavian) literature and make them openly available over the Internet.

Project Runeberg, February 2020


February 2020

Happy Public Domain Day!

Public Domain Day is January 1st. This is when works by a new group of authors enter the public domain, because copyright expires once an author has been dead for 70 years. We continue to celebrate it, even though it is already February. So who died in 1949? And what have we added so far?
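The year arithmetic behind Public Domain Day can be sketched as follows. This is a minimal illustration, assuming EU-style counting where the 70 years run in whole calendar years from the end of the year of death; the function names are hypothetical. Applying the same count to a publication year (for journals and encyclopedias) is likewise an assumption about how the 70-years-after-publication policy is reckoned.

```python
def first_free_year(event_year: int, term_years: int = 70) -> int:
    """Year on whose January 1st a work enters the public domain.

    event_year is the author's death year or, by assumption, the
    publication year of a journal or encyclopedia volume.
    """
    # The term is counted in whole calendar years from the end of event_year,
    # so protection ends on December 31st of event_year + term_years and the
    # work is free from the following January 1st.
    return event_year + term_years + 1


def newly_free_death_year(public_domain_day_year: int, term_years: int = 70) -> int:
    """Death year of the authors whose works become free on January 1st of the given year."""
    return public_domain_day_year - term_years - 1


print(newly_free_death_year(2020))  # 1949
print(first_free_year(1947))        # 2018
```

For Public Domain Day 2020 this gives 1949, matching the question above.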

Since we boldly digitize journals and encyclopedias (which have numerous contributors) 70 years after they were published, regardless of when each contributor lived, we have also scanned:

Also recently digitized are some works by people who died in 1947 and 1948:


November 2019

2019/20 Fundraiser

From November 12 to December 21, we held our annual fundraiser. It was our second, run the same way as the first: a small banner was shown on some of our web pages, promoting donations toward our aim of raising 25,000 SEK for the fiscal year 2019/20. The idea was that the banner would be removed as soon as the aim had been reached, to reappear next year. We have long had a "Donate" link in the header of all our web pages. Read more on our donation page.


March 2019

Redoing OCR

In 2000, and again in 2010, we found that OCR of fraktur (blackletter, Gothic) was too difficult and could wait. For normal print (antikva, Latin) we have used the commercial software ABBYY Finereader with great success. Since 2007 we have also increasingly imported books scanned by others, often copying both the scanned images and the OCR text.

Around 2013 or 2014, the OCR quality for books printed in fraktur and scanned by Nasjonalbiblioteket of Norway suddenly improved radically. It seems they used a special edition of Finereader developed by a German/Austrian project, but this was out of our reach. Later, the OCR of fraktur books digitized by Det Kongelige Bibliotek of Denmark has also improved.

Returning to this problem in 2019, we find that the free software Tesseract is now at version 4.0 and a standard part of the Ubuntu Linux distribution, with support for Swedish and Danish fraktur added around 2015. Its output is far from excellent and not as good as the Norwegian books, but much better than some other OCR, and quite useful as a starting point for manual proofreading.

We are now starting to redo OCR for some books in fraktur using Tesseract. The first attempt is Søren Kierkegaards Samlede Værker (15 volumes, 1920-1926), which were digitized in 2009 at the University of Toronto by the Internet Archive. Their OCR text is of terrible quality, and it is apparent that they used ABBYY Finereader for Latin letters. We copied volumes 1-8 in 2014, but decided in 2015 to do our own OCR by manually training Finereader to interpret the fraktur text. This was time-consuming and painful, and the result was not very good. Now we have copied the remaining volumes and redone OCR for all of them with Tesseract, with much better results.
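A batch re-OCR run of this kind might look roughly like the sketch below. It assumes the `tesseract` command-line program is installed and on the PATH; passing `stdout` as the output base makes it print the recognized text instead of writing a file, and `-l` selects the language model. The model name for Danish or Swedish fraktur depends on the installed language data (list it with `tesseract --list-langs`), so `dan_frak` below is an assumption, as are the helper names and page file names.

```python
import subprocess
from pathlib import Path


def tesseract_argv(image: Path, lang: str) -> list[str]:
    """Command line for OCR of one page image; 'stdout' makes tesseract print the text."""
    return ["tesseract", str(image), "stdout", "-l", lang]


def ocr_pages(images: list[Path], lang: str) -> dict[str, str]:
    """Run tesseract on each page image and collect the text, keyed by file name."""
    texts = {}
    for image in images:
        result = subprocess.run(
            tesseract_argv(image, lang),
            capture_output=True,  # collect the OCR text from stdout
            text=True,            # decode bytes to str
            check=True,           # raise if tesseract fails on a page
        )
        texts[image.name] = result.stdout
    return texts


# Hypothetical usage; the model name "dan_frak" is an assumption:
# pages = sorted(Path("scans").glob("*.tif"))
# new_ocr = ocr_pages(pages, "dan_frak")
```

Keeping the command construction in its own function makes it easy to inspect or log exactly what was run for each volume.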

Meanwhile, a new edition, Søren Kierkegaards Skrifter (55 printed volumes, 2007-2013), has been published and is available online at SKS.dk. There you will find all of the texts without needing to proofread anything. That is not the case, however, for the other books we provide.

A problem is that we have no algorithm for determining which OCR text is better. The right way to decide is to proofread the page manually and then see which OCR candidate required fewer edits to reach the desired result. But of course, when we have two OCR texts for the same page, we want to find out which is better without having to proofread the page. Nor can we simply use a spell checker, because then any sequence of correctly spelled words would win, regardless of its similarity to the scanned page. So far, we only redo OCR on pages where the naked eye can immediately see too many errors typical of bad fraktur OCR, such as the words "reban" (for redan) or "ogfaa" (for ogsaa).
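Both ideas in the paragraph above can be sketched in a few lines. The "right way" is an edit distance: once a page has been proofread, the better OCR candidate is the one with the smaller Levenshtein distance (insertions, deletions and substitutions) to the proofread text. The second function is only a crude automated stand-in for the eyeball check, counting words from a small, hypothetical list of telltale fraktur misreadings; it is not the project's actual procedure, and all names here are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def better_candidate(candidates: list[str], proofread: str) -> int:
    """Index of the OCR candidate needing the fewest edits to reach the proofread text."""
    return min(range(len(candidates)),
               key=lambda k: edit_distance(candidates[k], proofread))


# Hypothetical list of typical fraktur misreadings:
# "reban" (d read as b in redan), "ogfaa" (long s read as f in ogsaa).
FRAKTUR_ERRORS = {"reban", "ogfaa"}


def suspicious_words(text: str) -> int:
    """Count words matching known fraktur OCR misreadings; needs no proofread text."""
    return sum(1 for w in text.lower().split()
               if w.strip('.,;:!?"') in FRAKTUR_ERRORS)


print(edit_distance("reban", "redan"))                  # 1
print(suspicious_words("Han var reban her, ogfaa."))    # 2
```

The edit distance only works after proofreading, which is exactly the limitation described above; the word count needs no reference text but says nothing about errors outside its list.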


February 2019

2018/19 Fundraiser

We tried something new: an annual fundraiser, our first ever. From Sunday, February 10 to Friday, March 8, a small banner was shown on some of our web pages, promoting donations toward our aim of raising 25,000 SEK for the fiscal year 2018/19. The stated idea was that the banner would be removed as soon as the aim had been reached, to reappear next year. The aim was reached within a month. We have long had a "Donate" link in the header of all our web pages. Read more on our donation page.


Project Runeberg, 2020-02-17 23:59 (runeberg)
http://runeberg.org/

All our files are DRM-free.