Happy Women's Day, March 8!
Recently added to our collection are 19 books by Swedish journalist Elin Wägner (whose works entered the public domain this January) and five years of the Norwegian suffragette journal Nylænde (1888-1892).
Medicine
While the whole world is worried about the new coronavirus, we thought we might draw some wisdom from the history of earlier epidemics. The Spanish flu of 1918 comes to mind. What has actually been written about it? Perhaps the best accounts we have are encyclopedia entries about influenza from the 1920s, such as this one in Nordisk familjebok.
Over the years, we have digitized and gathered quite a few books relating to medicine, but we have only now made a thematic page for this topic. Most of the books found there are in Swedish. We welcome suggestions for more works to digitize. Recently, we have added:
- Henrik Berg, Spanska sjukan och dess botande enligt den fysikaliska läkemetoden (1918)
- Victor Berglund, Spanska sjukan : Några upplysningar och råd (1918)
- Bror Bjerner, Antidifteriserum mot influensa (epidemica) (1919)
- Israel Hedenius, Om behandlingen av den kroniska polyartriten (1919)
- Klas Linroth et al., Influensan i Sverige 1889-1890 (1890)
- Robert Tigerstedt, Medicinens utveckling till en naturvetenskap (1924)
Happy Public Domain Day!
Public Domain Day is January 1st. This is when works by a new group of authors enter the public domain, because copyright expires 70 years after the end of the year in which the author died. We continue to celebrate it, even though it is already February. So who died in 1949? And what have we added so far?
- Joel Haugard, journalist and local history writer from Askersund, Sweden
- Maurice Maeterlinck, Belgian writer
- Ernst Newman, author of essays on the history of Swedish free church movements
- Sigrid Undset, Norwegian writer and winner of the 1928 Nobel Prize in Literature
- Ernst Westerberg, Swedish shipbuilding inspector and writer
- Elin Wägner, Swedish journalist and novelist
Since we boldly digitize journals and encyclopedias (works with numerous contributors) 70 years after publication, regardless of when each contributor lived, we have also scanned:
- The 1949 volume of Bonniers litterära magasin
- The 1949 volume of Byggmästaren
- Volume 7 (Supplement A-Ö) of Nordisk familjeboks sportlexikon
- The 1949 volume of Ord och Bild
- Den nye Salmonsen, the 4th edition (1949, one volume) of Salmonsens konversationsleksikon, a Danish encyclopedia
- The 1948 volume of Svenska Dagbladets Årsbok (published at the beginning of 1949, which is why the year is one off)
- Volumes 5 (Lindorm-O) and 6 (P-Sheldon) of Svenska män och kvinnor
Also recently digitized are some works by people who died in 1947 and 1948:
- Sett från Oljeberget by Hilda Andersson, Swedish missionary
- Several works by Tor Andræ
- August Blanche och hans samtid by Nils Erdmann
- Våra ortnamn och vad de lära oss by Hjalmar Lindroth
- Cesare Lombroso och hans lifsgärning by Ludvig Wolff
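The dates in this section follow a simple rule, sketched below in Python as a simplification. It assumes a flat 70-year term that runs to the end of the calendar year, counted from the author's death or, for the journals and encyclopedias above, from the year of publication. The function name `first_free_year` is only an illustration, and real copyright law has exceptions this sketch ignores.

```python
# Simplified sketch: assumes a flat 70-year term that ends on December 31st,
# ignoring special cases (wartime extensions, anonymous works, etc.).
TERM_YEARS = 70


def first_free_year(year: int) -> int:
    """Return the first calendar year in which a work may be digitized freely.

    `year` is the author's death year or, for journals and encyclopedias
    with many contributors, the year of publication.
    """
    # The term runs out at the end of (year + TERM_YEARS), so the work
    # becomes free on January 1st of the following year.
    return year + TERM_YEARS + 1


if __name__ == "__main__":
    # Authors who died in 1947, 1948 and 1949, as listed above.
    for died in (1947, 1948, 1949):
        print(died, first_free_year(died))
```

For an author who died in 1949, such as Elin Wägner, this gives 2020. A journal volume published in 1949 gives the same year, which is why the 1948 yearbook printed in early 1949 could be included.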
2019/20 Fundraiser
From November 12 to December 21, we ran our fundraiser for the year. It was our second, and it was run the same way as the first: a small banner (the one above) was shown on some of our web pages, encouraging donations toward our goal of raising 25,000 SEK for the fiscal year 2019/20. The idea was that the banner would be removed as soon as the goal had been reached, to reappear next year. We have long had a "Donate" link in the header of all our web pages. Read more on our donation page.
Redoing OCR
Both in 2000 and again in 2010, we concluded that OCR of fraktur (blackletter, Gothic) was too difficult and would have to wait. For normal print (antikva, Latin) we have used the commercial software ABBYY Finereader with great success. Since 2007 we have also increasingly imported books scanned by others, often copying both the scanned images and the OCR text.
Around 2013 or 2014, the OCR quality of books printed in fraktur and scanned by Nasjonalbiblioteket of Norway suddenly improved radically. It seems they used a special edition of Finereader developed by a German/Austrian project, but this was out of our reach. Later, the OCR of fraktur books digitized by Det Kongelige Bibliotek of Denmark also improved.
Returning to this problem in 2019, we find that the free software Tesseract (Wikipedia, GitHub, wiki) has reached version 4.0 and is a standard part of the Ubuntu Linux distribution, with support for Swedish and Danish fraktur added around 2015. The output is far from excellent and not as good as in the Norwegian books, but much better than some others and quite useful as a starting point for manual proofreading.
Using Tesseract, we are now starting to redo OCR for some books in fraktur. The first attempt is Søren Kierkegaards Samlede Værker (15 volumes, 1920-1926), which was digitized in 2009 at the University of Toronto by the Internet Archive. Their OCR text is of terrible quality, and it is apparent that they used ABBYY Finereader set up for Latin letters. We copied volumes 1-8 in 2014, but decided in 2015 to do our own OCR by manually training Finereader to interpret the fraktur text. This was time-consuming and painful, and the result was not very good. Now we have copied the remaining volumes and redone OCR for all of them with Tesseract, with much better results.
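A single page could be run through Tesseract from a script roughly as follows. This is a minimal Python sketch, not necessarily how we run it, that calls the `tesseract` command-line program with an input image, an output base and `-l` for the language; the output base `stdout` prints the text instead of writing a file. The helper names, the `.tif` file pattern and the language code are assumptions for illustration. Which fraktur models are installed depends on the Tesseract version and the Ubuntu packages, and `tesseract --list-langs` lists them.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image: Path, lang: str) -> list[str]:
    """Build a Tesseract command line that writes the OCR text to stdout."""
    # Positional arguments are the input image and the output base;
    # the special output base "stdout" prints the text instead of saving it.
    return ["tesseract", str(image), "stdout", "-l", lang]


def ocr_page(image: Path, lang: str) -> str:
    """Run Tesseract on one scanned page and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract program is not installed or not on PATH")
    result = subprocess.run(
        tesseract_command(image, lang),
        capture_output=True,  # collect both streams; the text arrives on stdout
        text=True,
        check=True,           # raise CalledProcessError if tesseract fails
    )
    return result.stdout


def ocr_directory(directory: Path, lang: str) -> None:
    """OCR every .tif page in a directory, saving page0001.txt next to page0001.tif."""
    # Hypothetical layout: one image file per page, named so they sort in order.
    for image in sorted(directory.glob("*.tif")):
        image.with_suffix(".txt").write_text(ocr_page(image, lang), encoding="utf-8")
```

Assuming the Danish fraktur model is installed under the name `dan_frak` (verify this with `tesseract --list-langs`), a call such as `ocr_directory(Path("kierkegaard/vol09"), "dan_frak")` would OCR one volume; the path is hypothetical.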
In the meantime, a new edition, Søren Kierkegaards Skrifter (55 printed volumes, 2007-2013), has been published and made available online at SKS.dk. There you will find all of the texts, without needing to proofread anything. However, that is not the case for the other books we provide.
A problem is that we have no algorithm for determining which OCR text is better. The right way to determine this is to manually proofread the page and then see which OCR candidate required the fewest edits to reach the desired result. But of course, when we have two OCR texts for the same page, we want to find out which is better without needing to proofread the page. And we can't just use a spell checker, because then any sequence of correctly spelled words would win, regardless of its similarity to the scanned page. So far, we only redo OCR on pages where the naked eye can immediately see too many errors typical of bad fraktur OCR, for example words such as "reban" (redan) or "ogfaa" (ogsaa).
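Counting the edits needed to turn an OCR text into the proofread text is the Levenshtein edit distance, sketched below in Python. Note that `best_candidate` needs the proofread page, which is exactly what we lack before proofreading, so it can only serve to evaluate OCR engines on pages that have already been corrected. The `suspicious_words` count is a crude, hypothetical stand-in for the naked-eye test, assuming a hand-made list of typical bad fraktur readings such as "reban" and "ogfaa"; like a spell checker, it says nothing about fidelity to the scanned page. The candidate names and sample sentence are illustrations only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-character insertions,
    deletions and substitutions needed to turn `a` into `b`."""
    # previous[j] holds the distance between the characters of `a` seen
    # so far (excluding the current one) and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def best_candidate(candidates: dict[str, str], proofread: str) -> str:
    """Name of the OCR text that needs the fewest edits to match the proofread page."""
    return min(candidates, key=lambda name: edit_distance(candidates[name], proofread))


def suspicious_words(text: str, misreadings: set[str]) -> int:
    """Count words in `text` that are known misreadings of fraktur (hypothetical list)."""
    return sum(1 for word in text.split() if word.strip(".,;:!?").lower() in misreadings)


if __name__ == "__main__":
    proofread = "Han var ogsaa der."
    candidates = {"old": "Han var ogfaa ber.", "new": "Han var ogsaa der."}
    print(best_candidate(candidates, proofread))                      # → new
    print(suspicious_words(candidates["old"], {"reban", "ogfaa"}))    # → 1
```

The "old" candidate needs two substitutions (f→s, b→d) while "new" needs none. The misreading count flags only "ogfaa", because "ber" is not in the list, which illustrates how such a list misses errors it was not told about.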