4 Language
MARC: 041a
4.1 Complete Dataset Overview
Unique languages: 170
1115444 single-language entries (93.91%)
72369 multilingual entries (6.09%)
Unrecognized language: 50515 documents (4.25%)
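The single-language and multilingual shares above can be reproduced with a short sketch. It assumes, hypothetically, that each harmonized entry is one string with several languages joined by `;` (as in the `Finnish;Swedish` rows of the top-10 table). It also assumes that percentages are taken over single-language plus multilingual entries, which is consistent with 93.91% + 6.09% = 100%. The function name `language_shares` is illustrative only.

```python
def language_shares(entries: list[str]) -> dict[str, tuple[int, float]]:
    """Count single-language vs multilingual entries and their percentages.

    Assumption (hypothetical format): multilingual entries are stored as
    language names joined by ';', e.g. "Finnish;Swedish".
    """
    total = len(entries)
    multi = sum(1 for e in entries if len(e.split(";")) > 1)
    single = total - multi
    return {
        "single": (single, round(100 * single / total, 2)),
        "multilingual": (multi, round(100 * multi / total, 2)),
    }


print(language_shares(["Finnish", "Swedish", "Finnish;Swedish", "German"]))

# Same arithmetic on the reported totals for the complete dataset:
single, multi = 1115444, 72369
print(round(100 * single / (single + multi), 2))  # → 93.91
```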
Conversions from raw to preprocessed language entries
Download the harmonized language dataset
Language codes follow the MARC code list; new custom abbreviations can be added to this table.
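Conversion from raw 041a values to language names might look roughly like the sketch below. The lookup covers only the languages in the top-10 table, using their codes from the MARC Code List for Languages. The following are assumptions for illustration, not the documented conversion: mapping unknown codes to `Undetermined`, splitting run-together codes such as `finswe` into three-letter chunks, and the name `harmonize`. Order is preserved, since the table below distinguishes `Finnish;Swedish` from `Swedish;Finnish`.

```python
import re

# Hypothetical lookup: only the languages appearing in the top-10 table.
MARC_LANGUAGES = {
    "fin": "Finnish",
    "swe": "Swedish",
    "ger": "German",
    "lat": "Latin",
    "rus": "Russian",
    "fre": "French",
    "eng": "English",
    "und": "Undetermined",
}


def harmonize(raw: str) -> str:
    """Turn a raw 041a value into ';'-joined language names (sketch)."""
    # Assumption: codes may be run together ("finswe") or separated by
    # spaces/punctuation, so keep letters only and cut into 3-letter chunks.
    letters = re.sub(r"[^a-z]", "", raw.lower())
    codes = [letters[i:i + 3] for i in range(0, len(letters), 3)]
    names: list[str] = []
    for code in codes:
        # Assumption: unrecognized codes are reported as "Undetermined".
        name = MARC_LANGUAGES.get(code, "Undetermined")
        if name not in names:  # keep first occurrence, preserve order
            names.append(name)
    return ";".join(names) if names else "Undetermined"


print(harmonize("finswe"))  # → Finnish;Swedish
```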
4.2 Subset Analysis: 1809-1917
Unique languages (1809-1917): 54
60632 single-language entries (94%)
3872 multilingual entries (6%)
Unrecognized language (1809-1917): 929 documents (1.44%)
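The 1809-1917 subset can be sketched as a year filter followed by the same counts. Several details are assumptions, not documented behaviour of the pipeline: the record layout (dicts with hypothetical `year` and `language` keys), the inclusive year bounds, and the choice to leave `Undetermined` out of the unique-language count. The reported 1.44% is consistent with 929 / (60632 + 3872) = 929 / 64504.

```python
def subset_summary(records, start=1809, end=1917):
    """Summarize records published in [start, end], bounds inclusive.

    Hypothetical record layout: {"year": int | None, "language": str}.
    """
    subset = [
        r for r in records
        if r["year"] is not None and start <= r["year"] <= end
    ]
    languages = set()
    for r in subset:
        languages.update(r["language"].split(";"))
    # Assumption: "Undetermined" is reported separately, not as a language.
    languages.discard("Undetermined")
    undetermined = sum(1 for r in subset if r["language"] == "Undetermined")
    pct = round(100 * undetermined / len(subset), 2) if subset else 0.0
    return {
        "n": len(subset),
        "unique_languages": len(languages),
        "undetermined_pct": pct,
    }


# Arithmetic check against the reported subset figures:
print(round(100 * 929 / (60632 + 3872), 2))  # → 1.44
```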
Download the harmonized language dataset (1809-1917)
4.2.1 Top languages for 1809-1917
Number of titles assigned to each language (top 10). For a complete list, see accepted languages (1809-1917).
| Language | Entries (n) | Fraction of complete dataset (%) |
|---|---|---|
| Finnish | 33674 | 2.8 |
| Swedish | 19107 | 1.6 |
| Finnish;Swedish | 2021 | 0.2 |
| German | 1988 | 0.2 |
| Latin | 1662 | 0.1 |
| Russian | 1587 | 0.1 |
| Undetermined | 929 | 0.1 |
| French | 628 | 0.1 |
| English | 425 | 0.0 |
| Swedish;Finnish | 170 | 0.0 |
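A top-10 table like this one can be sketched with `collections.Counter.most_common`. Each harmonized string is treated as its own label, so `Finnish;Swedish` and `Swedish;Finnish` are tallied separately, as in the table. The `denominator` parameter and the name `top_languages` are hypothetical. The parameter exists because the Fraction column matches counts divided by the complete-dataset total (1115444 + 72369 = 1187813, e.g. 33674 / 1187813 ≈ 2.8%), whereas Finnish would be about 52.2% of the 64504 entries in the 1809-1917 subset.

```python
from collections import Counter


def top_languages(entries, n=10, denominator=None):
    """Rows of (label, count, percent) for the n most frequent labels.

    `denominator` defaults to len(entries); pass a larger total to express
    counts as a share of a bigger collection (e.g. the complete dataset).
    """
    total = denominator if denominator is not None else len(entries)
    return [
        (label, count, round(100 * count / total, 1))
        for label, count in Counter(entries).most_common(n)
    ]


# Finnish as a share of the complete dataset vs the 1809-1917 subset:
print(round(100 * 33674 / 1187813, 1))  # → 2.8
print(round(100 * 33674 / 64504, 1))    # → 52.2
```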
Title count per language, including multilingual documents (note the log10 scale):
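The per-language counts behind such a figure can be sketched as follows. One plausible reading, assumed here rather than confirmed by the source, is that a document tagged `Finnish;Swedish` adds one title to each of its languages. The log10 transform then compresses the gap between the large Finnish and Swedish counts and the long tail of rare languages. The function names are illustrative.

```python
import math
from collections import Counter


def titles_per_language(entries: list[str]) -> Counter:
    """Count each language once per document it appears in (assumption)."""
    counts: Counter = Counter()
    for entry in entries:
        # set() so a document listing the same language twice counts once
        counts.update(set(entry.split(";")))
    return counts


def log10_counts(counts: Counter) -> dict[str, float]:
    """Values as placed on a log10 axis (every count here is >= 1)."""
    return {lang: math.log10(n) for lang, n in counts.items()}


counts = titles_per_language(
    ["Finnish", "Finnish;Swedish", "Swedish;Finnish", "German"]
)
print(counts["Finnish"], counts["Swedish"], counts["German"])  # → 3 2 1
```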