4  Language

MARC field: 041a (language code of text/sound track or separate title)

4.1 Complete Dataset Overview

Unique languages: 170

Unique primary languages: 148

1115444 single-language entries (93.91%)

72369 multilingual entries (6.09%)

Unrecognized language: 50515 documents (4.25%)
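
The percentages above can be reproduced from the raw counts. The following minimal check assumes that the denominator is the sum of single-language and multilingual entries (1,187,813). This assumption reproduces all three reported percentages, including the share of unrecognized entries.

```python
# Counts reported in Section 4.1 for the complete dataset.
single = 1_115_444
multi = 72_369
unrecognized = 50_515

# Assumption: every entry is either single-language or multilingual,
# so their sum is the denominator for all shares.
total = single + multi


def pct(part: int, whole: int) -> float:
    """Share of `part` in `whole` as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(total)                     # 1187813
print(pct(single, total))        # 93.91
print(pct(multi, total))         # 6.09
print(pct(unrecognized, total))  # 4.25
```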

Conversions from raw to preprocessed language entries

Download language harmonized dataset

Language codes follow the MARC Code List for Languages; new custom abbreviations can be added to the linked language code table.
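
The conversion from raw 041a codes to harmonized entries can be sketched as below. This is not the pipeline actually used for this dataset. The `CODE_TO_NAME` excerpt and the `harmonize` helper are hypothetical. Mapping unknown codes to "Undetermined" is also an assumption, suggested only by the "Undetermined" row in the table below matching the unrecognized count. The sketch preserves code order, because the table lists "Finnish;Swedish" and "Swedish;Finnish" as separate entries.

```python
# Hypothetical excerpt of a code-to-name table (MARC language codes).
CODE_TO_NAME = {
    "fin": "Finnish",
    "swe": "Swedish",
    "ger": "German",
    "lat": "Latin",
    "rus": "Russian",
    "fre": "French",
    "eng": "English",
    "und": "Undetermined",
}


def harmonize(raw_codes: list[str]) -> str:
    """Map raw 041a codes to language names joined with ';'.

    Order is preserved, so ["fin", "swe"] and ["swe", "fin"] stay distinct.
    Repeated codes are kept only once. Codes missing from the table become
    "Undetermined" (an assumption, not confirmed by the source).
    """
    names: list[str] = []
    for code in raw_codes:
        name = CODE_TO_NAME.get(code.strip().lower(), "Undetermined")
        if name not in names:
            names.append(name)
    return ";".join(names)


print(harmonize(["fin", "swe"]))  # Finnish;Swedish
print(harmonize(["swe", "fin"]))  # Swedish;Finnish
print(harmonize(["xyz"]))         # Undetermined
```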

4.2 Subset Analysis: 1809-1917

Unique languages (1809-1917): 54

60632 single-language entries (94.00%)

3872 multilingual entries (6.00%)

Unrecognized language (1809-1917): 929 documents (1.44%)

Download language harmonized dataset (1809-1917)
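
Restricting the statistics to a publication period amounts to filtering on the year before counting. The `Record` structure and the `subset_summary` helper in this sketch are hypothetical, and the year range is treated as inclusive. Undetermined entries are counted as single-language, which is an assumption. The reported figures support it: 60632 + 3872 = 64504, and 929 / 64504 gives the stated 1.44%.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """One bibliographic entry (hypothetical structure for illustration)."""
    year: Optional[int]  # publication year; None if unknown
    language: str        # harmonized language string, e.g. "Finnish;Swedish"


def subset_summary(records, start=1809, end=1917):
    """Summarize language coverage for records published in [start, end].

    Entries whose language string contains ';' count as multilingual;
    all others (including "Undetermined") count as single-language.
    """
    subset = [r for r in records if r.year is not None and start <= r.year <= end]
    n = len(subset)
    multi = sum(1 for r in subset if ";" in r.language)
    single = n - multi
    undetermined = sum(1 for r in subset if r.language == "Undetermined")

    def pct(k: int) -> float:
        return round(100 * k / n, 2) if n else 0.0

    return {
        "entries": n,
        "single": single, "single_pct": pct(single),
        "multi": multi, "multi_pct": pct(multi),
        "undetermined": undetermined, "undetermined_pct": pct(undetermined),
    }
```

Records outside the range or without a year are simply dropped, so the percentages always refer to the filtered subset.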

4.2.1 Top languages for 1809-1917

Number of titles assigned to each language (top 10). For the complete list, see accepted languages (1809-1917).

Language         Entries (n)   Fraction of complete dataset (%)
Finnish                33674   2.8
Swedish                19107   1.6
Finnish;Swedish         2021   0.2
German                  1988   0.2
Latin                   1662   0.1
Russian                 1587   0.1
Undetermined             929   0.1
French                   628   0.1
English                  425   0.0
Swedish;Finnish          170   0.0
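
The Fraction column appears to be computed against the complete dataset (1,187,813 entries), not against the 64,504 entries of the 1809-1917 subset. For example, 33674 / 1187813 ≈ 2.8%, whereas 33674 / 64504 ≈ 52.2%. A top-k table of this kind can be sketched with `collections.Counter`. The `top_languages` helper is hypothetical, and the denominator is passed explicitly so that either convention can be chosen.

```python
from collections import Counter


def top_languages(languages, denominator, k=10):
    """Return the k most frequent harmonized language strings as
    (language, count, share of `denominator` in %, one decimal)."""
    counts = Counter(languages)
    return [(lang, n, round(100 * n / denominator, 1))
            for lang, n in counts.most_common(k)]


# Reported Finnish count for 1809-1917 against both candidate denominators.
print(round(100 * 33674 / 1_187_813, 1))  # 2.8  (complete dataset)
print(round(100 * 33674 / 64_504, 1))     # 52.2 (1809-1917 subset)
```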

Title count per language (including multi-language documents; note the log10 scale):
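
The source does not say how multilingual documents enter these per-language counts. The sketch below assumes that each language of a multilingual entry such as "Finnish;Swedish" adds one title to that language. Under this assumption, per-language totals can exceed the number of documents. The helper names are hypothetical. Taking `math.log10` of each count gives the values on the log10 axis.

```python
import math
from collections import Counter


def per_language_counts(entries):
    """Count titles per individual language; a multilingual entry
    adds one title to each language it lists (assumed convention)."""
    counts = Counter()
    for entry in entries:
        # set() avoids double-counting a language repeated within one entry
        for lang in set(entry.split(";")):
            if lang:
                counts[lang] += 1
    return counts


def log10_counts(counts):
    """Map each language to log10 of its title count (counts are >= 1)."""
    return {lang: math.log10(n) for lang, n in counts.items()}


counts = per_language_counts(["Finnish", "Finnish;Swedish", "Swedish;Finnish", "German"])
print(counts["Finnish"], counts["Swedish"], counts["German"])  # 3 2 1
```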