HHC: Hungarian historical corpus





The Hungarian historical corpus (hereafter HHC) is a collection of texts in different genres written between 1772 and 1997, containing ca. 27 million tokens. During the compilation of HHC, text samples were selected from printed works by specialists (literary historians, historians, mathematicians, etc.). A relative majority (40%) of the texts date from the second half of the 20th century. The corpus is the product of the Department of Lexicography and Lexicology at RIL HAS; it was compiled between 1986 and 1997 and has been maintained continuously since then. As an innovation, genre labeling was unified: genres and text types are marked similarly in HHC and HNC, which makes it possible to search both corpora using the same query structure.

