Hungarian National Corpus





The national corpus of the Hungarian language is divided into five subcorpora by regional language variant and, independently, into five subcorpora by text genre. The subcorpus to be studied can be selected by any combination of these two dimensions, which makes the HNC an appropriate tool for studying differences not only between text genres but also between language variants. The HNC aims to be a representative general-purpose corpus of present-day standard Hungarian.
HNC v2 (HNC2) is based on the Hungarian National Corpus, with higher quality and a finer level of analysis and annotation: detailed morphosyntactic analysis and disambiguation with an updated processing toolchain, NP chunking, named entity recognition, distributional analysis, and built-in post-processing (multilevel frequency lists, subsequent searches on previous results). HNC2 is extended up to the 1-gigaword threshold, with extended metadata and cleared IPR.

