BEA Hungarian spontaneous speech database




The aim of developing BEA (BEszélt nyelvi Adatbázis ‘spoken language database’), a phonetically based multi-purpose database of Hungarian spontaneous speech, is to accumulate a large amount of recorded spontaneous speech from numerous present-day Budapest speakers, providing ample material for various types of research and practical applications. At the time of writing, the recorded material of BEA totals 250 hours, or approximately 3,400,000 running words. The database primarily contains spontaneous speech, but for the sake of comparison it also includes sentence repetitions and read texts.
The database offers material for research in a number of areas within linguistics. For many years, the study of the acoustic-phonetic consequences of speech sound production, of coarticulation effects, and of suprasegmental features was hampered by the lack of spontaneous speech material of adequate quality and quantity.
The database contains transcriptions of the BEA speech materials at several levels: i) primary transcription, in standard orthography but without punctuation, which transcribers prepare in Microsoft Office Word (.doc format);
ii) annotation, a form of transcription that displays the spoken text together with further information about it, so that the written text can be viewed while the corresponding stretch of the recording is played. This is made possible by the Transcriber software.
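As an illustration of how such time-aligned annotation can be read programmatically, the following is a minimal sketch in Python. It assumes the annotations are saved in Transcriber's XML-based .trs format, in which each Turn element carries speaker, startTime and endTime attributes and contains Sync time stamps followed by the transcribed text. The sample document is invented for the example and is not BEA data, and the function name read_segments is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample in the style of a Transcriber .trs file (not BEA data).
SAMPLE_TRS = """<Trans audio_filename="example" version="1">
  <Episode>
    <Section type="report" startTime="0" endTime="5.2">
      <Turn speaker="spk1" startTime="0" endTime="5.2">
        <Sync time="0"/>
        first stretch of speech
        <Sync time="2.75"/>
        second stretch of speech
      </Turn>
    </Section>
  </Episode>
</Trans>
"""


def read_segments(trs_text):
    """Return (start, end, speaker, text) tuples, one per Sync point.

    Simplification: only the text directly following each Sync is kept;
    text that follows other in-turn elements (such as event markers)
    is ignored in this sketch.
    """
    root = ET.fromstring(trs_text)
    segments = []
    for turn in root.iter("Turn"):
        turn_end = float(turn.get("endTime"))
        syncs = turn.findall("Sync")
        for i, sync in enumerate(syncs):
            start = float(sync.get("time"))
            # A segment ends at the next Sync, or at the end of the turn.
            if i + 1 < len(syncs):
                end = float(syncs[i + 1].get("time"))
            else:
                end = turn_end
            # The transcribed text is stored as the "tail" of the Sync element.
            text = " ".join((sync.tail or "").split())
            segments.append((start, end, turn.get("speaker"), text))
    return segments


for start, end, speaker, text in read_segments(SAMPLE_TRS):
    print(f"{start:7.2f} {end:7.2f} {speaker}: {text}")
```

Each tuple pairs a stretch of text with the time interval of the recording it belongs to, which is the link that allows text and audio to be presented together.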
In addition to phonetic research in the strict sense, the database now makes it possible to pursue conversation analysis, pragmatic research, and speech technology, as well as the study of speech accommodation, of the spontaneous speech of elderly speakers, and of disfluency phenomena.

