The aim of this project was to create a Hungarian speech database of mobile
phone recordings made in noisy environments, for testing purposes (also
The database contains the voices of 100 speakers, recorded over mobile
telephones in noisy environments.
The main goal of creating this database was to test phoneme-based recognizers
that had already been trained, so the corpus had to be compact and had to cover
the specific characteristics of the Hungarian language as well as possible.
The text that each speaker had to say was designed to contain every Hungarian
phoneme at least once, taking into consideration the statistics of phonemes,
diphones, triphones and syllables in the Hungarian language.
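
A coverage requirement of this kind can be checked mechanically. The sketch
below is only an illustration, not the procedure used to design the corpus:
the function name coverage_report, the toy phoneme inventory and the toy
transcriptions are all hypothetical and do not reflect the real Hungarian
phoneme set or the actual prompt text. It counts phonemes and diphones in
phonemically transcribed prompts and lists any phoneme that never occurs.

```python
from collections import Counter


def coverage_report(transcriptions, inventory):
    """Count phonemes and diphones and list inventory phonemes never seen.

    transcriptions: list of utterances, each a list of phoneme symbols.
    inventory: set of phoneme symbols that must each occur at least once.
    """
    phoneme_counts = Counter()
    diphone_counts = Counter()
    for phones in transcriptions:
        phoneme_counts.update(phones)
        # A diphone is a pair of adjacent phonemes within one utterance.
        diphone_counts.update(zip(phones, phones[1:]))
    missing = sorted(p for p in inventory if phoneme_counts[p] == 0)
    return phoneme_counts, diphone_counts, missing


# Toy, hypothetical data: NOT the real Hungarian inventory or prompt text.
toy_inventory = {"a", "b", "k", "o", "t"}
toy_prompts = [["k", "a", "t"], ["b", "a", "k"]]

phonemes, diphones, missing = coverage_report(toy_prompts, toy_inventory)
print("missing phonemes:", missing)
```

An empty list of missing phonemes would indicate that the prompt text meets
the "at least once" requirement; the same counters could be extended to
triphones or syllables if those units are available in the transcription.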
The corpus contains not only continuously spoken sentences, but also command
words, spelled forenames, numbers, dates, different currency types, city names,
questions with yes/no answers and phonetically rich words. The database
contains mostly spontaneous speech.
Since the whole database contains speech recorded in noisy environments, we
wanted to determine an average value of the signal-to-noise ratio for the
recorded speech. However, this parameter depends on several different factors,
such as the type and intensity of the background noise and the parameters of
the channel. This is probably why the measured signal-to-noise ratio varies
over a very wide range, between 5 dB and 25 dB. The lowest values
(approximately 5 dB) were measured for recordings made near busy highways or
on public transport (mainly on old trams) in rush hour. The highest values
(approximately 25 dB) were measured for recordings made on side streets, on
public transport or indoors (mainly late at night).
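
As a rough illustration of how a signal-to-noise figure in dB can be computed,
the sketch below compares the mean power of a segment containing speech with
that of a noise-only segment (for example, a pause before the speaker starts).
This is an assumption made for illustration, not a description of the
measurement procedure used for this database; the function name snr_db and the
sample values are hypothetical.

```python
import math


def mean_power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """Estimate the SNR in dB from a speech segment and a noise-only segment.

    The speech segment is assumed to contain speech plus background noise,
    so the noise power is subtracted to approximate the speech power alone.
    """
    noise_power = mean_power(noise_samples)
    if noise_power <= 0.0:
        raise ValueError("noise segment has zero power; SNR is undefined")
    # Clamp to a tiny positive value so the logarithm stays defined.
    speech_power = max(mean_power(speech_samples) - noise_power, 1e-12)
    return 10.0 * math.log10(speech_power / noise_power)


# Toy, hypothetical sample values, not taken from the database.
toy_noise = [1.0, -1.0, 1.0, -1.0]
toy_speech = [3.0, -3.0, 3.0, -3.0]
print(round(snr_db(toy_speech, toy_noise), 2), "dB")
```

Averaging such per-recording estimates over the corpus would give a single
summary figure, although the wide spread described above means an average
hides much of the variation between recording conditions.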