Project description
◦ Sentences for the development of a large vocabulary continuous speech
recognition system
◦ A total of 400 speakers
◦ Age: 10-19(10%), 20-29(30%), 30-39(40%), 40-49(20%)
◦ Office, PC environment
◦ Microphone: Andrea ANC 750
◦ Sampling and data format
- 16,000 Hz, 16 bit Windows wave format
◦ The prompts are drawn from the 43,000,000-word (eojeol) subset of the
2001 version of the KAIST Language(s) Resource Corpus (70,000,000 words (eojeol))
◦ Selection of the prompts
- based on checking the distribution of high-frequency words
- a total of 20,833 sentences
◦ Per speaker
- 104-105 sentences
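
The figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the speaker count, age shares, and sentence total come from the description, while the idea that every prompt is read by two speakers (READINGS_PER_SENTENCE) is an assumption that is not stated in the source. It is merely the value that makes 400 speakers at 104-105 sentences each line up with 20,833 sentences. The function names are hypothetical.

```python
# Figures taken from the project description.
TOTAL_SPEAKERS = 400
TOTAL_SENTENCES = 20_833
AGE_SHARE_PERCENT = {"10-19": 10, "20-29": 30, "30-39": 40, "40-49": 20}

# Assumption (not stated in the source): each prompt is read by 2 speakers.
READINGS_PER_SENTENCE = 2


def speakers_per_age_band(total, shares):
    """Convert percentage shares into speaker counts per age band."""
    return {band: total * pct // 100 for band, pct in shares.items()}


def per_speaker_split(total_readings, speakers):
    """Return (base, extra): `extra` speakers read base + 1 sentences,
    the remaining speakers read `base` sentences."""
    return divmod(total_readings, speakers)


age_counts = speakers_per_age_band(TOTAL_SPEAKERS, AGE_SHARE_PERCENT)
base, extra = per_speaker_split(
    TOTAL_SENTENCES * READINGS_PER_SENTENCE, TOTAL_SPEAKERS
)

print(age_counts)   # {'10-19': 40, '20-29': 120, '30-39': 160, '40-49': 80}
print(base, extra)  # 104 66
```

Under that assumption, 66 speakers would read 105 sentences and the other 334 would read 104, which matches the stated range of 104-105 sentences per speaker.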