Project description
 |
◦ Multimodal corpus of speech audio and frontal-face video recorded with a
camcorder
◦ A total of 100 speakers of the standard dialect (age: 20 ~ 30)
◦ Studio: 230 cm × 230 cm
◦ Lighting: basic lighting + front (one 200 W), right/left (two 100 W), and
ground (one 60 W)
◦ Background: blue screen
◦ Equipment: Sony DCR-VX2000
◦ Video sampling and data format (avi)
- resolution: 720 × 480
- frame rate: 30 fps
- compression: MPEG-4 V1, key frame at maximum
◦ Speech sampling and data format (wav)
- 16 kHz, 16-bit, mono, Microsoft Windows WAVE format
◦ 5 basic vowels
◦ 12 single digits
◦ 452 phonetically balanced words (PBW)
◦ Per speaker
- 5 basic vowels
- 12 single digits
- 90 ~ 92 PBWs |
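
The speech format listed above (16 kHz, 16-bit, mono WAVE) can be checked programmatically. The following is a minimal sketch using Python's standard `wave` module, not a tool shipped with the corpus; the function name `wav_format_problems` and the constant names are hypothetical and chosen only for illustration.

```python
import wave

# Audio parameters as stated in the corpus description above.
EXPECTED_SAMPLE_RATE_HZ = 16000   # 16 kHz
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16 bits per sample
EXPECTED_CHANNELS = 1             # mono


def wav_format_problems(path):
    """Return a list of mismatches between a WAV file and the stated format.

    Hypothetical helper: an empty list means the file header matches
    16 kHz, 16-bit, mono. Only the header is inspected, not the audio itself.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()

    if rate != EXPECTED_SAMPLE_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE_HZ} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {8 * width} bits, expected {8 * EXPECTED_SAMPLE_WIDTH_BYTES} bits")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"channel count is {channels}, expected {EXPECTED_CHANNELS}")
    return problems
```

Calling `wav_format_problems("some_file.wav")` on each audio file (the file name here is a placeholder) gives a quick way to flag recordings whose header deviates from the listed format before further processing.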