ELSNET-list archive

Category:   E-Material
Subject:   ELRA News 1/2
From:   Magali Jeanmaire
Email:   duclaux_(on)_elda.fr
Date received:   22 Apr 2003

****************************************************************
ELRA is happy to announce that new resources are available in its catalogue of language resources
****************************************************************

You will find below short descriptions of these new resources. We invite you to visit the on-line catalogue on our web site, at http://www.elda.fr or http://www.elra.info, to get more detailed descriptions. Please contact us if you would like more information.

****************************************************************
Spoken Language Resources:
- S0144 Italian SpeechDat-Car
- S0113 Spoken Dutch Corpus: release 6

AURORA Databases:
- Subset of Italian SpeechDat-Car database (AURORA/CD0003-05)
- Aurora 4a and Aurora 4b databases
****************************************************************

*** S0144 Italian SpeechDat-Car ***

The Italian SpeechDat-Car database contains in-car recordings of 300 Italian speakers, each of whom uttered around 120 read and spontaneous items. Recordings were made through 5 different channels: 4 in-car microphones (1 close-talk microphone, 3 far-talk microphones) and 1 channel over the GSM network.

*** S0113 Spoken Dutch Corpus: Release 6 ***

Release 6 of the Spoken Dutch Corpus has been published. This release includes sound files together with their orthographic transcripts, as well as various annotations such as POS tagging, lemmatization, and word segmentation.

*** Subset of Italian SpeechDat-Car database (AURORA/CD0003-05) ***

The Aurora project was originally set up to establish a worldwide standard for the feature extraction software which forms the core of the front-end of a DSR (Distributed Speech Recognition) system. ETSI formally adopted this activity as work items 007 and 008. The two work items within ETSI are:
- ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature Extraction Algorithm & Compression Algorithm
- ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature Extraction Algorithm

This database is a subset of the Italian SpeechDat-Car database, which was collected as part of the European Union-funded SpeechDat-Car project. It contains 2200 Italian connected digit utterances, divided into training and testing utterances, recorded in the following noise and driving conditions inside a car:
- High speed good road
- Low speed rough road
- Stopped with motor running
- Town traffic

*** Aurora 4a & 4b ***

The Aurora project is now releasing a number of list files for performing training and testing on the Wall Street Journal (WSJ0) data at two sampling rates: 8 kHz and 16 kHz. The Aurora 4a database is based on WSJ0 with artificial addition of noise over a range of signal-to-noise ratios. It contains both clean and multicondition training sets and 14 evaluation sets with different noise types and microphones. An additional database, Aurora 4b, will be released later and will contain noisy versions of the Nov'92 WSJ0 development set.

Page generated 14-02-2008 by Steven Krauwer