ELSNET-list archive

Category:   E-Material
Subject:   ELRA News
From:   Valerie Mapelli
Email:   mapelli_(on)_elda.fr
Date received:   10 Jan 2001

Date: Tue, 2 Jan 2001 13:23:45 +0100 (MET)
From: Valerie Mapelli
Subject: [E-material] c
Sender: owner-elsnet-list_(on)_let.uu.nl

[ We apologise for the duplicate posting of this announcement ]

___________________________________________________________

ELRA
European Language Resources Association

ELRA News
___________________________________________________________

*** ELRA NEW RESOURCES ***

We are happy to announce new resources available via ELRA:

- Telephone Speech Resources
  ELRA-S0090 Polish SpeechDat(E) Database
  ELRA-S0092 Portuguese SpeechDat(II) FDB-4000

- Desktop Microphone Speech Resources
  ELRA-S0087 BABEL Hungarian Database
  ELRA-S0088 Twin database - TWINDB1
  ELRA-S0089 Albayzin corpus
  ELRA-S0093 IBNC - An Italian Broadcast News Corpus

- Speech Related Resources
  ELRA-S0091 Pronunciation lexicon of British place names, surnames and first names

- Written Corpus
  ELRA-W0025 A "scientific" corpus of modern French (La Recherche magazine)

- Multilingual Lexicons
  ELRA-M0025 Bilingual English-Russian Russian-English Dictionaries

A short description of each database is given below.

_______________________________________
TELEPHONE SPEECH RESOURCES
_______________________________________

- ELRA-S0090 Polish SpeechDat(E) Database
This database comprises 1000 Polish speakers (488 males, 512 females) recorded over the Polish fixed telephone network.

- ELRA-S0092 Portuguese SpeechDat(II) FDB-4000
This database comprises 4027 Portuguese speakers (1861 males, 2166 females) recorded over the Portuguese fixed telephone network.

_______________________________________
DESKTOP/MICROPHONE SPEECH RESOURCES
_______________________________________

- ELRA-S0087 BABEL Hungarian Database
The BABEL Database is a speech database that was produced by a research consortium funded by the European Union under the COPERNICUS programme (COPERNICUS Project 1304).
The Hungarian database consists of:
- the basic "common" set, which contains the Many Talker Set (30 males, 30 females), the Few Talker Set (4 males, 4 females) and the Very Few Talker Set (1 male, 1 female);
- the extension part: a short description of the Hungarian sound system.

- ELRA-S0088 Twin database - TWINDB1
The Twin database named TWINDB1 includes recordings of 45 French speakers: 9 pairs of identical twins (8 males and 10 females) with similar voices, and 27 other speakers (13 males and 14 females), including 4 non-twin siblings.

- ELRA-S0089 Albayzin corpus
This corpus consists of 3 sub-corpora of 16 kHz, 16-bit signals, recorded by 304 Castilian speakers: the Phonetic corpus, the Geographic corpus and the "Lombard" corpus.

- ELRA-S0093 IBNC - An Italian Broadcast News Corpus
Produced within the European Commission funded project LRsP&P (Language Resources Production & Packaging - LE4-8335), the collection consists of 150 broadcast programs from the RAI, for a total time of about 30 hours, broadcast on 36 different days between 1992 and 1999. The audio is down-sampled to 16 kHz, 16 bit, and encoded in the NIST Sphere PCM format.

_______________________________________
SPEECH RELATED RESOURCES
_______________________________________

- ELRA-S0091 Pronunciation lexicon of British place names, surnames and first names
This pronunciation lexicon, produced within the European Commission funded project LRsP&P (Language Resources Production & Packaging - LE4-8335), is an SGML-encoded database. It contains 160,000 entries of British place names, surnames and first names. All phonemic transcriptions in the database are based on the SAMPA phonetic alphabet.
_______________________________________
WRITTEN CORPUS
_______________________________________

- ELRA-W0025 A "scientific" corpus of modern French (La Recherche magazine)
Produced within the European Commission funded project LRsP&P (Language Resources Production & Packaging - LE4-8335), the corpus contains all articles published in La Recherche magazine in 1998, including issues 305 (January) to 315 (December), which amounts to 447,244 tokens and 30,238 types. Two versions are available: the raw data (XML format) and the complete version (XML and SGML formats).

_______________________________________
MULTILINGUAL LEXICONS
_______________________________________

- ELRA-M0025 Bilingual English-Russian Russian-English Dictionaries
Produced within the European Commission funded project LRsP&P (Language Resources Production & Packaging - LE4-8335), these bilingual dictionaries contain more than 350,000 pairs of words (in tabular form) in XML format:
1) Russian-English dictionary - more than 130,000 entries
2) English-Russian dictionary - more than 95,000 entries
Each entry contains: source word (lemma); part of speech of source word; target word(s) (lemma(s)), grouped by same meaning; part of speech of target word(s); domain(s).

=====================================
For further information, please contact:

ELRA/ELDA                    Tel +33 01 43 13 33 33
55-57 rue Brillat-Savarin    Fax +33 01 43 13 33 30
F-75013 Paris, France        E-mail mapelli_(on)_elda.fr

or visit our Web site:
http//www.icp.grenet.fr/ELRA/home.html
or http//www.elda.fr
=====================================
 

Page generated 14-02-2008 by Steven Krauwer