ELSNET
Central and Eastern European Survey
NL and Speech Resources available at the organisation: Textual, Software, Lexical resources.
Name: MULTEXT-East aligned corpus
Nature: manually validated PoS tagged text
Language: Multilingual: English + 6 CEE
Size: 7 x 100k words
Format: TEI
Coverage: Orwell's 1984
Medium: Internet
Availability: free for research purposes
Name: IJS-ELAN parallel corpus
Nature: sentence aligned and automatically PoS tagged text
Language: Slovene + English
Size: 2 x 500k words
Format: TEI
Coverage: 15 terminology-rich texts: economy, computers, etc.
Medium: Internet
Availability: free
Name: Slovene MULTEXT-East Lexicon
Nature: lexical
Language: Slovene
Size: 15,000 lemmas, full inflectional paradigms
Format: ASCII, tabular list of wordform / lemma / morphosyntactic description
Coverage: MULTEXT-East Slovene corpus
Medium: Internet
Availability: free for research purposes
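The wordform / lemma / morphosyntactic description layout listed for the lexicon lends itself to simple line-by-line processing. The following is a minimal sketch only, assuming one entry per line with three tab-separated fields; the separator, the function name parse_lexicon, and the sample entries (including their tags) are illustrative assumptions and are not taken from the lexicon itself.

```python
from collections import defaultdict


def parse_lexicon(lines):
    """Group (wordform, description) pairs under their lemma.

    Assumes each non-empty line holds three tab-separated fields:
    wordform, lemma, morphosyntactic description. The real file may
    use a different separator or extra conventions.
    """
    paradigms = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"expected 3 tab-separated fields, got {len(fields)}: {line!r}"
            )
        wordform, lemma, msd = fields
        paradigms[lemma].append((wordform, msd))
    return dict(paradigms)


# Hypothetical sample entries, for illustration only.
sample = [
    "hiša\thiša\tNcfsn\n",
    "hiše\thiša\tNcfsg\n",
    "hiši\thiša\tNcfsd\n",
]
lexicon = parse_lexicon(sample)
for wordform, msd in lexicon["hiša"]:
    print(wordform, msd)
```

Grouping by lemma mirrors the "full inflectional paradigms" organisation of the resource: every wordform of a lemma ends up in one list.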
Name: Slovene Diphone Database
Nature: Speech
Language: Slovene
Size: 1224 diphones, ca. 5 MB
Format: Binary, 16 kHz sampling rate, RAW format
Coverage: full
Medium: Machine readable form
Availability: by arrangement
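As a rough plausibility check on the figures for the diphone database, the total playing time of headerless RAW audio follows from the file size and the sampling rate. The sketch below assumes 16-bit mono samples and reads 5 MB as 5 x 1024 x 1024 bytes; the listing states neither the sample width nor the channel count, so both are assumptions, and raw_duration_seconds is a hypothetical helper.

```python
def raw_duration_seconds(num_bytes, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Duration of headerless PCM audio.

    bytes_per_sample=2 (16-bit) and channels=1 (mono) are assumptions;
    the listing only gives the sampling rate and the overall size.
    """
    return num_bytes / (sample_rate_hz * bytes_per_sample * channels)


total = raw_duration_seconds(5 * 1024 * 1024, 16_000)
per_diphone = total / 1224

print(f"total: {total:.2f} s, per diphone: {per_diphone * 1000:.0f} ms")
```

Under these assumptions the database holds a little under three minutes of speech, or somewhat more than a tenth of a second per diphone on average.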
Name: Slovene Readings
Nature: Speech
Language: Slovene
Size: 1000 utterances, ca. 40 MB
Format: Binary, 19.8 kHz sampling rate
Coverage: mainly declarative sentences, also questions and imperatives
Medium: Machine readable form
Availability: free for research purposes
Software description: Slovene TTS system: includes a grapheme-to-phoneme
module that performs direct grapheme-to-phoneme translation, a diphone
database, a module for determining micro- and macro-prosody, and a
module for concatenating speech units. The system is free for research
purposes.
Web concordancer: consists of a Perl CGI script and associated HTML
pages. It uses the IMS CQP system as the corpus processing back-end and
supports searches on marked-up and parallel text. The interface is
freely available.