Central and Eastern European Survey

ELSNET is the European Network in Human Language Technologies (http://www.elsnet.org). This page is http://www.elsnet.org/survey/DepartmentofIntelligentSystemsSI1000LjubljanaSlovenia/resources.html

Department of Intelligent Systems, Jozef Stefan Institute: Resources

NL and Speech Resources available at the organisation: Textual, Software, Lexical resources.

Name: MULTEXT-East aligned corpus
Nature: manually validated PoS tagged text
Language: Multilingual: English + 6 CEE
Size: 7 x 100k words
Format: TEI
Coverage: Orwell's 1984
Medium: Internet
Availability: free for research purposes

Name: IJS-ELAN parallel corpus
Nature: sentence aligned and automatically PoS tagged text
Language: Slovene + English
Size: 2 x 500k words
Format: TEI
Coverage: 15 terminology-rich texts: economy, computers, etc.
Medium: Internet
Availability: free

Name: Slovene MULTEXT-East Lexicon
Nature: lexical
Language: Slovene
Size: 15,000 lemmas, full inflectional paradigms
Format: ASCII, tabular list of wordform / lemma / morphosyntactic description
Coverage: MULTEXT-East Slovene corpus
Medium: Internet
Availability: free for research purposes
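
A tabular wordform / lemma / morphosyntactic description list of this kind can be read with a few lines of Python. The following is a minimal sketch, not the distribution's own tooling: it assumes one entry per line with three tab-separated columns, and the two sample rows (including their MSD codes) are illustrative placeholders rather than lines copied from the lexicon.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class LexEntry(NamedTuple):
    wordform: str
    lemma: str
    msd: str  # morphosyntactic description code


def parse_lexicon(lines: Iterable[str]) -> list[LexEntry]:
    """Parse lines of the assumed form: wordform<TAB>lemma<TAB>MSD."""
    entries = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 columns, got {len(fields)}")
        entries.append(LexEntry(*fields))
    return entries


def group_by_lemma(entries: Iterable[LexEntry]) -> dict[str, list[tuple[str, str]]]:
    """Collect the (wordform, MSD) pairs of each lemma into one paradigm."""
    paradigms = defaultdict(list)
    for entry in entries:
        paradigms[entry.lemma].append((entry.wordform, entry.msd))
    return dict(paradigms)


# Hypothetical sample rows, for illustration only.
SAMPLE = "miza\tmiza\tNcfsn\nmize\tmiza\tNcfsg\n"

entries = parse_lexicon(SAMPLE.splitlines())
paradigms = group_by_lemma(entries)
```

Grouping by lemma mirrors the "full inflectional paradigms" organisation mentioned in the Size field; if the actual files use a different column separator, only the `split` call needs to change.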

Name: Slovene Diphone Database
Nature: Speech
Language: Slovene
Size: 1224 diphones, ca. 5 MB
Format: Binary, 16 kHz sampling rate, RAW format
Coverage: full
Medium: Machine readable form
Availability: by arrangement
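
Headerless RAW audio such as that listed for the Slovene Diphone Database carries no metadata, so the reader has to supply the sample format. The sketch below uses the 16 kHz rate from the listing but assumes 16-bit signed little-endian mono samples, which the listing does not state; the demo bytes are synthetic, not taken from the database.

```python
import array
import struct
import sys

SAMPLE_RATE_HZ = 16_000  # sampling rate given in the listing
BYTES_PER_SAMPLE = 2     # assumption: 16-bit samples (not stated in the listing)


def decode_raw_pcm16(data: bytes) -> array.array:
    """Decode headerless PCM, assuming 16-bit signed little-endian mono."""
    if len(data) % BYTES_PER_SAMPLE:
        raise ValueError("byte count is not a multiple of the sample width")
    samples = array.array("h")  # signed 16-bit integers on common platforms
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()  # the data is assumed little-endian
    return samples


def duration_seconds(samples: array.array, rate: int = SAMPLE_RATE_HZ) -> float:
    """Length of the recording in seconds at the given sampling rate."""
    return len(samples) / rate


# Synthetic four-sample buffer standing in for a real diphone file.
demo_bytes = struct.pack("<4h", 0, 1000, -1000, 32767)
demo_samples = decode_raw_pcm16(demo_bytes)
demo_duration = duration_seconds(demo_samples)
```

For a real file, the bytes would come from `open(path, "rb").read()`; if the samples turn out to be big-endian or of another width, the decoding step has to be adjusted accordingly.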

Name: Slovene Readings
Nature: Speech
Language: Slovene
Size: 1000 utterances, ca. 40 MB
Format: Binary, 19.8 kHz sampling rate
Coverage: mainly declarative, also questions and imperative sentences
Medium: Machine readable form
Availability: free for research purposes

Software description:

Slovene TTS system: includes a grapheme-to-phoneme module, a diphone database, direct grapheme-to-phoneme translation, a module for micro- and macro-prosody determination, and a module for concatenation of speech units. The system is free for research purposes.

Web concordancer: consists of a Perl CGI script and associated HTML pages. It uses the IMS CQP system as the corpus processing back-end and enables searches on marked-up and parallel text. The interface is freely available.
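
To illustrate the general idea behind diphone concatenation as used in TTS systems of the kind described above, here is a generic sketch. It is not the institute's implementation: the unit naming scheme (`a-b`), the silence symbol `_`, and the toy database are all assumptions made for the example, and prosody handling and smoothing at unit boundaries are left out entirely.

```python
from typing import Mapping, Sequence


def to_diphones(phonemes: Sequence[str], silence: str = "_") -> list[str]:
    """Turn a phoneme sequence into diphone unit names.

    The utterance is padded with silence on both sides, and every
    adjacent pair of symbols becomes one unit named "left-right".
    """
    padded = [silence, *phonemes, silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


def concatenate(units: Sequence[str], database: Mapping[str, Sequence[int]]) -> list[int]:
    """Join the stored sample sequences of the requested units end to end."""
    samples: list[int] = []
    for name in units:
        if name not in database:
            raise KeyError(f"diphone {name!r} is missing from the database")
        samples.extend(database[name])
    return samples


# Toy database with made-up sample values, for illustration only.
toy_database = {
    "_-m": [0, 1],
    "m-i": [2, 3],
    "i-_": [4, 0],
}

unit_names = to_diphones(["m", "i"])
waveform = concatenate(unit_names, toy_database)
```

A phoneme sequence of length n yields n + 1 diphones under this padding scheme, which is why a diphone inventory has to cover transitions to and from silence as well as between phonemes.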

This page is no longer maintained. Please visit http://www.elsnet.org/survey/quests to find out how to update your organisation profile or to find information about this organisation.

This page was generated 04-01-1998 by Steven Krauwer.