Project description: Prague Arabic Dependency Treebank

[ ID = 0044 ]	PADT
Project name	Prague Arabic Dependency Treebank
Short name or acronym	PADT
Project URL	http://ufal.mff.cuni.cz/padt/online/
Project description	The Prague Arabic Dependency Treebank (PADT) project is an open-ended activity of the Institute of Formal and Applied Linguistics, Charles University in Prague. It consists in multi-level annotation of Arabic language resources in the light of the theory of Functional Generative Description. The project is a younger sibling of the Prague Dependency Treebank for Czech. The PADT corpus currently consists of morphologically and syntactically annotated newswire texts in Modern Standard Arabic, which originate from resources published by the Linguistic Data Consortium, University of Pennsylvania: Arabic Gigaword and the plain data of the Penn Arabic Treebank, Part 1 and Part 2. The linguistic description of PADT is unique in Arabic NLP. In morphology, we resolve true grammatical categories rather than decomposing words into morphs, and annotate hierarchies of possible analyses called MorphoTrees. The first version of the treebank, PADT 1.0 (http://ufal.mff.cuni.cz/padt/), was released via LDC in November 2004. PADT 1.0 contains more than 148,000 tokens of data annotated with MorphoTrees. In syntax, the dependency relations in a sentence are captured, in parallel with the Prague Dependency Treebank for Czech. The syntactically annotated data amount to over 113,500 tokens. Of these, the more recent data, roughly 49,000 tokens, have a lower-level MorphoTrees counterpart; the rest have morphology in the same system of tags, but without the reusable disambiguated hierarchies. New development and extended annotations (350,000 tokens of MorphoTrees, 250,000 of analytical syntax, 20,000 of tectogrammatics, i.e. deep syntax) have been under way since the PADT 1.0 release. For the newest information about the project, please visit the PADT++ online weblog at http://ufal.mff.cuni.cz/padt/online/.
Languages	Arabic
Funding	public
Project duration	Sep 2001 - ??? ????
Contact
Name	Otakar Smrz
Organisation	Institute of Formal and Applied Linguistics, Charles University in Prague
Address	Malostranske namesti 25
City	118 00 Praha 1
Country	Czech Republic
Email	padt_(on)_ufal.mff.cuni.cz
Phone	+420 221 914 273
Fax	+420 221 914 309
Last update	2007-01-08 14:27:15

Page generated 13-02-2008 by Steven Krauwer
