Elsnet


Project description: Prague Arabic Dependency Treebank

[ ID = 0044 ] PADT 
Project name: Prague Arabic Dependency Treebank
Short name or acronym: PADT
Project URL: http://ufal.mff.cuni.cz/padt/online/
Project description

The Prague Arabic Dependency Treebank (PADT) project is an open-ended activity 
of the Institute of Formal and Applied Linguistics, Charles University in 
Prague, focused on the multi-level annotation of Arabic language resources in 
the light of the theory of Functional Generative Description. The project is a 
younger sibling of the Prague Dependency Treebank for Czech.

The PADT corpus currently consists of morphologically and syntactically 
annotated newswire texts in Modern Standard Arabic, which originate from 
resources published by the Linguistic Data Consortium, University of 
Pennsylvania: Arabic Gigaword and the plain data of the Penn Arabic Treebank, 
Parts 1 and 2.

The linguistic description used in PADT is unique in Arabic NLP. In morphology, 
we resolve true grammatical categories rather than decomposing words into 
morphs, and we annotate hierarchies of possible analyses called MorphoTrees.

The first version of the treebank, PADT 1.0 at http://ufal.mff.cuni.cz/padt/, 
was released via LDC in November 2004. PADT 1.0 comprises more than 148,000 
tokens of data annotated with MorphoTrees. In syntax, it captures the dependency 
relations within each sentence, in parallel with the Prague Dependency Treebank 
for Czech. The syntactically annotated data exceed 113,500 tokens. The most 
recent portion, roughly 49,000 tokens, also has its lower-level MorphoTrees 
counterpart; the rest has morphology annotated with the same tag system, but 
without the reusable disambiguated hierarchies.

New development and extended annotations (350,000 tokens of MorphoTrees, 
250,000 of analytical syntax, 20,000 of tectogrammatics, i.e. deep syntax) have 
been under way since the PADT 1.0 release. Please visit the PADT++ online 
weblog for the latest information about the project: 
http://ufal.mff.cuni.cz/padt/online/.
Languages: Arabic
Funding: public
Project duration: Sep 2001 - ??? ????
Contact
Name: Otakar Smrz
Organisation: Institute of Formal and Applied Linguistics, Charles University in Prague
Address: Malostranske namesti 25
City: 118 00 Praha 1
Country: Czech Republic
Email: padt_(on)_ufal.mff.cuni.cz
Phone: +420 221 914 273
Fax: +420 221 914 309
Last update: 2007-01-08 14:27:15


Page generated 13-02-2008 by Steven Krauwer