

ELSNET-list archive

Category:   E-InfoReq
Subject:   Request: Arabic transcript marked with pauses
From:  
Email:   E.S.Atwell_(on)_leeds.ac.uk
Date received:   08 Apr 2013

Researchers at the Universities of Leeds and Jordan are looking for a small test corpus (5000+ words) of transcribed Modern Standard Arabic (MSA) annotated with PHRASE BREAKS. The latter should delineate well-formed, meaningful chunks and should not represent disfluencies.

To illustrate the kind of thing we are looking for, here is a single MSA sentence of 48 words:

http://www.comp.leeds.ac.uk/claireb/msaSentence.pdf

In this example, only two words are followed by punctuation - and we have identified these as breaks. In addition, we have also tagged a few other words as likely boundary locations.

If you know of or have such a resource, we would love to hear from you.

Thanks,

Claire Brierley
C.Brierley_(at)_leeds.ac.uk
Senior Research Fellow
School of Computing, University of Leeds, UK
 

Page generated 23-04-2013 by Steven Krauwer