Subject:   Request: Arabic transcript marked with pauses
Date received:   08 Apr 2013

Researchers at the Universities of Leeds and Jordan are looking for a small test corpus (5000+ words) of transcribed Modern Standard Arabic (MSA) annotated with PHRASE BREAKS. The breaks should delineate well-formed, meaningful chunks and should not represent disfluencies.

To illustrate the kind of thing we are looking for, here is a single MSA sentence of 48 words:

http://www.comp.leeds.ac.uk/claireb/msaSentence.pdf

In this example, only two words are followed by punctuation, and we have identified these as breaks. We have also tagged a few other words as likely boundary locations.

If you know of or have such a resource, we would love to hear from you.

Thanks,

Claire Brierley
C.Brierley_(at)_leeds.ac.uk
Senior Research Fellow
School of Computing, University of Leeds, UK

