*** Call for papers ***

EACL 2003 Workshop on:
"Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable?"

13 April 2003, Budapest, Hungary
11th Conference of the European Chapter of the Association for Computational Linguistics (April 12-17, 2003)
http://www.dcs.shef.ac.uk/~katerina/EACL03-eval

Introduction:

Systems that accomplish different Natural Language Processing (NLP) tasks have different characteristics and therefore, it would seem, different requirements for evaluation. However, are there common features in the evaluation methods used across language technologies? Could the evaluation methods established for one type of system be ported or adapted to another NLP research area? Could automatic evaluation metrics be ported? For instance, could Papineni's MT evaluation metric (BLEU) be used for the evaluation of generated summaries? Could the extrinsic evaluation method used within SUMMAC be applied to the evaluation of Natural Language Generation systems? What are the reusability obstacles encountered, and how could they be overcome? What are the evaluation needs of system types, such as dialogue systems, that have been less rigorously evaluated so far, and how could they benefit from current practices in evaluating Language Engineering technologies? What evaluation challenges emerge from systems that integrate a number of different language processing functions (e.g. multimodal dialogue systems such as Smartkom)? Could resources (e.g. corpora) used for a specific NLP task be reused for the evaluation of the output of an NLP system, and if so, what adaptations would this require? John White suggested some years ago a hierarchy of difficulty, or compositionality, of NLP tasks; if correct, does this have implications for evaluation?

End-to-end evaluation of systems in a specific NLP research area has been attempted both within European initiatives (e.g. EAGLES/ISLE, ELSE, TEMAA) and within U.S. evaluation regimes with international participation (e.g. MUC, TREC, SUMMAC). It has been reported that evaluation techniques in the different Language Engineering areas are growing more similar (Hovy et al. 1999), a fact that emphasizes the need for co-ordinated and reusable evaluation techniques and measures. The time has come to bring these attempts together, to address the evaluation of NLP systems as a whole, and to explore ways of reusing established evaluation methods, metrics and resources, thus contributing to a more co-ordinated approach to the evaluation of language technology.

Target audience:

The aim of this workshop is to bring together leading researchers from various NLP areas (such as Machine Translation, Information Extraction, Information Retrieval, Automatic Summarization, Question-Answering, Dialogue Systems and Natural Language Generation) in order to explore ways of making the most of currently available evaluation methods, metrics and resources.

Workshop format:

The workshop will open with an invited talk introducing the topic and presenting the research questions and challenges that need to be addressed. Oral presentations, divided into thematic sessions, will follow; at the end of each session a panel discussion will take place. The panels will consist of members of the program committee. The workshop will close with an overview talk.
Topics of interest:

We welcome submissions of both discussion papers and papers presenting applied experiments relevant to, but not limited to, the following topics:

- cross-fertilization of evaluation methods and metrics
- reuse of resources for evaluation (corpora, evaluation tools etc.)
- feasibility experiments on the reuse of established evaluation methods/metrics/resources for different NLP system types
- reusability obstacles and the notion of compositionality of NLP tasks
- evaluation needs and challenges for less rigorously evaluated system types (e.g. multimodal dialogue systems), and possible benefits from established evaluation practices
- evaluation standards and reusability
- reuse within large evaluation initiatives
- application of, for example, Machine Translation methods to Information Retrieval: implications for evaluation

Submission format:

Submissions must be electronic only and should consist of full papers of max. 8 pages (inclusive of references, tables, figures and equations). Authors are strongly encouraged to use the style files suggested for the EACL main conference submissions at:
http://www.elsnet.org/workshops/format.html

Please mail your submissions to Katerina Pastra (e.pastra@dcs.shef.ac.uk).

Important dates:

* Deadline for workshop paper submissions: TUESDAY, 7 January 2003 (NOTE: strict deadline)
* Notification of workshop paper acceptance: TUESDAY, 28 January 2003
* Deadline for camera-ready workshop papers: THURSDAY, 13 February 2003
* Workshop date: SUNDAY, 13 April 2003

Program Committee:

Rob Gaizauskas (University of Sheffield, UK)
Donna Harman (NIST, US)
Lynnette Hirschman (MITRE, US)
Maghi King (ISSCO, Switzerland)
Steven Krauwer (Utrecht University, Netherlands)
Inderjeet Mani (MITRE, US)
Patrick Paroubek (LIMSI, France)
Katerina Pastra (University of Sheffield, UK)
Martin Rajman (EPFL, Switzerland)
Horacio Saggion (University of Sheffield, UK)
Yorick Wilks (University of Sheffield, UK)
(more to follow)

Registration details:

Information on registration fees and procedures will be published on the main EACL 2003 conference pages at:
http://www.conferences.hu/EACL03/

For detailed and up-to-date information on the workshop, please visit the workshop's website:
http://www.dcs.shef.ac.uk/~katerina/EACL03-eval

For any queries, please contact the workshop organiser:

Katerina Pastra
e.pastra@dcs.shef.ac.uk
tel. +44 114 2221945
fax +44 114 2221810
NLP Group, Department of Computer Science
211 Portobello Street
Sheffield, S1 4DP, U.K.