*** Call for papers ***

EACL 2003 Workshop on:
"Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable?"

13 April 2003, Budapest, Hungary
11th Conference of the European Chapter of the Association for Computational Linguistics (April 12-17, 2003)
http://www.dcs.shef.ac.uk/~katerina/EACL03-eval

Introduction:

Systems that accomplish different Natural Language Processing (NLP) tasks have different characteristics and therefore, it would seem, different requirements for evaluation. However, are there common features in the evaluation methods used across language technologies? Could the evaluation methods established for one type of system be ported or adapted to another NLP research area? Could automatic evaluation metrics be ported? For instance, could Papineni's MT evaluation metric (BLEU) be used for the evaluation of generated summaries? Could the extrinsic evaluation method used within SUMMAC be applied to the evaluation of Natural Language Generation systems? What are the reusability obstacles encountered, and how could they be overcome? What are the evaluation needs of system types, such as dialogue systems, that have been less rigorously evaluated so far, and how could they benefit from current practices in evaluating Language Engineering technologies? What evaluation challenges emerge from systems that integrate a number of different language processing functions (e.g. multimodal dialogue systems such as Smartkom)? Could resources (e.g. corpora) used for a specific NLP task be reused for the evaluation of the output of an NLP system, and if so, what adaptations would this require? John White suggested some years ago a hierarchy of difficulty, or compositionality, of NLP tasks; if correct, does this have implications for evaluation?

End-to-end evaluation of systems in a specific NLP research area has been attempted both within European initiatives (e.g. EAGLES/ISLE, ELSE, TEMAA) and within U.S. evaluation regimes with international participation (e.g. MUC, TREC, SUMMAC). It has been reported that evaluation techniques in the different Language Engineering areas are growing more similar (Hovy et al. 1999), a fact that emphasizes the need for co-ordinated and reusable evaluation techniques and measures. The time has come to bring these attempts together, to address the evaluation of NLP systems as a whole, and to explore ways of reusing established evaluation methods, metrics and resources, thus contributing to a more co-ordinated approach to the evaluation of language technology.

Target audience:

The aim of this workshop is to bring together leading researchers from various NLP areas (such as Machine Translation, Information Extraction, Information Retrieval, Automatic Summarization, Question-Answering, Dialogue Systems and Natural Language Generation) in order to explore ways of making the most of currently available evaluation methods, metrics and resources.

Workshop format:

The workshop will open with an invited talk introducing the topic and presenting the research questions and challenges that need to be addressed. Oral presentations, divided into thematic sessions, will follow; at the end of each session a panel discussion will take place. The panels will consist of members of the program committee. The workshop will close with an overview talk.
Topics of interest:

We welcome submissions of both discussion papers and papers presenting applied experiments relevant to, but not limited to, the following topics:

- cross-fertilization of evaluation methods and metrics
- reuse of resources for evaluation (corpora, evaluation tools etc.)
- feasibility experiments on the reuse of established evaluation methods/metrics/resources for different NLP system types
- reusability obstacles and the notion of compositionality of NLP tasks
- evaluation needs and challenges for less rigorously evaluated system types (e.g. multimodal dialogue systems), and possible benefits from established evaluation practices
- evaluation standards and reusability
- reuse within large evaluation initiatives
- application of, for example, Machine Translation methods to Information Retrieval: implications for evaluation

Submission format:

Submissions must be electronic only and should consist of full papers of max. 8 pages (inclusive of references, tables, figures and equations). Authors are strongly encouraged to use the style files suggested for the EACL main conference submissions at:
http://www.elsnet.org/workshops/format.html

Please mail your submissions to Katerina Pastra (e.pastra@dcs.shef.ac.uk).

Important dates:

* Deadline for workshop paper submissions: TUESDAY, 7 January 2003 (NOTE: strict deadline)
* Notification of workshop paper acceptance: TUESDAY, 28 January 2003
* Deadline for camera-ready workshop papers: THURSDAY, 13 February 2003
* Workshop date: SUNDAY, 13 April 2003

Program Committee:

Rob Gaizauskas (University of Sheffield, UK)
Donna Harman (NIST, US)
Lynnette Hirschman (MITRE, US)
Maghi King (ISSCO, Switzerland)
Steven Krauwer (Utrecht University, Netherlands)
Inderjeet Mani (MITRE, US)
Patrick Paroubek (LIMSI, France)
Katerina Pastra (University of Sheffield, UK)
Martin Rajman (EPFL, Switzerland)
Horacio Saggion (University of Sheffield, UK)
Yorick Wilks (University of Sheffield, UK)
(more to follow)

Registration details:

Information on registration fees and procedures will be published on the main EACL 2003 conference pages at:
http://www.conferences.hu/EACL03/

For detailed and up-to-date information on the workshop, please visit the workshop's website:
http://www.dcs.shef.ac.uk/~katerina/EACL03-eval

For any queries, please contact the workshop organiser:

Katerina Pastra
e.pastra@dcs.shef.ac.uk
tel. +44 114 2221945
fax +44 114 2221810
NLP Group, Department of Computer Science
211 Portobello Street
Sheffield, S1 4DP, U.K.