ELSNET-list archive

Category:   E-CFP
Subject:   Workshop on Evaluation for Language&Dialogue Systems
From:   Priscilla Rasmussen
Email:   rasmusse_(on)_cs.rutgers.edu
Date received:   05 Mar 2001
Deadline:   06 Apr 2001
Start date:   06 Jul 2001

Call for Papers

Workshop on Evaluation for Language and Dialogue Systems
ACL/EACL 2001
Toulouse, France
July 6-7, 2001

WORKSHOP GOALS

The aim of this two-day workshop is to identify and to synthesize current needs for language-technology evaluation. The first day of the workshop will focus on one of the most challenging current issues in language engineering: the evaluation of dialogue systems and models. The second day will extend the discussion to address the problem of evaluation in language engineering more broadly and on more theoretical grounds.

The space of possible dialogues is enormous, even for limited domains like travel information servers. The generalization of evaluation methodologies across different application domains and languages is an open problem.

Review of published evaluations of dialogue models and systems suggests that usability techniques are the standard method. Dialogue-based systems are often evaluated in terms of standard, objective usability metrics, such as task-completion time and number of user actions. In the past, researchers have proposed and debated theory-based methods for modifying and testing the underlying dialogue model, and more precise, empirical methods for evaluating the effectiveness of dialogue models have also been proposed, but usability testing remains the most widely used method of evaluation.

For task-based interaction, typical measures of effectiveness are time-to-completion and task outcome, but the evaluation should focus on user satisfaction rather than on arbitrary effectiveness measurements. Indeed, the problems faced in current approaches to measuring the effectiveness of dialogue models and systems include:

o Direct measures are unhelpful because efficient performance on the nominal task may not represent the most effective interaction
o Indirect measures usually rely on judgment and are vulnerable to weak relationships between the inputs and outputs
o Subjective measures are unreliable and domain-specific

For its first day, the workshop organizers solicit papers on these issues, with particular emphasis on methods that go beyond usability testing to address the underlying dialogue model. Representative questions to be addressed include:

o How do we deal with the combinatorial explosion of dialogue states?
o How can satisfaction be measured with respect to underlying dialogue models?
o Are there useful direct measures of dialogue properties that do not depend on task efficiency?
o What is the role of agent-based simulation in evaluation of dialogue models?

Of course, the problems faced in evaluating dialogue and system models are found in other domains of language engineering, even for non-interactive processes such as part-of-speech tagging, parsing, semantic disambiguation, information extraction, speech transcription, and audio document indexing. So the issue of evaluation can be viewed at a more generic level, raising fundamental, theoretical questions such as:

o What are the interest and benefits of evaluation for language engineering?
o Do we really need these specific methodologies, since a form of evaluation should always be present in any scientific investigation?
o If evaluation is needed in language engineering, is it the case for all domains?
o What form should it take?
  Technology evaluation (task-oriented, in a laboratory environment) or field/user evaluation (complete systems in real-life conditions)?

We have seen above that the evaluation of dialogue models is still unsolved, but for domains where metrics already exist, are they satisfactory and sufficient? How can we take into account, or abstract from, the subjective factor introduced by human operators in the process? Do similarity measures and standards offer appropriate answers to this problem? Most of the efforts focus on evaluating processes, but what about the issue of language resources evaluation?

For its second day of work, the workshop organizers solicit papers on these issues, with the intent to address the problem of evaluation both from a broader perspective (including novel application domains for evaluation, new metrics for known tasks, and resource evaluation) and from a more theoretical point of view (including formal theory of evaluation and infrastructural needs of language engineering).

NOTE: People who would like to submit a paper on lexical semantic disambiguation evaluation should consider the parallel workshop, on July 5-6, for the closure of the SENSEVAL-2 evaluation campaign.

-------------------------------------------------------------

WORKSHOP ORGANIZATION

The organization of each of the two days of the workshop will reflect the workshop's two main themes. Each day will begin with a session of presentations of selected papers and continue with panel discussions to synthesize and develop possible methodologies from additional selected workshop papers.

WORKSHOP PARTICIPATION

The workshop seeks participation from people involved or interested in the problem of evaluation in language processing and from the research and industrial communities that study and implement dialogue models for natural-language interaction systems.

The first part of the workshop will specifically draw on the natural-language interaction community, such as the one developing at the confluence of SIGdial and SIGCHI, which will find in this workshop an atmosphere more flavored by computational-linguistics issues (see, for example, the First SIGdial Workshop on Discourse and Dialogue). The second part of the workshop is intended to provide a forum for a broader audience, more in the spirit of the one that attended the LREC'2000 Satellite Workshop on Evaluation (see http://www.limsi.fr/TLP/CLASS), in particular offering an opportunity to people involved in language engineering evaluation (e.g., the CLASS audience) in the context of national or transnational projects or programs, both in Europe and abroad.

-------------------------------------------------------------

SUBMISSION DETAILS

Paper submissions should follow the two-column format of ACL proceedings and should not exceed eight (8) pages, including references. We strongly recommend the use of ACL LaTeX style files or Microsoft Word Style files tailored for this year's conference. They are available from the ACL-2001 program committee Web site at http://acl2001.dfki.de/style/.

Papers should be submitted electronically, as a LaTeX, Word, or PDF file, to either:

Patrick Paroubek, pap_(on)_limsi.fr
Karen Ward, kward_(on)_cs.utep.edu

-------------------------------------------------------------

TIMETABLE OF IMPORTANT DATES

Deadline for workshop paper submissions: April 6, 2001
Deadline for notification of workshop paper acceptance: April 27, 2001
Deadline for camera-ready workshop papers: May 16, 2001
Workshop date: July 6-7, 2001

-------------------------------------------------------------

WORKSHOP ORGANIZING COMMITTEE

David G. Novick, UTEP
novick_(on)_cs.utep.edu
http://www.cs.utep.edu/novick

Joseph Mariani, Limsi - CNRS
mariani_(on)_limsi.fr
http://www.limsi.fr/Individu/mariani

Candy Kamm, AT&T Labs
cak_(on)_research.att.com
http://www.research.att.com/info/cak

Patrick Paroubek, Limsi - CNRS
pap_(on)_limsi.fr
http://www.limsi.fr/Individu/pap

Nils Dahlbäck, Linköping University
nilda_(on)_ida.liu.se
http://www.ida.liu.se/%7Enilda/

Frankie James, NASA Ames Research Center
fjames_(on)_riacs.edu
http://www-pcd.stanford.edu/frankie/

Karen Ward, UTEP
kward_(on)_cs.utep.edu
http://www.cs.utep.edu/kward

-------------------------------------------------------------

SCIENTIFIC COMMITTEE

David G. Novick
Joseph Mariani
Candy Kamm
Patrick Paroubek
Nils Dahlbäck
Frankie James
Karen Ward
Christian Jacquemin
Niels Ole Bernsen
Stephane Chaudiron
Khalid Choukri
Martin Rajman
Robert Gaizauskas
Donna Harman
Lynette Hirschman (tentative)
David Pallett (tentative)
Carol Peters (tentative)
Jose Pardo (tentative)
Herman Steeneken (tentative)
Oliviero Stock (tentative)
Saïd Tazi
Hans Uszkoreit (tentative)

-------------------------------------------------------------

SPONSORS

ACL 2001
CLASS
ELRA
ELSNET

We also anticipate co-sponsorship from SIGdial.

-------------------------------------------------------------

ADDITIONAL INFORMATION

Additional information on the workshop, including accepted papers and the workshop schedule, will be made available as needed at http://www.limsi.fr/TLP/CLASS/eacl01.html

