ELSNET-List Message

Subject: [ E-Announce ] 1st Call for Participation: 1st Shared Task on Native Language Identification
From: <Joel.Tetreault_(on)_nuance.com>
Date received: 04 Jan 2013
Deadline: 18 Mar 2013
Start date: 13 Jun 2013





(apologies in advance for cross-posting)

                                  CALL FOR PARTICIPATION

                   1st Shared Task on Native Language
                   Identification

                        Atlanta, Georgia, USA; June 13 or 14,
                        2013 (co-located with BEA8 Workshop)

                    https://sites.google.com/site/nlisharedtask2013/

SHARED TASK DESCRIPTION

We are excited to organize the first shared task in Native
Language Identification (NLI), the task of identifying the native
language (L1) of a writer based solely on a sample of their
writing. The task is framed as a classification problem where the
set of L1s is known a priori. Most work has focused on
identifying the native language of writers learning English as a
second language. This problem has been growing in popularity and
has motivated several ACL, NAACL and EMNLP papers, as well as a
master's thesis and a doctoral thesis.

Native Language Identification can be useful for a number of
applications. First, it can be used in educational settings to
provide more targeted feedback to language learners about their
errors. It is well known that speakers of different languages
make different kinds of errors when learning a language. A
writing tutor system which can detect the native language of the
learner will be able to tailor the feedback about the error and
contrast it with common properties of the learner's language.
Second, native language is often used as a feature that goes into
authorship profiling, which is frequently used in forensic
linguistics.

The goal of this task is to provide a space to evaluate different
techniques and approaches to Native Language Identification. To
date, it has been difficult to compare approaches due to issues
with training and testing data and a lack of consistency in
evaluation standards. In this shared task, we provide a new data
set as well as a framework in which different NLI systems can
finally be compared. The shared task will be co-located with the
8th Workshop on Innovative Use of NLP for Building Educational
Applications on June 13 or 14 in Atlanta, USA:

http://www.cs.rochester.edu/~tetreaul/naacl-bea8.html

DATA

Educational Testing Service (ETS) is making public 11,000 English
essays from the Test of English as a Foreign Language (TOEFL)
through the LDC, with the aim of creating a larger and more
reliable data set for researchers to use in Native Language
Identification experiments. This set, henceforth TOEFL11,
comprises 11 L1s with 1,000 essays per L1. The 11 native
languages covered by our corpus are: Arabic, Chinese, French,
German, Hindi, Italian, Japanese, Korean, Spanish, Telugu, and
Turkish. Furthermore, each essay in the TOEFL11 is
labeled with an English language proficiency level (high, medium,
or low) based on the judgments of human assessment specialists.
The essays are usually 300 to 400 words long. 90% of this set
will be sequestered as the training data and the remaining 10%
will be released as test data.
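
The split sizes implied by the figures above can be checked with a
short calculation. This is only a sketch: the per-L1 counts assume
the 90/10 split is stratified by native language, which this
announcement does not state, and all variable names are illustrative.

```python
# Figures taken from the announcement.
NUM_L1S = 11
ESSAYS_PER_L1 = 1000
TRAIN_PERCENT = 90

total_essays = NUM_L1S * ESSAYS_PER_L1
train_essays = total_essays * TRAIN_PERCENT // 100
test_essays = total_essays - train_essays

# Assumption (not stated in the announcement): the split is
# stratified, so every L1 contributes equally to each portion.
train_per_l1 = train_essays // NUM_L1S
test_per_l1 = test_essays // NUM_L1S

print("total:", total_essays)
print("train:", train_essays, "test:", test_essays)
print("per L1 (if stratified):", train_per_l1, "train,", test_per_l1, "test")
```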

EVALUATION

The shared task will have three sub-tasks:

* Closed-Training: The first and main task will be the 11-way
  classification task using only the TOEFL11 for training.
* Open-Training-1: The second task will be to allow the use of
  any amount or type of training data excluding the TOEFL11.
* Open-Training-2: The third task will be to allow the use of any
  amount or type of training data.

The same test data will be used for all sub-tasks.
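
The training-data rules for the three sub-tasks can be written as a
small check. This is a sketch of one reading of the rules above, not
an official validator: the function name and the corpus labels other
than TOEFL11 are hypothetical.

```python
TOEFL11 = "TOEFL11"

def training_sources_allowed(sub_task, sources):
    """Return True if the given training corpora fit the sub-task rules."""
    sources = set(sources)
    if sub_task == "Closed-Training":
        # Only the TOEFL11 may be used for training.
        return sources == {TOEFL11}
    if sub_task == "Open-Training-1":
        # Any training data is allowed except the TOEFL11.
        return TOEFL11 not in sources
    if sub_task == "Open-Training-2":
        # Any training data is allowed, with or without the TOEFL11.
        return True
    raise ValueError("unknown sub-task: " + sub_task)

# "other-corpus" is a made-up label used purely for illustration.
print(training_sources_allowed("Closed-Training", [TOEFL11]))
print(training_sources_allowed("Open-Training-1", [TOEFL11, "other-corpus"]))
print(training_sources_allowed("Open-Training-2", [TOEFL11, "other-corpus"]))
```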

REGISTRATION

If you would like to participate in the NLI Shared Task, you need
to formally register in order to obtain the training and test
data. To register, please send the following information to
nlisharedtask2013_(at)_gmail.com:

* Name of Institution or other label appropriate for your team
* Name of contact person for your team
* Email address of contact person for your team

SCHEDULE

January 14 - Training Data Release
March 11 - Test Data Release
March 18 - Submissions Due
March 25 - Results Announcement
April 08 - Papers Due
April 10 - Revision Requests Sent
April 12 - Camera Ready Version Due
June 13 or 14 - NLI Shared Task Presentations at BEA8 Workshop

ORGANIZERS

Joel Tetreault, Nuance Communications, USA
Aoife Cahill, Educational Testing Service, USA
Daniel Blanchard, Educational Testing Service, USA

Contact email: nlisharedtask2013_(at)_gmail.com
