Category: E-CFP
Subject: IJCNLP Workshop on Named Entity Recognition
Email: anil_(on)_research.iiit.ac.in
Date received: 06 Aug 2007
Deadline: 15 Sep 2007
CALL FOR PAPERS
IJCNLP 2008 WORKSHOP ON NAMED ENTITY RECOGNITION (NER) FOR SOUTH
AND SOUTH EAST ASIAN LANGUAGES
Papers are invited on substantial, original, and unpublished
research on all aspects of Named Entity Recognition (NER) for
South and South East Asian (SSEA) languages. Papers can report
results for other languages, but at least one of the languages
considered should be an SSEA language. We also invite researchers
to be contestants in a shared task (the second track of the
workshop) on NER for SSEA languages.
BACKGROUND AND MOTIVATION
Most SSEA languages lack both resources and tools, and NER systems
are no exception. Good NER systems are important because many
problems in information extraction and machine translation, among
others, depend on accurate NER. The issues involved for these
languages, however, differ significantly from those for European
languages or even East Asian languages. For example, these
languages do not have capitalization, which is a major feature for
NER systems for European languages.
Another feature shared by these languages is that many of them use
scripts of Brahmi origin. Some languages pose additional issues,
such as word segmentation (e.g. for Thai). Large gazetteers are not
available for most of these languages, and lack of standardization
and spelling variation cause further problems. In many of these
languages, a very large number of frequently used common nouns can
also be used as names, whereas in European languages a larger
proportion of first names are not used as common words. Lastly, and
most importantly, there is a serious lack of labeled data for
machine learning.
This workshop follows and builds upon this year's NLPAI Machine
Learning Contest, an annual event that each year focuses on applying
machine learning techniques to one major NLP problem. This year the
problem was NER. Unlike that event, however, this workshop will have
two tracks: one for regular research papers on NER for SSEA
languages and a second along the lines of a shared task.
In the shared task, contestants who have their own NER systems
will be given some annotated test data. The participating systems
will be ranked according to their performance on the test data.
There may or may not be training data for a particular language.
In either case, contestants will be free to use any technique for
NER, e.g. a purely rule-based technique or a purely machine
learning based technique.
At present some data is available for Hindi, Bengali and Telugu
for the shared task. Other languages can be included in the
contest provided data for them becomes available. The data
released for the shared task will be made accessible to all
researchers, not just the participants.
If the language you are interested in has not been included in
the shared task, you can also prepare annotated test data and
submit it to us. We will then include that language in the shared
task.
The task in this contest will differ in one important way: NER
systems will also have to identify nested named entities. For
example, in the sentence "The Lal Bahadur Shastri National
Academy of Administration is located in Mussoorie", 'Lal Bahadur
Shastri' is a Person, but 'Lal Bahadur Shastri National Academy
of Administration' is an Organization. In this case, the NER
systems will have to identify both 'Person' and 'Organization' in
the given sentence.
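To make the nested-entity requirement concrete, the example sentence can be represented as a list of possibly overlapping token spans rather than as a single tag per token. The Python sketch below is illustrative only: the `Entity` class, the whitespace tokenization, and the span indices are assumptions made for this example, not the shared task's annotation format.

```python
from dataclasses import dataclass


# Hypothetical span representation; not the official shared-task format.
@dataclass(frozen=True)
class Entity:
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # entity type, e.g. "Person" or "Organization"


def is_nested_in(inner: Entity, outer: Entity) -> bool:
    """Return True if inner lies within outer's span and is a different span."""
    same_span = (inner.start, inner.end) == (outer.start, outer.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_span


# Simple whitespace tokenization (an assumption for this sketch).
tokens = ("The Lal Bahadur Shastri National Academy of Administration "
          "is located in Mussoorie").split()

# Both the outer and the inner entity are kept.
entities = [
    Entity(1, 8, "Organization"),  # Lal Bahadur Shastri National Academy of Administration
    Entity(1, 4, "Person"),        # Lal Bahadur Shastri
]

for e in entities:
    print(e.label, "->", " ".join(tokens[e.start:e.end]))

outer, inner = entities
print("Person nested in Organization:", is_nested_in(inner, outer))
```

A flat scheme that assigns exactly one tag to each token cannot mark 'Lal Bahadur Shastri' as both a Person and part of an Organization. A span list like this one is one possible way to keep both annotations.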
Paper submission is through the centralized workshop submission
system (https://www.softconf.com/ijcnlp/NERSSEAL). Papers must be
written in English. Note that shared task contestants also have
to submit a paper describing their method and results.
Long or short papers can be submitted to either track. Long papers
can be up to 8 pages long, while short papers can be at most 5
pages (including references, figures, tables, etc.). All selected
papers will be published in the workshop proceedings.
The papers should be formatted using the LaTeX styles or MS Word
templates recommended for the main IJCNLP conference. These
documents are available at
http://www.ijcnlp2008.org/callforpapers.htm. Reviewing will be
blind: as far as possible, draft papers should not contain any
information that could identify the authors.
IMPORTANT DATES
Release of Training and Development Data: Aug 2 to Aug 25, 2007 (for different languages)
Release of Test Data: Sept 13, 2007
Annotated Test Data Submission Deadline: Sept 15, 2007
Paper Submission Deadline: Sept 21, 2007
Notification of Paper Acceptance: Oct 26, 2007
Camera Ready Submission Deadline: Nov
Note: There is no separate registration for the shared task (the
contest). You will be a contestant if you submit the annotated
test data by the deadline mentioned above.
Rajeev Sangal, IIIT, Hyderabad, India
Dekai Wu, The Hong Kong University of Science & Technology, Hong Kong
Ted Pedersen, University of Minnesota, USA
Dipti Misra Sharma, IIIT, Hyderabad, India
Virach Sornlertlamvanich, TCL, NICT, Thailand
Alexander Gelbukh, Center for Computing Research, National
Polytechnic Institute, Mexico
M. Sasikumar, CDAC, Mumbai, India
Sudeshna Sarkar, Indian Institute of Technology, Kharagpur, India
Thierry Poibeau, CNRS, France
Sobha L., AU-KBC, Chennai, India
Tzong-Han Tsai, National Taiwan University, Taiwan
Prasad Pingali, IIIT, India
Canasai Kruengkrai, NICT, Japan
Manabu Sassano, Yahoo Japan Corporation, Japan
Kavi Narayana Murthy, University of Hyderabad, India
Anil Kumar Singh, IIIT, Hyderabad, India
Doaa Samy, Universidad Autónoma de Madrid, Spain
Ratna Sanyal, Indian Inst. of Inf. Tech., Allahabad, India
V. Sriram, IIIT, Hyderabad, India
Anagha Kulkarni, Carnegie Mellon University, USA
Soma Paul, IIIT, Hyderabad, India
Sofia Galicia-Haro, National Autonomous University, Mexico
Grigori Sidorov, National Polytechnic Institute, Mexico
Dipti Misra Sharma, Rajeev Sangal, Anil Kumar Singh
Language Technologies Research Centre
International Institute of Information Technology
Gachibowli, Hyderabad, India
Phone: 91-40-23001412, 91-40-23001967/9 Extension 144
Fax: 91-40-23001413
Email: dipti_(on)_iiit.ac.in, sangal_(on)_iiit.ac.in,