ELSNET-List Message

Subject: [ E-CFP ] 1st CFP: Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT
From: <martaruizcostajussa_(on)_gmail.com>
Date received: 31 Jul 2012
Deadline: 30 Sep 2012
Start date: 08 Dec 2012

"Second Workshop on Applying Machine Learning Techniques to
Optimise the Division of Labour in Hybrid MT (ML4HMT-12 WS and
Shared Task)"

The "Second Workshop on Applying Machine Learning Techniques to
Optimise the Division of Labour in Hybrid MT: ML4HMT-12" is an
effort to trigger a systematic investigation into improving
state-of-the-art hybrid machine translation, making use of
advanced machine-learning (ML) methodologies.

It follows the ML4HMT-11 workshop (http://www.dfki.de/ml4hmt/),
which took place last November in Barcelona. The first workshop
also road-tested a shared task (and associated data set) and laid
the basis for a broader reach in 2012.

ML4HMT-12 involves regular papers on hybrid MT as well as a
Shared Task.

Regular Papers ML4HMT-12

We are soliciting original papers on hybrid MT, including (but
not limited to):

* use of machine learning methods in hybrid MT;
* system combination: parallel in multi-engine MT (MEMT) or
  sequential in statistical post-editing (SPMT);
* combining phrases and translation units from different types of
* syntactic pre-/re-ordering;
* using richer linguistic information in phrase-based or in
  hierarchical SMT;
* learning resources (e.g., transfer rules, transduction
  grammars) for probabilistic rule-based MT.

Full papers should be anonymous and follow the COLING full paper
format (http://www.coling2012-iitb.org/call_for_papers.php).

Shared Task ML4HMT-12

The main focus of the Shared Task is to address the question:

"Can Hybrid MT and System Combination techniques benefit from
extra information (linguistically motivated, decoding, runtime,
confidence scores, or other meta-data) from the systems

Participants are invited to build hybrid MT systems and/or system
combinations by using the output of several MT systems of
different types, as provided by the organisers.

While participants are encouraged to use machine learning
techniques to explore the additional meta-data information
sources, submissions offering other general improvements in
hybrid and combination-based MT are also strongly encouraged.

For systems that exploit additional meta-data information, the
challenge is that this meta-data is highly heterogeneous and
specific to each individual system.


The ML4HMT-12 Shared Task involves (ES-EN) and (ZH-EN) data sets,
in each case translating into EN.

* (ES-EN): Participants are given a development bilingual set
  aligned at a sentence level. Each "bilingual sentence"
  contains: 1) the source sentence, 2) the target (reference)
  sentence and 3) the corresponding multiple output translations
  from five systems, based on different MT approaches (Apertium,
  Ramirez-Sanchez, 2006; Joshua, Zhifei Li et al., 2009; Lucy,
  Alonso and Thurmair, 2003; Moses, Koehn et al., 2007). The
  output has been annotated with system-internal meta-data
  information derived from the translation process of each of the

* (ZH-EN) A corresponding data set for ZH-EN with output
  translations from three systems (Moses, Joshua and Huajian
  RBMT) will be provided.

Baselines are given by state-of-the-art open-source
system-combination systems: MANY (Barrault, 2010) and CMU-MEMT
(Heafield and Lavie, 2010).

Participants are challenged to build an MT mechanism that
improves over the baseline, where possible making effective use
of the system-specific MT meta-data output. They can provide
solutions based on open-source systems, or develop their own
mechanisms. The development set can be used for tuning the
systems during the development phase. Final submissions have to
include translation output on a test set, which will be made
available one week after training data release. Data will be
provided to build language/reordering models, possibly re-using
existing resources from MT research.

Participants can also make use of additional tools (linguistic
analysis, confidence estimation, etc.) if their systems require
them, but they have to declare this explicitly upon submission,
so that they are judged as "unconstrained" systems.
This will allow for a better comparison between participating

System output will be judged via peer-based human evaluation as
well as automatic evaluation. During the evaluation phase,
participants will be requested to rank system outputs of other
participants through a web-based interface (Appraise, Federmann,
2010). Automatic metrics include BLEU (Papineni et al., 2002),
TER (Snover et al., 2006) and METEOR (Lavie, 2005).

Shared task participants will be invited to submit system
description papers (7 pages, not anonymised, following the COLING
format: http://www.coling2012-iitb.org/call_for_papers.php).

The ML4HMT workshop is supported by the META-NET T4ME project
(http://t4me.dfki.de/), funded by the DG INFSO of the European
Commission through the Seventh Framework Programme, grant
agreement no.: 249119 META-NET (http://www.meta-net.eu/).

Important Dates 2012

* 15th August: Shared task training data release (updated ML4HMT
  corpus)
* 23rd August: Shared task test data release
* 15th September: Shared task translation results submission
  deadline
* 21st September: Shared task evaluation results release
* 30th September: Workshop full paper and Shared task system
  description paper submission deadline
* 31st October: Workshop paper accept/reject notification
* 15th November: Workshop and Shared task camera-ready paper due
* 8th and 9th December: Pre-conference workshops

Organisers

- Prof. Josef van Genabith, Dublin City University (DCU) and
  Centre for Next Generation Localisation (CNGL)
- Prof. Toni Badia, Universitat Pompeu Fabra and Barcelona Media
  (BM)
- Christian Federmann, German Research Center for Artificial
  Intelligence (DFKI), contact person: cfedermann_(at)_dfki.de
- Dr. Maite Melero, Barcelona Media (BM)
- Dr. Marta R. Costa-jussà, Barcelona Media (BM)
- Dr. Tsuyoshi Okita, Dublin City University (DCU)

Program committee

- Eleftherios Avramidis (German Research Center for Artificial
  Intelligence, Germany)
- Prof. Sivaji Bandyopadhyay (Jadavpur University, India)
- Dr. Rafael Banchs (Institute for Infocomm Research - I2R,
- Prof. Loïc Barrault (LIUM - University of Le Mans, France)
- Prof. Antal van den Bosch (Centre for Language Studies, Radboud
  University Nijmegen, Netherlands)
- Dr. Grzegorz Chrupala (Saarland University, Saarbrücken,
- Prof. Jinhua Du (Xi'an University of Technology (XAUT), China)
- Dr. Andreas Eisele (Directorate-General for Translation (DGT),
- Dr. Cristina España-Bonet (Technical University of Catalonia,
  TALP, Barcelona)
- Dr. Declan Groves (Center for Next Generation Localisation,
  Dublin City University, Ireland)
- Dr. Yuqing Guo (Toshiba China, Research & Development Center)
- Prof. Jan Hajic (Institute of Formal and Applied Linguistics,
  Charles University in Prague)
- Prof. Timo Honkela (Aalto University, Finland)
- Dr. Patrick Lambert (LIUM - University of Le Mans, France)
- Prof. Qun Liu (Institute of Computing Technology, Chinese
  Academy of Sciences, China)
- Dr. Maite Melero (Barcelona Media Innovation Center, Spain)
- Dr. Tsuyoshi Okita (Dublin City University, Ireland)
- Prof. Pavel Pecina (Institute of Formal and Applied
  Linguistics, Charles University in Prague)
- Dr. Marta R. Costa-jussà (Barcelona Media Innovation Center,
- Dr. Felipe Sanchez Martinez (Escuela Politecnica Superior,
  Universidad de Alicante, Spain)
- Dr. Nicolas Stroppa (Google, Zurich, Switzerland)
- Prof. Hans Uszkoreit (German Research Center for Artificial
  Intelligence, Germany)
- Dr. David Vilar (German Research Center for Artificial
  Intelligence, Germany)
