EAC-TM is a translation memory (sentences and their manually
produced translations) in 26 languages. It is a multilingual
parallel corpus covering 325 language pairs.
Size: Up to 5,100 translation units per language; 78,000 in total.
Languages: All 325 language pairs involving the following 26
languages: Bulgarian, Czech, Danish, Dutch, English, Estonian, German,
Greek, Finnish, French, Croatian, Hungarian, Icelandic, Italian,
Latvian, Lithuanian, Maltese, Norwegian, Polish, Portuguese,
Romanian, Slovak, Slovene, Spanish, Swedish and Turkish.
Creator: EC Directorate General for Education and Culture (EAC) and JRC
WHAT IS EAC-TM
EAC-TM was produced by translating the English-language form data
for the Lifelong Learning Programme (LLP) and the Youth in Action
Programme of the European Commission's Directorate General for
Education and Culture (EAC). The results of the translation
were stored in 25 bilingual translation memories. DG EAC and the
JRC post-processed these by cleaning the data and by producing
one alignment for all 26 languages, resulting in parallel data
for 325 language pairs.
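
As a rough illustration of the post-processing step described above, the following sketch merges bilingual translation memories into one multilingual alignment by pivoting on the shared English source segment, and checks that 26 languages yield 325 unordered language pairs. The function name, the dictionary layout and the sample segments are assumptions made for this example; it is not the actual procedure used by DG EAC and the JRC.

```python
from itertools import combinations

def merge_by_pivot(bilingual_tms):
    """Merge {lang: {english_segment: translation}} into multilingual units.

    Hypothetical layout: each bilingual memory maps an English source
    segment to its translation in one target language.
    """
    merged = {}
    for lang, memory in bilingual_tms.items():
        for english, translation in memory.items():
            # Create the unit on first sight of this English segment.
            unit = merged.setdefault(english, {"en": english})
            unit[lang] = translation
    return list(merged.values())

# Toy data, invented for illustration only.
toy_tms = {
    "fr": {"Name": "Nom", "Date of birth": "Date de naissance"},
    "de": {"Name": "Name"},
}
units = merge_by_pivot(toy_tms)

# Unordered language pairs among 26 languages: 26 * 25 / 2 = 325.
pair_count = len(list(combinations(range(26), 2)))
print(len(units), pair_count)
```

Pivoting on English is possible here because all 25 bilingual memories share the same English source text.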
The underlying documents are thus form data in the field of
education and culture.
The EAC Translation Memory is much smaller than the other
multilingual resources distributed in the past by the European
Commission's Joint Research Centre (JRC). Its main advantages are
that (a) it covers even more languages and (b) it is based on
texts from a very different domain (education and culture).
MOTIVATION FOR THIS RELEASE
The public data release is in line with the general effort of the
European Commission to support multilingualism, language
diversity and the re-use of Commission information. It follows
the release of the JRC-Acquis parallel corpus in 2006 (over 1
billion words in 22 languages), the DGT-TM Translation Memory
in 2007 and 2011, the multilingual named entity resource
JRC-Names in 2011, the multi-label classification software JRC
EuroVoc Indexer JEX in 22 languages in 2012, the ECDC-TM
Translation Memory in 25 languages in 2012, the DGT-Acquis
parallel corpus in 23 languages in 2012, and further smaller
multilingual resources. See http://ipsc.jrc.ec.europa.eu/?id=61
for more information on these resources.
WHAT EAC-TM CAN BE USED FOR
EAC-TM can be fed into translation memory software to support
human translators in their work. As it is a parallel corpus in
electronic form, it can furthermore be used by specialists in
computational linguistics to train statistical machine
translation software, to generate multilingual dictionaries, and
to train and test multilingual information extraction software,
among other uses.
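
To make the parallel-corpus use more concrete, here is a minimal sketch of pulling the sentence pairs for one language pair out of multilingual translation units, for instance as training input for a machine translation system. The unit layout (a dictionary keyed by language code), the function name and the sample segments are assumptions for this example and do not reflect the actual distribution format of EAC-TM.

```python
def extract_pairs(units, src, tgt):
    """Return (source, target) segment pairs for one language pair.

    Units lacking either language are skipped, on the assumption that
    not every unit is available in every language.
    """
    return [(u[src], u[tgt]) for u in units if src in u and tgt in u]

# Toy units, invented for illustration only.
units = [
    {"en": "Name", "fr": "Nom", "de": "Name"},
    {"en": "Date of birth", "fr": "Date de naissance"},
]
fr_de = extract_pairs(units, "fr", "de")
en_fr = extract_pairs(units, "en", "fr")
print(fr_de)
print(en_fr)
```

Because the units are aligned across all languages, the same function serves any of the language pairs, including those that do not involve English.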
The JRC and collaborating services of the European Commission
hope to release further large-scale linguistic resources in the
future.
Ralf Steinberger & Mohamed Ebrahim
European Commission - Joint Research Centre (JRC)
21027 Ispra (VA), Italy
URL - Applications: http://emm.newsbrief.eu/overview.html
URL - Publications on the science behind them: