

ELSNET-list archive

Category:   E-Announce
Subject:   Call for participation DEFT'08
From:  
Email:   Martine.Hurault-Plantet_(on)_limsi.fr
Date received:   06 Mar 2008
Start date:   09 Jun 2008

******************************************************************
DEFT'08
Call for participation

Evaluation workshop in text mining:
Text classification by topic and by genre.

http://deft08.limsi.fr/
Registration: http://deft08.limsi.fr/inscription.php
******************************************************************

Important dates:

Registration: from December 21, 2007
Training corpora: January 14, 2008
Test: three days during the last two weeks of March 2008
Workshop: June 9-13, during the TALN'08 conference, Avignon, France
(TALN: Traitement Automatique du Langage Naturel, i.e. natural language processing)

******************************************************************

The Text Mining Challenge (DEFT, for "Défi Fouille de Texte", http://deft.limsi.fr/) has run evaluation campaigns in the text mining field, in French, for the last three years. The 2008 edition addresses how an automatic classifier handles variation in genre and topic. Corpora will be in French.

Automatic classification has many applications within text mining. Various fields of application have been explored, from email routing to strategic or scientific monitoring. In recent years, a new research question has emerged: the classification of texts by genre. Beyond recognizing a document's topic, identifying its genre helps determine how the document can be used. But how can we recognize both the topic and the genre of a given document? Is a difference in genre relevant when recognizing a document's topical category and, conversely, is a difference in topic relevant during genre recognition?

In order to evaluate recognition software from this perspective, we shall simultaneously consider, for the same pre-defined set of categories, two corpora in French of different genres. One is a corpus of press articles from Le Monde (a daily newspaper), and the other is a corpus of encyclopedic articles from the French version of Wikipedia, the free online encyclopedia.
By 'genre' we mean a set of texts that share some properties, involving their domain of activity, their writing practices and their medium. A newspaper article deals with current events, while an encyclopedic article transmits knowledge, but the two share some general topical categories (called 'sections' in the case of the newspaper).

The aim is to test, on these corpora, first, the robustness of a topic classifier subjected to genre variation and, second, the possible improvement of topical classification through text genre recognition.

Task description
****************

We provide two French corpora for training:

- one with articles from Le Monde (a daily newspaper) and articles from the French version of Wikipedia, within a set 'A' of topical categories, tagged both by genre and by topic,

- one with articles from Le Monde and articles from the French version of Wikipedia, within a set 'B' of topical categories, different from 'A', tagged by topic only.

We will provide two French corpora for the test, with no tagging at all, each used for a different task:

- task 1: genre and topic recognition of each document from a corpus with articles from Le Monde and articles from the French version of Wikipedia, within the set 'A' of topical categories,

- task 2: topic recognition of each document from a corpus with articles from Le Monde and articles from the French version of Wikipedia, within the set 'B' of topical categories.

Registration
************

Teams taking part in DEFT'08 should register by filling in the online form (http://deft08.limsi.fr/inscription.php) and sign the agreements on restrictions on the use of the corpora.
Committees
**********

Organizing committee:
Martine Hurault-Plantet (LIMSI), Cyril Grouin (LIMSI), Sylvain Loiseau (LIMSI), Jean-Baptiste Berthelin (LIMSI), Sarra El Ayari (LIMSI)

Program committee:
Patrick Paroubek (LIMSI), Catherine Berrut (CLIPS), Fabrice Clérot (France Telecom), Guillaume Cleuziou (LIFO), Matthieu Constant (IGM), Béatrice Daille (LINA), Halima Dahmani (CEA-LIST), Marc El-Bèze (LIA), Patrick Gallinari (LIP6), Éric Gaussier (Xerox Research), Thierry Hamon (LIPN), Fidélia Ibekwe-SanJuan (ELICO), Pascal Poncelet (LGI2P), Christophe Roche (LISTIC), Mathieu Roche (LIRMM), Bernard Rothenburger (IRIT - INRIA), Pascale Sébillot (IRISA), Yannick Toussaint (LORIA), François Yvon (LIMSI).
 

Page generated 08-04-2008 by Steven Krauwer