INEX NLP Task 2006, Call for Participation
***************************************************************
INEX NLP Task, call for participation
Natural Language Interfaces for XML Information Retrieval
http://inex.is.informatik.uni-duisburg.de/2006/index.html
***************************************************************

XML Retrieval
-------------

Content-oriented XML retrieval has been receiving increasing interest, fuelled by the widespread use of the eXtensible Markup Language (XML) as a standard document format. The continuous growth in XML data sources is matched by increasing efforts in the development of XML retrieval systems. These systems aim to exploit the structural information available in documents to implement a more focused retrieval strategy and to return document components (the so-called XML elements) instead of complete documents in response to a user query.

Implementing this more focused retrieval paradigm means that an XML retrieval system needs not only to find relevant information in the XML documents, but also to determine the appropriate level of granularity to return to the user. In addition, the relevance of a retrieved component depends on meeting both content and structural conditions.

NLP in XML Retrieval
--------------------

For the third year, the INitiative for the Evaluation of XML Retrieval (INEX) investigates the idea of using the specifics of XML retrieval to allow users to address content and structural needs intuitively via natural language queries.

* As in traditional information retrieval, the user need is loosely defined, linguistic variation is frequent, and answers are a ranked list of relevant elements.
* As in database querying, structure is important and a simple list of keywords is not a sufficient query. Structured query languages have been developed, but appear to be difficult to use.
* Furthermore, the size of the unit of information is variable and elements overlap in the documents.

Developing natural language interfaces for XML-IR is therefore a separate research domain requiring its own innovative solutions. The ultimate goal is to design and build software that will analyse, understand, and generate results in response to queries that humans express naturally. The primary objective of retrieval is to interpret both the structural and the content constraints of an information need expressed in a natural language query (as opposed to the rigid syntax of XPath). The IR system would not only select and rank suitable documents, but also select the XML elements within those documents that best satisfy the information need, both accurately and concisely.

Collection
----------

The 2006 INEX campaign uses the English Wikipedia collection. Queries may concern any content or structural element that can be found in this set of documents, and will be written both in English and in NEXI, a formal structured query language.

Example:
  in English: "Find lists of air battles in articles dealing with World War II"
  in NEXI:    //article[about(., "World War II")]//list[about(., air battle)]

NLP Tasks
---------

There are two distinct tasks in the NLP track in 2006: NLQ2NEXI and NLQ.

* NLQ2NEXI - a simplified task that does not require participants to index the collection or to implement a search engine. Instead, NLQ2NEXI requires the translation of a natural language query, provided in the element of a topic, into a formal INEX query. The submissions of all participants will be evaluated by running the resulting NEXI titles on one or more search engines that can operate on NEXI expressions. The objective is to compare the results obtained with natural language queries (translated into NEXI) with the results obtained by the same search engines when using the original NEXI expressions. This task is designed to allow new participants with NLP expertise to join the INEX workshop without the need to develop a search engine.
* NLQ - this task places no restrictions on the NLP techniques used to interpret the queries as they appear in the element of a topic. Participants are required to submit retrieval runs, but are free to implement any NLP technique in their search engine. The objective is not only to compare different NLP-based systems, but also to compare the results obtained with natural language queries with the results obtained with NEXI queries by any other system in the Ad-hoc track. We wish to test whether natural language queries are effective alternatives to formal queries and to quantify the trade-off in performance.

Important Dates
---------------

March 17:  Deadline for declaration of intent to participate.
May 05:    Distribution of sets of topics.
Jul 14:    Submission deadline of search results.
Dec 18-20: Workshop in Schloss Dagstuhl.

Contact
-------

Shlomo Geva     s.geva_(on)_qut.edu.au
Xavier Tannier  tannier_(on)_emse.fr

