ELSNET-list archive

Category:   E-CFP
From:   Enrique Alfonseca
Email:   Enrique.Alfonseca_(on)_uam.es
Date received:   05 Jun 2006
Deadline:   01 Aug 2006
Start date:   05 Nov 2006

FIRST CALL FOR PAPERS

WEB CONTENT MINING WITH HUMAN LANGUAGE TECHNOLOGIES
http://orestes.ii.uam.es/workshop/

Workshop to be held at the 5th International Semantic Web Conference,
Athens, GA, U.S.A., November 5-9 2006
http://iswc2006.semanticweb.org/

MOTIVATION, AIM AND SCOPE

With the large growth of the information stored in the World Wide Web, tools for automatic or semi-automatic analysis of web data have become necessary. Hence, a large effort has been invested in recent years in developing techniques for extracting patterns and implicit information from the web, a task that is usually known as Web Mining. Web Mining itself can be divided into three subtasks according to the kind of data that is collected: web structure, web usage and web content.

Web content mining consists of automatically mining data from textual web documents that can be represented with machine-readable semantic formalisms. Initially, most web content mining systems used wrappers to map documents to other data structures, but this approach is highly dependent on the layout and formatting instructions inside web pages. Therefore, alternative approaches that make use of Natural Language Processing-based techniques are increasingly used.

While more traditional approaches to Information Extraction from text, such as those applied to the Message Understanding Conferences during the nineties, relied on small collections of documents with many semantic annotations, the characteristics of the web (its size, redundancy and the lack of semantic annotations in most texts) favor efficient algorithms able to learn from unannotated data. Furthermore, new types of web content such as web forums, blogs and wikis, some of them included in the so-called Web 2.0, are also a source of textual information that contains an underlying structure from which specialist systems can benefit. The workshop will give special emphasis to how existing techniques can benefit from these kinds of content.

This workshop aims at bringing together researchers from the Semantic Web, the Natural Language Processing and the Text Mining communities. The web constitutes a unique source of information to train and exploit systems for tasks such as Named Entity Identification and Classification, Term Identification, Relationship Extraction, Ontology Learning and Population from text, and Text Mining. The Semantic Web community can contribute by providing semantic formalisms and tools for knowledge representation and reasoning for exploiting the extracted metadata. The goal of the workshop is to establish communication between all these communities.

TOPICS OF INTEREST

Topics of interest include, but are not limited to:

* Term Identification for specialist domains using web corpora, as an initial step for ontology construction.
* Extracting taxonomic and non-taxonomic relationships from the web.
* Automatic ontology-based semantic annotation and Information Extraction of web content.
* Mining semantic information from blogs, forums or news sources.
* Automatic annotation in Semantic Wikis.
* Integrating mined information with semantic resources.
* Semantic annotation of multilingual web sources.
* Burst detection from web sources.
* Multi-webpage Named Entity Coreference.
* Usage scenarios for the combination of the Semantic Web, Human Language Technologies, Text Mining, decision support, etc.

IMPORTANT DATES

1 August 2006 - Paper submission
5 September 2006 - Acceptance notification
18 September 2006 - Camera-ready papers
6 November 2006 - Electronic version of the proceedings available

SUBMISSIONS

Paper submissions must be formatted in the Springer style for the Lecture Notes in Computer Science series, and submitted as PDF documents. We accept two kinds of papers:

* Full papers, with a length limit of 10 pages.
* Short position papers, with a length limit of 5 pages.

In both cases, the names of the authors should not appear in the paper, in order to ensure a blind review process. At least one author must register for each accepted submission for the paper to appear in the workshop proceedings.

ORGANISING COMMITTEE (alphabetical ordering)

Enrique Alfonseca - Universidad Autonoma de Madrid, Tokyo Institute of Technology.
Thierry Declerck - DFKI GmbH, Germany.
Manabu Okumura - Tokyo Institute of Technology.
Satoshi Sekine - New York University.
Hiroya Takamura - Tokyo Institute of Technology.

