| Category: ||E-CFP |
| Subject: ||WEB CONTENT MINING WITH HUMAN LANGUAGE TECHNOLOGIES |
| From: ||Enrique Alfonseca |
| Email: ||Enrique.Alfonseca_(on)_uam.es |
| Date received: ||05 Jun 2006 |
| Deadline: ||01 Aug 2006 |
| Start date: ||05 Nov 2006 |
FIRST CALL FOR PAPERS
WEB CONTENT MINING WITH HUMAN LANGUAGE TECHNOLOGIES
Workshop to be held at the 5th International Semantic Web Conference,
Athens, GA, U.S.A., November 5-9, 2006
MOTIVATION, AIM AND SCOPE
With the rapid growth of the information stored on the World Wide Web,
tools for the automatic or semi-automatic analysis of web data have
become necessary. Hence, considerable effort has been invested in
recent years in developing techniques for extracting patterns and
implicit information from the web, a task usually known as Web
Mining. Web Mining can be divided into three subtasks according to the
kind of data being mined: web structure, web usage and web content.
Web content mining consists of automatically extracting, from textual
web documents, data that can be represented with machine-readable
semantic formalisms. Initially, most web content mining systems used
wrappers to map documents to other data structures, but wrappers are
highly dependent on the layout and formatting instructions inside web
pages. Therefore, alternative approaches that make use of Natural
Language Processing (NLP) techniques are increasingly used.
While more traditional approaches to Information Extraction from text,
such as those applied in the Message Understanding Conferences (MUC)
during the nineties, relied on small collections of documents with many
semantic annotations, the characteristics of the web (its size, its
redundancy and the lack of semantic annotations in most texts) favor
efficient algorithms able to learn from unannotated data. Furthermore,
new types of web content such as web forums, blogs and wikis, some of
them part of the so-called Web 2.0, are also sources of textual
information with an underlying structure from which specialist
systems can benefit. The workshop will give special emphasis to how
existing techniques can benefit from these kinds of content.
This workshop aims to bring together researchers from the Semantic
Web, Natural Language Processing and Text Mining communities. The web
is a unique source of information for training and exploiting systems
for tasks such as Named Entity Identification and Classification, Term
Identification, Relation Extraction, Ontology Learning and Population
from text, and Text Mining. The Semantic Web community can contribute
by providing semantic formalisms, as well as tools for knowledge
representation and reasoning, for exploiting the extracted metadata.
The goal of the workshop is to establish communication between all
these communities.
TOPICS OF INTEREST
Topics of interest include, but are not limited to:
* Term Identification for specialist domains using web corpora, as
an initial step for ontology construction.
* Extracting taxonomic and non-taxonomic relationships from the web.
* Automatic ontology-based semantic annotation and Information
Extraction of web content.
* Mining semantic information from blogs, forums or news sources.
* Automatic annotation in Semantic Wikis.
* Integrating mined information with semantic resources.
* Semantic annotation of multilingual web sources.
* Burst detection from web sources.
* Multi-webpage Named Entity Coreference.
* Usage scenarios for the combination of the Semantic Web, Human
Language Technologies, Text Mining, decision support, etc.
IMPORTANT DATES
1 August 2006 - Paper submission
5 September 2006 - Acceptance notification
18 September 2006 - Camera-ready papers
6 November 2006 - Electronic version of the proceedings available:
SUBMISSION
Paper submissions must be formatted according to Springer's Lecture
Notes in Computer Science (LNCS) style and submitted as PDF documents.
We accept two kinds of papers:
* Full papers, with a length limit of 10 pages.
* Short position papers, with a length limit of 5 pages.
In both cases, the authors' names should not appear in the paper, in
order to ensure a blind review process.
For an accepted paper to appear in the workshop proceedings, at least
one of its authors must register for the workshop.
ORGANIZERS
Enrique Alfonseca - Universidad Autonoma de Madrid, Tokyo Institute of
Technology.
Thierry Declerck - DFKI GmbH, Germany.
Manabu Okumura - Tokyo Institute of Technology.
Satoshi Sekine - New York University.
Hiroya Takamura - Tokyo Institute of Technology.