

ELSNET-list archive

Category:   E-CFP
Subject:   ACL-2003 Workshop on Patent Corpus Processing
From:   Priscilla Rasmussen
Email:   rasmusse_(on)_cs.rutgers.edu
Date received:   28 Mar 2003
Deadline:   10 Apr 2003
Start date:   12 Jul 2003

ACL 2003 Workshop on Patent Corpus Processing
12 July 2003, Sapporo, Japan

CALL FOR PAPERS

http://www.slis.tsukuba.ac.jp/%7Efujii/acl2003ws.html

=======================
Workshop Description
=======================

The goal of this workshop is to foster research and development of the technology for patent corpus processing, by providing a forum in which researchers and practitioners can exchange and share their ideas, approaches, perspectives, and experiences from their work in progress.

The processing of intellectual property (IP) documents, including patents, is important in the scientific, business, and law communities. Much of the focus for patent and IP processing has been in the database and information retrieval communities, but not in the computational linguistics (CL) and natural language processing (NLP) communities.

In 2000, the first ACM SIGIR 2000 Workshop on Patent Retrieval was held. In this workshop, patent retrieval systems in use at EPO (European Patent Office) and JAPIO (Japanese Patent Information Organization) were introduced, and a number of issues related to patent retrieval (e.g., producing ontologies, cross-language retrieval, and evaluation methods) were proposed/discussed.

In 2001-2002, the NTCIR workshop (the National Institute of Informatics, Japan), which is a TREC-style evaluation forum for research and development on IR/NLP, first performed the patent retrieval task. Two years of Japanese patents (approximately 7M documents published in 1998-1999; 18GB) were used to evaluate mono/cross-lingual patent retrieval systems. In addition, approximately 17M Japanese/English parallel patent abstracts were used to evaluate the effectiveness of extracting translation lexicons.

=======================
Areas of Interest
=======================

Patent corpora are associated with a number of interesting characteristics, for which various CL/NLP techniques have promise for improving the quality of patent processing.
* multilinguality: the same/similar contents (i.e., inventions) are filed in different languages, for which machine translation, cross/multi-lingual retrieval, and translation extraction alleviate problems in accessing information in foreign languages.

* scalability: a huge amount of corpus data is available and periodically produced, for which text summarization and natural language generation help produce understandable, coherent, condensed contents.

* complexity: since patents consist of overwhelmingly long sentences, parsing/chunking techniques help produce readable shorter fragments.

* classification: patents are manually categorized based on a specific classification system, such as IPC (International Patent Classification), which can be used for statistical classification methods.

* novelty/temporality/dynamism: new terms and concepts associated with inventions are periodically created, for which term extraction and ontology construction techniques help update lexical resources for patent processing.

* document structures: unlike newspaper articles, patents are structured with a number of specific fields (e.g., titles, abstracts, and claims). While conventional text segmentation techniques rely mainly on linguistic contents (e.g., lexical chains), structure analysis techniques (e.g., ones related to XML) are also crucial in the context of CL/NLP.

* applications: the above techniques can directly contribute to a number of applications, such as patent retrieval systems.

We invite both research papers and project papers associated with, but not limited to, the rudiments of patent corpus processing listed above. We also invite papers addressing applications and user studies.
=======================
Important Dates
=======================

Submission deadline: 10 April 2003
Acceptance notification: 12 May 2003
Final version deadline: 30 May 2003
Workshop date: 12 July 2003

=======================
Workshop Chairs
=======================

Makoto Iwayama, Tokyo Institute of Technology / Hitachi Ltd., Japan
Atsushi Fujii, University of Tsukuba, Japan

=======================
Contact Information
=======================

Atsushi Fujii, fujii_(on)_slis.tsukuba.ac.jp
University of Tsukuba, Japan
 

Page generated 14-02-2008 by Steven Krauwer