Category:   E-CFP
Subject:   Web as Corpus
From:   Adam Kilgarriff
Email:   adam.kilgarriff_(on)_itri.brighton.ac.uk
Date received:   18 Apr 2002
Deadline:   30 Apr 2002
Start date:   14 Jul 2002

http://www.itri.bton.ac.uk/%7EAdam.Kilgarriff/wac_cfp.html

FINAL CALL FOR PAPERS

SPECIAL ISSUE of COMPUTATIONAL LINGUISTICS

Web as Corpus

Guest editors
  Adam Kilgarriff, ITRI, University of Brighton
  Gregory Grefenstette, Clairvoyance Corporation

The Web is an immense, multilingual, freely available corpus. As with other large new corpora, computational linguists have been stimulated by its presence. Web research includes many of the most talked about papers of recent ACL and other meetings (eg Resnik, ACL '99; Brill, "Does the web change everything?", ACL SIGNLL '01).

In comparison with most corpora studied to date, the web is heterogeneous and noisy. Methods for handling the noise, and extracting and exploiting subcorpora meeting particular criteria, are being developed by a widening population ranging from students who realise that it is an obvious place to obtain their corpus for free, to companies who seek to use HLT techniques on datasets other than the ones HLT researchers usually use.

NLP can both give to, and take from, the web (distinction due to Dragomir Radev). It can give to the web technologies such as summarisation, MT and question-answering. But the giving side of the equation looks only at short-to-medium term goals. For the longer term, for 'giving' as well as for other purposes, a deeper understanding of the linguistic nature of the web and its potential for CL/NLP is required. For that, we must take the web itself, in whatever limited way, as an object of study, and uncover what it has to tell us about the nature of language.

The Special Issue will focus on how we can use the web, rather than how we can help web users.

The issues we expect Special Issue papers to cover include:

  - Lexical data derived from the Web
  - Classifying Web language; the range of text types on the Web
  - Mapping Web documents onto existing ontologies; implications for ontologies
  - Clustering in an open corpus
  - The multilingual Web as a resource for translation
  - CL/HLT engagement with the Semantic Web

Papers should meet the usual criteria for CL; we expect most submissions to be short papers (up to 15 journal pages, ca 4000 words) but long papers (15-30 pages, ca 8000 words) are also permissible.

SCHEDULE

Papers due: 30 April 2002

SUBMISSION PROCEDURE

Submissions may be either hard copy or soft copy. Soft copy submissions must meet Computational Linguistics specifications (see the CL formatting instructions at http://www.itri.bton.ac.uk/%7EAdam.Kilgarriff/cl-format.txt) and are to be sent to Adam.Kilgarriff_(on)_itri.brighton.ac.uk.

For hard copy submissions, seven copies are to be sent to

  Adam Kilgarriff
  Web as Corpus Special Issue
  ITRI
  University of Brighton
  Lewes Road
  Brighton BN2 4GJ
  United Kingdom

In this case authors are also requested to submit a soft copy, in ps, pdf or rtf, to Adam.Kilgarriff_(on)_itri.brighton.ac.uk.

Questions about submissions should be directed to the two Guest Editors, rather than the Journal or Publishing Editors.

