ELSNET-list archive

Category:   E-Material
Subject:   New Corpus from the LDC
From:   Linguistic Data Consortium
Email:   ldc_(on)_ldc.upenn.edu
Date received:   08 Dec 2003

* LDC2003T15 *
* SLX Corpus of Classic Sociolinguistic Interviews *

The Linguistic Data Consortium (LDC) is pleased to announce the availability of the SLX Corpus of Classic Sociolinguistic Interviews.

The SLX Corpus of Classic Sociolinguistic Interviews contains 8 sociolinguistic interviews with a total of 9 speakers. William Labov and one of his students conducted the interviews in the 1960s and 1970s. These interviews represent solutions to the problems of achieving cross-cultural contact, reducing the effect of the Observer's Paradox and approximating the vernacular of everyday life.

The corpus includes the complete interview recordings plus time-aligned verbatim transcripts for each speaker. Also included in the publication is a sociolinguistic variable survey that gives an overview of the intra- and inter-speaker variation attested in the corpus, highlighting a broad range of phonological, phonetic, grammatical, lexical and stylistic variables. Finally, the publication includes a number of annotation tools that allow users to listen to each interview while browsing the corresponding transcripts, and to display and hear each token identified in the variable survey.

The SLX Corpus was developed as part of the Data and Annotations for Sociolinguistics (DASL) Project <http://www.ldc.upenn.edu/Projects/DASL>, an investigation of best practices in the use of digital speech corpora for the study of language variation. The recordings demonstrate successful interviewing techniques, the sound quality is high, and the digitization, segmentation and transcription of the data represent best practice in these areas. The variable survey highlights over 150 sociolinguistic variables attested in the corpus and suggests avenues for further research.
Most importantly, the SLX Corpus provides both an example of a digital speech corpus developed specifically to support sociolinguistic research, and a stable benchmark for training in sociolinguistic data collection, digitization, segmentation, transcription, analysis and publication.

The SLX Corpus contains 17 speech files (22050Hz, 16 bit, single-channel, in the MS WAV (RIFF) format), for a total of 575 minutes (~1.5GB). The data is distributed on DVD-ROM.

For further information, including online documentation, please visit:
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2003T15

The cost of the first 100 copies of this publication (not including the copies distributed to LDC members) is covered by NSF Grant Number BCS-998009, so those copies are free of charge. After these first 100 copies are distributed, additional copies will be available for the production cost of $100 per disc.

If you need additional information before placing your order, or would like to inquire about membership in the LDC, please send email to ldc_(on)_ldc.upenn.edu or call 1 (215) 573-1275.

-----------------------------------------------------------------------
Linguistic Data Consortium        Phone: 1 (215) 573-1275
University of Pennsylvania        Fax:   1 (215) 573-2175
3600 Market St., Suite 810        email: ldc_(on)_ldc.upenn.edu
Philadelphia, PA 19104-2653       www:   http://www.ldc.upenn.edu

