ELSNET-list archive

Category:   E-CFP
Subject:   CFP, Special Session at Interspeech 2007, Structure-Based and Template-Based ASR
From:   Helmer Strik
Email:   W.Strik_(on)_let.ru.nl
Date received:   27 Feb 2007
Deadline:   23 Mar 2007
Start date:   28 Aug 2007

(apologies for multiple cross-posting)

---------------------------

Call for Papers
Submission deadline: 23rd March

Special Session at INTERSPEECH 2007, Antwerp, Belgium:
Structure-Based and Template-Based Automatic Speech Recognition - Comparing parametric and non-parametric approaches

Although hidden Markov modeling (HMM) is the dominant technology for acoustic modeling in automatic speech recognition today, many of its weaknesses are well known and have become the focus of intensive research. One prominent weakness of current HMMs is their handicap in representing long-span temporal dependency in the acoustic feature sequence of speech, which is nevertheless an essential property of speech dynamics. The main cause of this handicap is the conditional IID (independent and identically distributed) assumption inherent in the HMM formalism. Furthermore, the standard HMM approach focuses on verbal information. However, experiments have shown that non-verbal information also plays an important role in human speech recognition, which the HMM framework has not attempted to address directly.

Numerous approaches have been taken over the past dozen years to address the above weaknesses of HMMs. These approaches can be broadly classified into the following two categories.

The first, parametric, structure-based approach establishes mathematical models for stochastic trajectories/segments of speech utterances using various forms of parametric characterization, including polynomials, linear dynamic systems, and nonlinear dynamic systems embedding the hidden structure of speech dynamics. In this parametric modeling framework, systematic speaker variation can also be satisfactorily handled. The essence of such a hidden-dynamic approach is that it exploits knowledge and mechanisms of human speech production so as to provide the structure of the multi-tiered stochastic process models. A specific layer in this type of model represents long-range temporal dependency in a parametric form.

The second, non-parametric, template-based approach to overcoming the HMM weaknesses involves direct exploitation of speech feature trajectories (i.e., 'templates') in the training data without any modeling assumptions. Due to the dramatic increase in the size of speech databases and the computer storage capacity available for training, as well as the exponential growth in computational power, non-parametric methods using the traditional pattern recognition techniques of kNN (k-nearest-neighbor decision rule) and DTW (dynamic time warping) have recently received substantial attention. Such template-based methods have also been called exemplar-based or data-driven techniques in the literature.

The purpose of this special session is to bring together researchers who have a special interest in novel techniques aimed at overcoming the weaknesses of HMMs for acoustic modeling in speech recognition. In particular, we plan to address issues related to the representation and exploitation of long-range temporal dependency in speech feature sequences, the incorporation of fine phonetic detail in speech recognition algorithms and systems, comparisons of the pros and cons of the parametric and non-parametric approaches, and the computational resource requirements of the two approaches.

This special session will start with an oral presentation that introduces the topic and gives a short overview of the issues involved, directions that have already been taken, and possible new approaches. The contributed papers will then be presented, and the session will end with a panel discussion.

Submission:
Researchers who are interested in contributing to this special session are invited to submit a paper according to the regular submission procedure of INTERSPEECH 2007, and to select 'Structure-Based and Template-Based Automatic Speech Recognition' in the special session field of the paper submission form. The paper submission deadline is March 23, 2007.

Session organizers:
Li Deng <deng_(on)_microsoft.com>
Helmer Strik <strik_(on)_let.ru.nl>

Information about this special session can also be found at the following websites:
http://www.interspeech2007.org/Technical/structure_template_based_asr.php
http://lands.let.ru.nl/~strik/IS2007-Special_Session-STB_ASR.html

Page generated 14-02-2008 by Steven Krauwer