ELSNews, vol. 4.4, August 1995


Permission is given by the Editor to distribute or re-use ELSNews articles appearing in the paper version of the newsletter, or in these Web pages, with the proviso that the following text is included with the re-used article:

``This article appeared in ELSNews 4.4 (August 1995), and is re-printed by permission from the Editor. ELSNews is the newsletter of ELSNET, the European Network in Language and Speech. Information about ELSNET is available from the Coordinator, at elsnet@let.ruu.nl.''


Table of Contents

ELSNET Site Coordinator to become ELRA Chief Executive
Khalid Choukri appointed to head resource distribution agency
Summer School Report
Student participants report on the 1995 summer school
Speech & NL Integration in Training
Workshop held in Saarbrücken to discuss the issues
Speaking from the Heart
Application-oriented research at Sheffield
ACSYS
French SME taking speech research to the marketplace
ELRA Update
Association appoints Chief Executive
EB Minutes
Summary of the minutes of the July ELSNET Executive Board meeting
Future Events and Misc

Industrial Placement Service to be Implemented Soon

Steven Krauwer, Utrecht University

Much of this issue of ELSNews is dedicated to topics related to training. It therefore seems justified to use this first page to announce a new ELSNET service closely related to this area.

Those of our readership who regularly reread the technical annex to the ELSNET contract will no doubt remember Aim 5 listed under Human Resources Planning: industrial placements. Industrial laboratories joining ELSNET were (and still are) encouraged to provide trainee posts for students. Over the years a number of students from ELSNET sites have worked in industrial labs, but mostly on the basis of personal contacts. Since industrial traineeships are an excellent means of enhancing contacts between universities and industry (please refer to the article on pp. 8-9, which describes the experience of ACSYS in this area), we feel that ELSNET should play not only an encouraging but also a facilitating role, especially since there is a limit to what personal contacts can achieve once the student's horizon extends beyond the national frontier.

For this reason we are now setting up a service which will allow industrial members of ELSNET to announce the availability of trainee posts, with brief descriptions of the tasks to be carried out and the required qualifications, and which, at the same time, allows students to publicize their willingness to work for a company as a trainee, together with their qualifications and main interests. The announcements (both the demand side and the supply side) will be put on our WWW pages, and a retrieval system will be made available. In addition, we will offer companies (especially those that do not have access to the Web) a search service, where we will try to identify suitable candidates for them, if available. In order to have the service operational within a few weeks from now, we invite companies and students interested in traineeships to provide us with their announcements as soon as possible. The format will be free text, for the time being in plain ASCII, although we may later extend it to HTML. On the basis of the material we receive, we will develop templates which can be used to check the announcements for completeness. If there is sufficient interest, we will also include normal job advertisements, and at the same time allow people looking for a job in the area of NLP and Speech to send in their CVs for inclusion. We will publicize the availability of the service as soon as it is operational. All those who send us their announcements will be notified personally.

FOR INFORMATION
All job advertisements and CVs should be sent electronically in ASCII (plain text) format to elsnet-jobs@let.ruu.nl. Job advertisements may, as before, also be posted to elsnet-list@let.ruu.nl.


ELSNET Site Coordinator to become ELRA Chief Executive

At the end of May, several of ELSNET's industrial site coordinators were invited to submit an article to ELSNews describing the work of their company. One of those who responded to this invitation, and who subsequently followed up by sending an article, was Dr. Khalid Choukri of ACSYS, a French SME involved in speech processing (see pp. 8-9).

At the end of June, Dr. Choukri was chosen, out of 22 applicants, to take on the new post of Chief Executive of the European Language Resources Association (see p. 9).

The editors of ELSNews would like to wish Dr. Choukri all success in his new role!


``He That Tholes, Overcomes''

Report on the Third European Summer School on Language and Speech Communication

Kay Berkling, Oregon Graduate Institute of Science & Technology;
Ruta Marcinkeviciene, Vytautas Magnus University; and
Carlos Jorge da Conceição Teixeira, INESC and Technical University of Lisbon

Today's world increasingly confronts us with the need to engage in multi-cultural interactions. We are seeing more and more that knowledge of one or more foreign languages is necessary. At the same time, there have been enormous advances in technology which regularly deliver automated services into our daily lives. In an ideal world, automated services should not discriminate on the basis of culture, language, pronunciation, spelling errors or topic of discussion. It is clear that there is a great need for computers that can cope with the multilingual world we live in. Research in this field is now leading to applications that include both text and speech processing systems within and across languages. As a result, researchers must be well versed in a wide range of areas, from linguistics to signal processing.

Perhaps it is not surprising, then, that this year's European Summer School on Language and Speech Communication focused on the very ``hot'' topic of multilinguality. Starting on July 10th, fifty-five students from 22 countries around the world came to Edinburgh for the two-week programme. While a majority of the students came from Germany, Holland, Spain, Sweden, and the UK, the other countries represented were: Bulgaria, Czech Republic, Republic of Georgia, Hungary, Italy, Korea, Lithuania, Latvia, Poland, Portugal, Romania, Russia, Slovenia, Sri Lanka, Switzerland, and the USA. The group was nicely balanced, with 25 students having speech backgrounds, 23 having linguistic backgrounds, and 7 having backgrounds in both fields! Fifteen students came from industrial organisations, and 40 were staff or PhD students from universities or academic research institutes; especially well represented was Philips Research Laboratories in Aachen, which sent 5 members of its research team. Over one hundred hours of lectures, presented by 13 lecturers, were distributed over 10 days.

Classes were scheduled to fill up the entire day. Fortunately, regular breaks, with coffee and cookies, refreshed us for the next session. An information table was available during all the breaks, with maps and sign-up sheets for events planned during the coming days.

During these two weeks, we proceeded to tackle the difficult task of bridging some of the gaps dividing us. English served as a common language for communication, which was not necessarily easy for all of us. One might think that there would have been a gap between lecturers and students, but approaching the lecturers was never a problem, despite the fact that the students' varied backgrounds complicated the task of finding a common basis from which to teach.

Classes covered topics ranging from pure linguistics to pure engineering. Some lectures provided an overview of a field, while others tended to be more specific. Overviews tended to be presented in plenary sessions. These focused on discussing language translation from two different points of view: while Martin Kay made us aware of the importance of human participation in the translation process, Alex Waibel presented a critical analysis of the machine's role. Steve Young's lectures, which gave a very clear overview of the use of HMMs, thus bridging the gap between linguists and speech researchers, were unfortunately not given in a plenary session. On the other hand, Annie Zaenen's afternoon plenary course presented purely linguistic lectures on language variability.

Mornings and afternoons were divided into optional courses, arranged so that people could choose to remain in their field of either speech or language. Unfortunately, one could not attend all courses, and sometimes the choices were tough. Classes covered a wide variety of subjects: multilingual speech synthesis, recognition, and text generation; language identification; software localisation; machine translation of closely-related languages; and corpus-based methods for multilingual applications.

It was always very important to take the background of the students into account. Teachers were encouraged to cover aspects of both speech and linguistics, thereby more or less bringing the two groups to a common understanding of the subject matter. Initiative on the lecturers' part to change the presentation style according to feedback from the students was vital. Björn Granström, Kjell Gustafson, and Mats Ljungqvist, along with their students, were able to present a lively discussion on multilingual speech synthesis which clarified both the linguistic and the speech aspects necessary to build a good synthesizer. Demonstrations of the systems were provided and were very helpful. Similarly, Ido Dagan's well-designed lectures outlined some statistical methods for language translation which were understood by students with both linguistic and speech backgrounds. Steve Young's week 1 introduction to HMMs, followed by Kate Knill's week 2 course on speech recognition using HTK, gave insight into details not readily available in textbooks. Even those already familiar with the field were able to increase their knowledge. Helge Dyvik also managed to communicate successfully with all the students in the class. In addition, he accomplished the difficult task of explaining a complex system to the degree that we were able to work with it and gain some hands-on experience.

John Bateman's week 1 course on Multilingual Text Generation was probably too ambitious. Although the course was theoretically interesting to those with strong linguistic backgrounds, the generation software which we were given a chance to play with was very complex and often slow. A couple of teaching assistants would have been very useful in this course, as they would have allowed each student group to have more individual attention. Jim Hieronymus's week 2 course on Spoken Language Identification was also very interesting theoretically, but the hands-on part of the course would have benefitted from more structure. Gabor Ellmann's course on Software Localisation was an attempt to show how multilingual NL and speech research is being practically applied right now.

Practical aspects of the classes were very relevant to us. While it is interesting for students to learn about conceptual problems in the field of multilinguality, it is also necessary to be introduced to concrete methods. Facilities supplied by the Department of Computer Science at the University of Edinburgh were more than adequate for learning how to use software such as HTK, as well as tools for multilingual data processing, translation of closely-related languages, and automatic text generation.

During the second week, 23 students were given the opportunity to present their own work, thus exploring the difficult task of speaking to an audience with such a varied background! We suggest that, next year, rather than having each student give a 15-minute talk, a poster session might be more practical and might give all students time to present their work in more depth. This format would also encourage increased interaction between the students concerning research issues.

After a full day of lectures, both professors and students could relax in the wonderful city of Edinburgh. A social programme which included several pub tours, a traditional Scottish ceilidh, and a hiking tour of the Pentland Hills contributed to our getting to know one another. We also had the good fortune of being present for the arrival of the tall ships in Leith Harbour. We met the ghosts, heard the stories, and learned something of the history of the Athens of the North, as Edinburgh is called. More importantly, the social events fostered a lot of interaction between the students and lecturers, and provided good opportunities to talk to each other about our own work. Much of this communication will most likely continue via email for a long time to come.

The two splendid weeks in Scotland ended with a fine dinner and a discussion about the possible future directions of multilingual speech and language processing. After that, the organising committee was honoured with a short speech by one of the students, and gifts were given to Dawn Griesbach, Bernie Jones, Mercè Prat and Yvonne van Holsteijn. We think we speak for all the students when we say that the summer school was a success. The organisation as a whole was very well done, including the WWW-pages provided during the weeks before our arrival, informing us about the social events and giving detailed maps and instructions so that we would all arrive on time. The accommodation was quite nice, and was within walking distance of the classes.

During these two weeks, we realised the extent of our knowledge limitations. Nevertheless, we remind ourselves of a quotation inscribed above a doorway in Edinburgh's Old Town, which says: ``He that tholes, overcomes.'' ``Tholes'' --- for those of you who don't speak Scots --- means ``to persevere or endure.'' In the future, not only will we need to pool knowledge from linguists and speech technologists, but we will --- each one of us --- need to try to acquire a more well-rounded education in order to facilitate this communication. At the end of these two weeks in Edinburgh, we walked away with a good idea of the general direction in which the field of language and speech communication is heading. We all look forward to next July, when the ELSNET Summer School will be held in Budapest on the topic of ``Dialogue Systems.''

FOR INFORMATION
This year's European Summer School on Language and Speech Communication received major funding from ELSNET, ERASMUS and the University of Edinburgh. Support was also provided by EACL and ESCA.

In addition, the local organising committee is extremely grateful to Entropic Research Laboratory and SUN Microsystems for their generous donations of software and hardware. All of the computers used in the Summer School were loaned by the Dept. of Computer Science (DCS) at the University of Edinburgh, with computing support provided by Dave Baines, a member of the DCS staff. Many other members of the computing staff at DCS assisted in large and small ways; the Summer School would not have been possible without them.

Next year's Summer School will be held from July 15-26 in Budapest, Hungary. The topic will be ``Dialogue Systems,'' and the members of the programme committee are Niels Ole Bernsen (Roskilde University), Norman Fraser (Vocalis, Ltd.), and Klara Vicsi (Hungarian Academy of Sciences). Watch this space for a preliminary announcement in the autumn!


Training in NL and Speech: Integrated Individuals or Integrated Teams?

Workshop Held in Saarbrücken to Consider the Issues

Yvonne van Holsteijn, Utrecht University

Introduction

From an academic point of view, it seems quite natural to look upon the integration of natural language and speech (NL&SP) technology as an intellectual challenge. Although this view may be shared by industry, the motivation for integration from an industrial point of view lies mainly in finding profitable solutions to problems related to building integrated NL&SP applications. Nevertheless, it is nowadays widely acknowledged, both in industry and in universities, that building NL&SP applications requires knowledge that goes beyond the traditional fields of NL&SP. Not only is knowledge of themes from overlapping NL&SP areas required, but scholarship in computer science, cognitive science, and ergonomics is considered to be at least equally important. It is also commonly accepted that the complex, multi-disciplinary task of building NL&SP applications cannot be handled by a single individual, but rather requires a team of experts. Given this, the main question still to be answered with respect to integration in training is to what extent (and how) individuals should be trained in an integrated way. Do we want the result to be individuals with integrated skills, or do we prefer to have integration take place at the team level?

This was the central question at the ELSNET Workshop on Integration of Language and Speech in Training, which took place in Saarbrücken, Germany, on May 19 and 20, 1995.

Twenty people from 8 different countries (France, Germany, Hungary, Netherlands, Spain, Sweden, UK, and USA) discussed the need for and possible implementation of an integrated approach to NL&SP in training.

``Integrated individuals''

One way of educating individuals with ``integrated skills'' is to offer them an integrated 4-year curriculum. In his presentation, Bert Cranen from Nijmegen University argued in favour of this approach. It enables the individual not only to understand the big picture of an integrated NL&SP project, but also to communicate in an efficient way with his fellow research team members.

Cranen also discussed several more or less inherent disadvantages of the full-curriculum approach. A fully integrated curriculum forces each individual to take courses in language technology, speech technology, and computer science, resulting in a rather inflexible, tough programme, very much focused on basic mathematical and computational techniques. Experience has shown that a formal curriculum packed with engineering and physical science classes can scare away students who have no affinity for these subjects, but who would nevertheless like to specialise in language-oriented topics. In Nijmegen they have tried to overcome this problem by providing many practical experiments during the first, obligatory phase of the curriculum, thus introducing the necessary concepts without much formal mathematical treatment. This approach has proven to work fairly well.

Secondly, there is a risk of educating `Jacks of all trades and masters of none'. No matter how important it is to offer courses in all relevant areas, it is impossible to educate `multi-area experts' in a period of 3 to 4 years. Cranen argued that it is possible to reduce this problem to acceptable proportions by finding a proper balance between broad and in-depth knowledge.

Thirdly, there is the practical problem of inter-departmental cooperation. Courses offered by `other' departments are generally not oriented towards language and speech, and this situation is not likely to change as long as the inflow of NL&SP students remains relatively low.

``Integrated teams''

The opposite approach was presented by Annie Zaenen (Xerox, Grenoble). She argued in favour of educating specialists in either language technology or speech technology, while also focusing on providing the basic skills needed for team work. She argued that non-content skills such as collaborative spirit, the ability to communicate, good working methods and discipline, and personality traits like creativity, imagination, curiosity, flexibility, and motivation are at least as important as, and often more important than, knowledge of common techniques and terminology for successful NL&SP application building. She qualified this statement by saying that, in general, the skills and personality traits one needs are different for basic research activities and for integrated application building: one needs minimal programming skills and high theoretical creativity; the other needs the opposite. Whereas at the basic research level results may come from `unintegrated individuals', the development of applications requires team work and team-work-related skills. Zaenen concluded that there is `no unique profile of a researcher', and thus `no unique curriculum'.

The presentations by Cranen and Zaenen provided an excellent kickoff for further discussion in four working groups with the following tasks:

A 4-year curriculum in language and speech

The working group (WG) proposed a curriculum which included two years of basic-level courses, balanced between language, speech, and common techniques. This would be followed by two years of advanced courses in language, speech, language-speech-overlap themes, and optional courses from other fields. In the advanced phase, students would be free to specialise either in language, speech or a mixture of both (with each specialisation including overlap-area courses).

Depending upon the specialisation, the proposed curriculum would result in graduates who would be either experts in speech-language-technology (`highly integrated individuals'); or experts in language technology with knowledge about problems and methods in speech processing; or experts in speech technology with knowledge about problems and methods in NL processing.

The presentation was followed by an energetic discussion which resulted in several modifications to the proposed curriculum. Inspired by Zaenen's conclusion that there is `no unique curriculum' since there is `no unique profile of a researcher', it was suggested to offer, in the advanced phase, not only the possibility of specialising in language, speech, or a mixture of both, but also the possibility of focusing on basic or application-oriented research.

The second suggestion was to give students the opportunity to start the specialisation phase after one year instead of two. The result would be a better balance between basic and in-depth knowledge, more flexibility, and a reduction in the motivation problems that crop up in the tough first phase.

It was remarked that increasing the inflow of students would only make sense if there is a market out there waiting to employ them. Although several problem areas exist that need both language and speech methods for their solution, the WG wondered whether the set of problems which require these methods is large enough to justify a full curriculum. On the other hand, if the lack of integration between language and speech is considered to be a sociological problem, one might argue in favour of `early integration'.

A one-year stand-alone MSc course

In structuring a one-year MSc, roughly two approaches can be taken: 1) Produce a course aimed at students with one specific background (computer science, speech technology, or computational linguistics), and give them grounding in `the other areas'; 2) Produce a course aimed at students from a variety of backgrounds who want to work in the NL&SP area.

For reasons of efficiency (one course instead of several different courses), the WG opted for the latter approach. The course structure would allow students to choose a programme suited to their background and their future needs. A number of so-called ``blocks'' were proposed, each block providing four courses related to a single area, e.g., speech theory, language applications, multimedia, etc. Students would be required to follow three blocks (one per term), taking three of the four courses in each block, followed by three months of research work.

The WG hesitated to open the course to students with `less relevant' backgrounds (general linguistics and phonetics), since it would be impossible to provide sufficient grounding in other areas within one year. It was felt that constraints should be set on the possible combinations of courses related to the students' backgrounds (to avoid `easy-rides').

The discussion raised the observation that the people who end up in language and speech technology very often have engineering backgrounds. In addition, it seems that the overlap between so-called `engineering skills' and industrial needs is strikingly larger than the overlap between `language and speech skills' and industrial needs. One could conclude from this that a one-year MSc in NL&SP for students with engineering backgrounds would produce graduates with excellent career prospects.

Courses for industry

In general, a company will train and retrain its personnel because the knowledge learned at the university is soon outdated, and staff are often confronted with tasks they are not trained for. Opportunities which some industrial personnel take advantage of include tutorials at conferences, summer schools, ESCA workshops, and European mobility programmes. Although summer schools are often very much appreciated by industry, they are not generally organised with `industrial needs' in mind. In addition, the average duration of a summer school is far too long for many industrial employees.

Within the ELSNET community, there are numerous universities and research centres that have the expertise and the facilities to offer interesting `bullet courses' for industry. Teaching and providing course material is what universities are there for in the first place, and it is what they are good at. It was concluded that the best thing ELSNET could offer industry in terms of training is:

  1. Overview courses for non-specialists. Possible topics: speech in user interfaces, new directions in linguistically based document technology, new approaches to machine aided translation etc. Duration: 1 to 2 days.
  2. Special topic courses for continued specialised training and retraining. Examples of possible topics were categorised in 5 broad classes, namely, computational methods and techniques (e.g. HMM); linguistic areas and theories (e.g. syntax and prosody); speech and language technology areas (e.g. writing systems); special application types (e.g. message sorting and routing); and relevant methods and techniques from outside LT (e.g. DBMS). Duration: 2 to 3 days.

Book(s)

In view of the theme and aim of the workshop, the WG considered a single book more appropriate than two separate books. The book should be a series of chapters covering themes like phonology and speech processing, and the asymmetry between speech recognition and synthesis. It was emphasised that these topics were just examples, to illustrate the level the chapters should be pitched at.

Each chapter should have two authors, one expert from each side of the speech-language barrier. It was proposed to organise the chapters according to the traditionally recognised fields in language and speech. It was not entirely clear, however, how the overlapping themes would fit into this scheme.

It was felt that the book should not be considered as a textbook, but rather an exploratory book, since the integration of language and speech is still very much in an exploratory phase.

With this in mind, Mike Johnson (Edinburgh) proposed to adopt a novel way of presenting the information that would naturally solve the problem of how to include integrated themes, namely, a book consisting of transcribed discussions between the two authors. To avoid an open-ended discussion, each chapter should be structured around a number of themes which the authors should address. Issues unclear to the `expert of the other side', should be explained in a readable manner by the `host-expert'.

Follow-up activities

It was agreed to organise follow-up activities on all four working group topics. For the full curriculum and the one-year MSc, ELSNET will try to bring together a number of sites with the motivation and the required facilities to participate in these initiatives. After this initial phase, detailed follow-up activities will be outlined in consultation with the parties involved. With respect to setting up an MSc, funding possibilities under Socrates will be explored.

A call for proposals will be distributed to all ELSNET institutes, asking each site to come up with a well-defined plan for one or more industrial courses. In principle, the fee should cover the costs, but initially ELSNET could provide guarantee subsidies to the first institutes who want to give the idea a try.

As for the exploratory book, ELSNET will try to find an editorial team interested in the idea of getting this book off the ground. This team should further work out the idea and find one or more editors. The book will be published in the ELSNET series.

Conclusion

There was broad agreement amongst the workshop participants that there is a clear need for `specialists' (in computer science, speech technology and computational linguistics) with sufficient grounding in `the other areas' to work efficiently within a multidisciplinary team. Opinions differed as to which educational approach would result in graduates with these skills, but no attempt was made to come to a general conclusion preferring one approach above the other.

It was beyond the scope of the workshop to give a detailed operational definition (e.g., in terms of course contents) of ``sufficient grounding''. It is not unlikely, however, that such an operational definition would automatically point towards one or the other approach towards integration of NL&SP in training.

FOR INFORMATION
A written report on the Saarbrücken workshop will soon be available. If you would like a copy of this report, contact:
Yvonne van Holsteijn
ELSNET
Fax: +31 30 253 6000
Email: elsnet@let.ruu.nl

If you are interested in participating in any of the workshop's follow-up activities, please contact:
Gerrit Bloothooft
OTS, Utrecht University
Trans 10
3512 JK Utrecht, The Netherlands
Tel: +31 30 253 6042
Fax: +31 30 253 6000
Email: bloothooft@let.ruu.nl


Application-Oriented Research

One of the primary goals of ELSNET is to take steps towards bridging the gap between academic research and industrial product development. With this in mind, the next few issues of ELSNews will include a series describing recent examples of NL and speech research which have successfully been developed into products or services currently in use by industry. If your institute is doing such work, or if you have a system at your institute which you believe could potentially be developed into a product in the very near future, please let us know about it! Send mail to elsnews@cogsci.ed.ac.uk.

Heart-Felt Synthetic Speech

Steve Beet, Electronic & Electrical Engineering Department, University of Sheffield

The general areas of speech research covered by the Electronic Systems Group of our department are speech analysis, processing, recognition and synthesis. In each area, the primary problems we are addressing are robustness, perceptual significance, and computational tractability and efficiency.

In particular, we have investigated the exploitation of auditory principles for `environmentally robust' automatic speech recognition, adaptation algorithms for various forms of artificial neural networks (for use in speech and other time-series modelling applications), and the augmentation of speech data with visual cues. However, the largest project in which we are participating concerns speech synthesis.

Voices, Attitudes and Emotions in Speech Synthesis (VAESS)

VAESS is funded by the EC TIDE Programme. The aims of the project are:

  1. To develop improved quality and range of synthetic voices in a number of languages and on a number of synthesisers including a small and portable, yet powerful, voice-output communication aid (VOCA), of the form shown below.
  2. To simplify and automate the provision of new voices, male or female, to suit the personality and preferences of the user.
  3. To include a range of attitudes and emotions in the synthesised speech, and to provide efficient user control of these features.

The hardware platform is based on a 486 processor, is PC-compatible, and includes standard SoundBlaster(TM) hardware. Thus the algorithms developed will also function on other standard PC platforms.

During the ``training phase'', the user and his or her carer will work together to develop a new personalised voice. Subsequently, during the ``operating phase'', the user will operate the system alone, for day-to-day communication. To ensure that the final system is useful to (and usable by) the disabled and the elderly, the user interface will be designed to be easy for computer-illiterate people to operate. Similarly, it is important that the automatic methods used to analyse the prototype speech are reliable and do not need specialist intervention.

Overall, the structure of the project will include the following steps:

  1. Study and manual labelling of speech features for voices, attitude and emotion.
  2. Development of automatic or semi-automatic labelling of speech features and conversion into text-to-speech control parameters.
  3. Hardware platform development for operation with and without DSP.
  4. Development of the user interface.
  5. Integration of the systems.
  6. Evaluation of the systems and completion of the user interface.
FOR INFORMATION
Further information on the VAESS project can be obtained from:

Dr. Peter A Cudd
Department of Speech Science
University of Sheffield
18 Claremont Crescent
Sheffield, S10 2TA, UK
Email: p.cudd@sheffield.ac.uk


ACSYS: Making Speech Technology Accessible in the Marketplace

Khalid Choukri, ACSYS

Background

ACSYS is a French SME actively involved in speech processing. The company was established in 1986 to supply speech and digital signal processing products to telecommunications users, and its main activities (in addition to civil and military telecommunications) include speech recognition, coding and synthesis, and the engineering and manufacturing of electronic equipment, including on-board systems. ACSYS is best known for its speech products, including: music on hold, announcement systems, voice mail (software and hardware), automatic dialling systems, recording from multiple channels (to keep track of communications), and Interactive Voice Response (IVR) systems based on DTMF (Dual Tone Multi Frequency signalling) and/or speech recognition. Since April 1994, the main shareholders of ACSYS have been France-Telecom and MACIF, a major French insurance company.

In 1987, ACSYS developed the first French speech recognition board incorporating a telephone interface, based on CNET algorithms (CNET is the Centre National d'Etudes des Telecommunications, the France-Telecom research centre). Since then, ACSYS has acquired CNET's new speech recognition algorithms, called PHIL90, and has designed a board which makes use of them (called DIALSYS-TURBO).

Current R&D Activities

ACSYS devotes more than 20% of its turnover to R&D activity. The goals of R&D are twofold:

ACSYS has around 60 staff, including 12 technical people involved in R&D. Since April 1994, a new strategy has been in place, and the company's activities are oriented towards speech-based technologies which include:

ACSYS produces the hardware which supports its speech recognition technologies and develops basic software such as drivers and Application Programming Interfaces (APIs) supplied to various application developers. ACSYS also develops pilot applications which demonstrate the performance of its ``speech activated'' interactive servers and the user-friendly way in which human-computer oral dialogues are handled.

ACSYS can provide its customers with two baseline systems, one of which represents the speech recognition technology of the 90s (low cost) and a new generation which represents the state of the art. The two systems are based on CNET algorithms and are usable with dedicated add-on PC boards designed and manufactured by ACSYS with a single analogue interface. This technology is being ported to new high-density boards (8 or 16 channels per board).

The speech recognizer used in both systems is based on a Hidden Markov Model (HMM) approach. A robust speech model allows the systems to operate in noisy environments and for all speakers --- i.e., they are speaker- and environment-independent systems. A user-friendly dialogue structure is designed to cooperatively support both casual and regular users. A ``cut through'' procedure allows the experienced user to interrupt the system's outgoing speech at any time, using key words. This speeds up navigation through the choices offered by the menu-driven dialogue. The casual user is guided through the menus in a way which permits him or her to recover from errors; the system starts with an open dialogue and, in case of errors, focuses on single items by asking yes/no questions.
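
The prompt-and-recover strategy described above can be summarised in a short sketch. The Python fragment below is purely illustrative and is not ACSYS software: the menu items, the recognise() stand-in and the confidence threshold are assumptions made for the example.

    # Illustrative sketch of the open-dialogue / yes-no fallback strategy.
    # Not ACSYS code: MENU, recognise() and THRESHOLD are hypothetical.

    MENU = ["claims", "contracts", "payments"]   # example services
    THRESHOLD = 0.7                              # assumed confidence cut-off

    def say(prompt):
        # A real system would play synthesised speech here and allow the
        # caller to "cut through" (interrupt) with a key word at any time.
        print("SYSTEM:", prompt)

    def recognise(expected_words):
        # Stand-in for the keyword recogniser: we simply read typed input
        # and pretend it was recognised with full confidence.
        heard = input("CALLER: ").strip().lower()
        return heard, 1.0

    def get_menu_choice():
        # Open dialogue first: let the caller name the service directly.
        say("Which service do you want: claims, contracts or payments?")
        word, confidence = recognise(MENU)
        if word in MENU and confidence >= THRESHOLD:
            return word
        # Error recovery: focus on single items with yes/no questions.
        for item in MENU:
            say("Do you want %s? Please answer yes or no." % item)
            answer, confidence = recognise(["yes", "no"])
            if answer == "yes" and confidence >= THRESHOLD:
                return item
        return None   # e.g. hand the caller over to an operator

    if __name__ == "__main__":
        print("Chosen service:", get_menu_choice())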

The speech recognizer uses a word-spotting technique to identify key words in casual speech. This is a prerequisite for most spoken language applications, since callers often ignore the system's instructions and prompts to use single, isolated words. Spotting words within a stream of speech also allows the system to cope with paralinguistic factors such as hesitation (``Euh, yes'' instead of ``Yes'') and polite styles (``Yes, please'' instead of ``Yes''). This approach allows flexible and unconstrained speech input without the complexity of real linguistic parsing.

The keyword spotter searches for expected words within its lexicon. If the user utters words which are not in the lexicon and not related to the services offered by the system, these ``words'' are rejected. To achieve this, a ``garbage'' model is trained on words outside the system's vocabulary and on other extraneous speech sounds. Both the ``keyword'' models and the ``garbage'' model are then used to interpret the incoming speech from the user.
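
The decision rule implied here can be sketched as follows, assuming per-segment log-likelihood scores from the keyword and garbage models; the score() function, the pre-cut segments and the rejection margin are hypothetical, and the fragment does not reproduce the PHIL90 decoder.

    # Sketch of keyword-versus-garbage scoring for rejecting
    # out-of-vocabulary speech. All names and the margin are assumptions.

    def spot_keywords(segments, keyword_models, garbage_model, score, margin=2.0):
        """Return the accepted keyword for each speech segment, or None
        when the garbage model explains the segment better (rejection)."""
        hits = []
        for segment in segments:
            # Best-scoring keyword model for this stretch of speech.
            best_word = max(keyword_models,
                            key=lambda w: score(keyword_models[w], segment))
            best_score = score(keyword_models[best_word], segment)
            # Accept the keyword only if it clearly beats the garbage model.
            if best_score - score(garbage_model, segment) >= margin:
                hits.append(best_word)
            else:
                hits.append(None)   # out-of-vocabulary speech: rejected
        return hits

In a real recogniser this comparison is made inside the decoder's search rather than on pre-cut segments, but the principle is the same: every stretch of incoming speech is accounted for either by a keyword model or by the garbage model.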

This work allowed us to install the first ``pioneer'' application for MACIF (a world-wide premiere), which combines the features mentioned above (cut-through, word-spotting, rejection of out-of-vocabulary words). The system receives more than 500,000 calls/year and is used at more than 10 different sites. Other applications have been developed for customers not only in France but in other European countries as well.

Technology Transfer and Marketing Strategy

ACSYS is frequently involved in the supervision of Masters and PhD students from universities and engineering colleges in and around Paris. Students or engineers involved in this exchange carry out part of their work in the research lab and then conduct a ``technology transfer'' to ACSYS.

Aside from our involvement in supervising students, we have found from experience that the most effective means of achieving technology transfer is to set up a team on a specific topic with an academic research lab or an industrial partner. ACSYS participates in many calls for proposals or tenders as a member of larger consortia, in order to capitalise on academic experience and know-how.

Furthermore, as an SME, ACSYS cannot easily penetrate foreign markets and thus partnerships are set up whenever possible with foreign companies of comparable size. European Community-sponsored programmes encourage this sort of collaboration, and the objectives of ACSYS within such programmes are to shorten the period of time for deployment of new services to between six months and one year after the end of the project.

ACSYS is very interested in the Eastern European market, although no specific steps have been taken in this direction yet. Our current strategy is aimed at efficiently penetrating the Western European market. Our marketing analysis has shown that the most attractive markets, ranked by expected size, are Germany, France, Italy, Spain, and so forth. The applications requested by these markets include VoiceMail and IVR systems. Because of its telecommunications infrastructure --- very low DTMF penetration --- Germany, in particular, is willing to take steps toward adopting speech recognition technology for accessing such services. Due to its cultural fascination with all things ``high-tech'', and because of encouragement from France-Telecom, the French market demands that speech recognition technology be present in any human-computer interactive system. There are also plans to take these systems into the British and North American markets.

As with many high-tech enterprises, ACSYS bases its success on scientific approaches practised by a highly qualified team. Company projects are managed according to industrial project management standards. For each project ACSYS undertakes, a clear, strong management method is set up to guarantee that the project goals are achieved. The management method is based on an in-house ``Quality Plan'', inspired by standard quality plans and adapted to ACSYS' size and activity. The Quality Plan describes the project environment, its organisation, life-cycle phases, the standards used, and document references. Project reviews --- internal and external --- are conducted regularly according to this method.

For ACSYS, it is of paramount importance to set up co-operative links with other actors in the human-computer communication area. We see ELSNET as a forum for making contacts and a means for bridging the gap between research and industry. ELSNET should play the role of encouraging fruitful collaboration across countries and across disciplines (e.g., speech and natural language).

FOR INFORMATION
While Khalid Choukri has accepted the post of ELRA Chief Executive, he will continue his association with ACSYS. If you would like more information about ACSYS, please contact him at:

Parc de l'Esplanade
5 Rue P.H. SPAAK
F-77462 Saint Thibault des Vignes
France
Tel: +33 1 64 12 66 66
Fax: +33 1 64 12 66 67
E-mail: choukri.acsys.croisix@gmail.gar.no


ELRA Appoints Chief Executive

Interviews were held on June 29 for the position of Chief Executive for the European Language Resources Association (ELRA). A panel of six interviewed nine of 22 applicants, and unanimously agreed to invite Dr. Khalid Choukri from the French SME, ACSYS, to accept the job.

Dr. Choukri obtained a degree in Electrical Engineering in 1983 from the Ecole Nationale de l'Aviation Civile (ENAC, Toulouse, France). He received his Masters degree in Computer Science and Signal Processing from the Ecole Nationale Superieure des Telecommunications (ENST, Paris) in 1984, and his PhD from the same institution in 1987. His PhD thesis was on the topic of speaker adaptation methods in speech recognition systems.

Following his PhD, Dr. Choukri joined the research team in the Signal Department of ENST, working in the area of human-computer interaction. At the same time, he acted as an independent consultant on large vocabulary speech recognition, speech variability problems, speech recognition assessments and applications of neural networks to speech processing.

In 1989, he joined Cap Gemini Innovation, the R&D centre of Cap Sogeti, one of the major French software houses, to work as team leader on speech processing, oral dialogues and neural networks. He moved to ACSYS in September 1992 as speech technologies manager, in charge of supplying speech recognition packages to application developers and system integrators.

Dr. Choukri brings to his new post a wealth of experience, having worked with both spoken and written resources. He is well known to members of the European speech community in particular, and at the European Commission, because of the active role he has played in a number of European-sponsored projects. Moreover, according to one member of the panel who interviewed him, he is very enthusiastic about ELRA and has given a great deal of thought to the future of the Association. Brian Oakley, Chairman of the Interim Steering Committee, said in a telephone interview, ``I am confident that the members of ELRA will be delighted with Khalid's appointment.''

The first ELRA General Assembly meeting will be held on September 25 in Luxembourg. At this meeting, the members of the Association will elect an Executive Board. The next issue of ELSNews will contain a report on the outcome of these elections.


Minutes of the July ELSNET Executive Board Meeting Summarised

The ELSNET Executive Board last met in Leuven on July 3-4. The main issue at this meeting was ELSNET's application for continuation after 1995. It was decided that ELSNET (as a non-funding body) will continue to act as an infrastructure for the natural language and speech community, and that the following areas of interest will be addressed under ELSNET-2:

Under ELSNET-2, the following lines of action for ELSNET's Task Groups (TGs) are foreseen. Please feel free to ask the contact people mentioned [email addresses shown in brackets] for more information on any of the following items.

Research: The Research TG will put special emphasis on `evaluation'. Its goals are: a) to set up one or more projects aimed at developing evaluation methods for systems involving both speech and language, typically dialogue systems; and b) to build a Europe-wide infrastructure for the comparative evaluation of generic components [Steven Krauwer, elsnet@let.ruu.nl].

Training: The Training TG will continue to organise the annual ELSNET Summer Schools. The 1996 Summer School will be held in Budapest in co-operation with ELSNET goes East. The topic of the school is `dialogue systems' [Gerrit Bloothooft, gerrit.bloothooft@let.ruu.nl and Erik-Jan van der Linden, erikjan@fwi.uva.nl]. In addition, the Training TG will focus on the following four activities [Gerrit Bloothooft, bloothooft@let.ruu.nl]:

Resources: Under ELSNET-2 the Resources TG will move in the following directions [written data: Ulrich Heid, uli@adler.ims.uni-stuttgart.de; spoken data: Lori Lamel: lamel@limsi.fr]:

Industry: The Industry task group will promote a continuous bi-directional flow of information between Industry and Academia via the organisation of events like the summer school, workshops, and bullet courses. Information dissemination via WWW, elsnet-list and ELSNews will be increased and improved (together with the Information Dissemination TG). Steps towards the creation of a brokerage service will be taken, and priority will be given to opening up and further exploiting places where research results are published to the industrial community [Anne de Roeck, deroe@essex.ac.uk].

Information Dissemination: One of ELSNET's most important roles is information dissemination; it is also this role which makes ELSNET most visible. The Information Dissemination TG will be responsible for consistency of presentation and for making information available via WWW, elsnet-list and ELSNews. In co-operation with the other TGs, priority will be given to the improvement and extension of ELSNET's WWW-pages and publicity material. [Dawn Griesbach, dawn@cogsci.ed.ac.uk].

ELSNET and Eastern Europe: It is envisaged that ELSNET and ELSNET goes East, which is currently building up ELSNET's Eastern European counterpart, will ultimately merge into a pan-European network. This ambition is encouraged by the Commission. Meanwhile, however, the situation in the East is still very different from the situation in the West, and therefore, for the short-term, `ELSNET West' and `ELSNET East' will remain separate from an organisational point of view. ELSNET will undertake a follow-up application for ELSNET goes East under ELSNET-2. [Erik-Jan van der Linden, erikjan@fwi.uva.nl].

ELSNET Foundation: The creation of a legal entity ELSNET is in progress. It is foreseen that a `foundation ELSNET' will be operational before the end of this year. [elsnet@let.ruu.nl]

Date of next meeting: The next ELSNET EB meeting will be held on October 13 in London [elsnet@let.ruu.nl].


Fourth Framework Programme On-Line

Full texts of all workplans and documents related to the Fourth Framework Programme Calls for Proposals are now available on-line via the Web. This service is provided by CORDIS (Community R&D Information Service).

Information packages, background documents and details of work programmes for ESPRIT 4 and Telematics 2C, among other things, are available. So is everything else you ever wanted to know about the Fourth Framework Programme --- including the main research areas to be covered by the programme, sub-divisions of proposed funding, plans for implementation of the programme, and key publications. If you wish, you may even read an electronic copy of the Maastricht Treaty!

The European Commission has, it seems, joined the ranks of net-surfers.

FOR INFORMATION
The CORDIS welcome page is at:
WWW: http://www.cordis.lu/.
It is also available via the ELSNET home page:
WWW: http://www.cogsci.ed.ac.uk/elsnet/home.html


Telematics Applications Programme to Launch Third Call

The Language Engineering Sector of the Telematics Applications Programme is launching its third call for proposals on 15 September 1995. Closing date will be 15 January 1996. Priority assistance will be given to:

These projects will foster the consolidation of novel language technologies and their integration into multimedia information and communication products and services. RTD will aim at the next generation of telematics applications, and will feature focused, goal-oriented research efforts.

There is a specific effort in this Call to encourage proposals falling within the scope of and contributing to Global G7 Projects in the area of multimedia information access and management. These proposals are expected to bring together researchers, information providers and system integrators, and to co-operate closely with on-going and planned European and international initiatives in the field.

Further information about this call is available on the WWW.

Outline proposals may be submitted using the forms contained in the Telematics Information Package, preferably by facsimile, as soon as possible but no later than 30 November 1995, to: European Commission, DG XIII-E-5, LE Office, Batiment Jean Monnet (B4-002), L-2920 Luxembourg, Fax: +352 4301 34999.

Specific inquiries regarding the LE Sector may also be directed to this office.

FOR INFORMATION
General inquiries regarding the Telematics Applications Programme may be directed to:
Telematics Applications Programme
Call for Proposals, DGXIII-C
Rue de la Loi 200 (BU29, 4/41)
B-1160 Brussels, Belgium
Fax: +32 2 295 2354
E-mail: telematics@dg13.cec.be


Future Events

September 28-30, 1995: First Conference on Formal Approaches to South Slavic Languages. Plovdiv, Bulgaria. For information, contact Prof. Yordan Pencev/Maria Stambolieva, Institute for Bulgarian, Bulgarian Academy of Sciences, 52 Shipchenski proxod, bl 17, BG-1113 Sofia, Bulgaria, Email: jpen@bgearn.bitnet or mstamb@bgearn.bitnet.

October 16-18, 1995: Second Language Engineering Convention. London, UK. For information, contact Linda Prior, DTI, EED, 151 Buckingham Palace Rd, London SW1W 9SS, UK, Fax: +44 171 215 1966.

November 2-3, 1995: Second `SPEAK!' Workshop: Speech Generation in Multimodal Information Systems and Practical Applications. Darmstadt, Germany. For information, contact: John Bateman, GMD/IPSI, Dolivostr. 15, D-64293 Darmstadt, Germany, Email: bateman@gmd.de.

November 9-10, 1995: Translating and the Computer, Conference and Exhibition. London, UK. For information, contact Nicole Adamides, Fax: +44 171 430 0514, Email: pdg@aslib.co.uk.

December 1-2, 1995: Conference on Architectures and Mechanisms for Language Processing (AMLaP-95). Edinburgh, Scotland. Deadline for submission of papers: Oct. 18, 1995. For information, contact: Matt Crocker and Martin Pickering, Email: amlap@cogsci.ed.ac.uk.

December 6-8, 1995: First AMAST (Algebraic Methodology and Software Technology) Workshop on Language Processing. Enschede, The Netherlands. For information, contact: A. Nijholt, Email: anijholt@cs.utwente.nl.

March 7-9, 1996: Les langues et leur images. Neuchatel, Switzerland. For information, contact IRDP, Fbg de l'Hopital 43, CH-2007 Neuchatel, Tel: +41 38 24 41 91, Fax: +41 38 25 99 47.

April 11-12, 1996: Second ACM/SIGCAPH Conference on Assistive Technologies. Vancouver, Canada. Deadline for submission of papers: Oct. 17, 1995. For information, contact: David Jaffe, Dept. of Veteran Affairs Medical Center, 3801 Miranda Avenue, Mail Stop 153, Palo Alto, CA 94304, Email: jaffe@roses.stanford.edu.

August 5-9, 1996: International Conference on Computational Linguistics (COLING-96). Copenhagen, Denmark. Deadline for paper submissions: Dec. 15, 1995. For information, contact: Bente Maegaard, Email: bente@cst.ku.dk.

August 12-16, 1996: 12th European Conference on Artificial Intelligence (ECAI-96). Budapest, Hungary. Deadline for workshop proposal submissions only: Nov. 1, 1995. For information on ECAI workshops, contact: Elisabeth Andre, DFKI, Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany, Email: ecai-96-ws@dfki.uni-sb.de, or click here.