ELSNET-list archive

Category:   E-Material
Subject:   WMTrans Language Processing Tools Available
From:   Sandra Wendland
Email:   sandra.wendland_(on)_canoo.com
Date received:   28 Nov 2002

Editorial Department: Software, Information Retrieval, Natural Language Processing, Language Learning

FOR IMMEDIATE RELEASE

Press Release

WMTrans Language Processing Tools Available
German Word Analysis and Generation for more than Two Million Words

Basel, Switzerland, November 25, 2002. Canoo Engineering AG today announced the release of its Word Manager Transducer (WMTrans) product range. Available through the Web at http://www.canoo.com/wmtrans, the German morphology analysis software, developed by Canoo, offers intelligent text processing for information retrieval and language processing applications. Typical use cases include intelligent search, text indexing, text mining, language learning, hyperlink generation, spell checking, grammar checking, and machine translation.

WMTrans is based on Canoo's German Morphological Dictionary, which contains more than 200'000 entries and generates over two million fully categorized word forms, including information on word formation, all types of inflectional irregularities, and spelling variants.

The WMTrans product range includes the following software components:

WMTrans Lemmatizer

The Lemmatizer analyses German words and finds their base form and category. An analysis of ging, for example, returns the infinitive gehen, the base form under which the verb is listed in a dictionary.

    query  -> ging
    result -> gehen (Cat V)

WMTrans Unknown Word Lemmatizer

In German, complex new words can be formed easily, either by compounding or by adding prefixes and suffixes. Examples of German compounds are words like Umsatzwarnungen, skandalgeschüttelten, or abbausicheres. Although compounding is widespread, many individual compounds occur rarely and are not listed in dictionaries. The Unknown Word Lemmatizer recognizes non-lexicalized words such as compounds by applying word formation rules. This is a powerful advantage in a generative language such as German.
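To illustrate the idea of recognizing a non-lexicalized compound through a word formation rule, here is a minimal sketch in Python. It is not the WMTrans API and does not use finite state machines: the lexicon, the function names lemmatize and lemmatize_unknown, and the single "modifier + head" compounding rule are all hypothetical and chosen only for illustration.

```python
# Toy lexicon mapping a word form to (base form, category).
# Hypothetical data for illustration; not Canoo's dictionary.
LEXICON = {
    "ging": ("gehen", "V"),
    "umsatz": ("umsatz", "N"),
    "warnung": ("warnung", "N"),
    "warnungen": ("warnung", "N"),
}


def lemmatize(word):
    """Plain dictionary lookup: return (base form, category) or None."""
    return LEXICON.get(word.lower())


def lemmatize_unknown(word):
    """Try the dictionary first, then one assumed compounding rule.

    The rule splits the word into a known modifier and a known head;
    the compound takes its category and its base form ending from the head.
    """
    word = word.lower()
    hit = lemmatize(word)
    if hit is not None:
        return hit
    for i in range(1, len(word)):
        modifier, head = word[:i], word[i:]
        if modifier in LEXICON and head in LEXICON:
            head_base, category = LEXICON[head]
            return (modifier + head_base, category)
    return None  # no rule applies


if __name__ == "__main__":
    print(lemmatize("ging"))
    print(lemmatize("Umsatzwarnungen"))
    print(lemmatize_unknown("Umsatzwarnungen"))
```

In this sketch the plain lookup finds ging but not Umsatzwarnungen, while the rule-based function derives a base form for the compound from its parts. A real system would need many more rules, for example for linking elements and derivational affixes.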
The Unknown Word Lemmatizer includes the Lemmatizer and therefore knows both the entire dictionary and the word formation rules. Typical usage is as follows: a first call to the Lemmatizer determines whether a word form is included in the Morphological Dictionary. If the Lemmatizer does not find the word, the word is passed on to the Unknown Word Lemmatizer for further processing. The Unknown Word Lemmatizer analyses the word's structure and associates one or more word formation rules with the corresponding base forms in the lexicon. The output is the base form of the word and its category. As a result, a word such as Umsatzwarnungen is analyzed successfully, even though the base form Umsatzwarnung is not listed in the dictionary.

    query  -> umsatzwarnungen
    result -> umsatzwarnung (Cat N)

WMTrans Inflection Analyzer

The Inflection Analyzer determines the base form and category of a word and provides additional grammatical and orthographical information.

WMTrans Recognizer

The Recognizer detects whether a character string is a valid German word.

WMTrans Generator

The Generator returns all inflected word forms and spelling variants for a base form.

WMTrans Inflection Analyzer/Generator

The Inflection Analyzer/Generator determines the base form and category of a word and computes all possible inflected forms and spelling variants for a given base form.

WMTrans Word Formation Analyzer/Generator

The Word Formation Analyzer/Generator determines the components from which a word has been derived or composed and finds all possible compounds and derivations in which a given word is involved.

Benefits of WMTrans Products

Canoo's language tools offer the following unique benefits:

Effective use of technology: WMTrans products are finite state machines, which are highly efficient in memory consumption and processing speed.
Excellent dictionary quality: the dictionary has been hand-compiled by a team of highly qualified linguists using a dedicated authoring environment, which offers superior support during data entry and ensures high data consistency.

Complete set of word formation rules: this comprehensive dictionary knowledge is used, for example, by the Unknown Word Lemmatizer to provide accurate analyses of non-lexicalized entries.

Platforms

WMTrans products are available for several platforms:

Platform (API)                    Product
Windows, Linux, Solaris (Java)    WMTrans Lemmatizer
                                  WMTrans Unknown Word Lemmatizer
Linux (Java, C++)                 WMTrans Lemmatizer
                                  WMTrans Inflection Analyzer
                                  WMTrans Recognizer
                                  WMTrans Generator
                                  WMTrans Inflection Analyzer/Generator
                                  WMTrans Word Formation Analyzer/Generator

Download Trial Versions

Download free evaluation licenses at: http://www.canoo.com/wmtrans/download

Browse through the product descriptions, test the APIs, and find out how the WMTrans shared libraries can be integrated into your application.

Canoo Online Services are based on WMTrans products and provide examples of possible applications. These services are available at: http://www.canoo.net

About Canoo

Founded in 1999, Canoo (http://www.canoo.com) delivers custom-made software solutions for business applications on the Internet, Intranet, and Extranet. Canoo is based in Basel, Switzerland.

Contact:
Elisabeth Maier
Canoo Engineering AG
Kirschgartenstr. 7
CH-4051 Basel
mailto:wmtrans-info_(on)_canoo.com
Phone: +41 (61) 228 94 44

Voucher copy requested

Canoo Engineering AG
Kirschgartenstrasse 7
CH-4051 Basel
Tel. +41 61 228 94 66
Fax +41 61 228 94 49
mailto:sandra.wendland_(on)_canoo.com
Web: http://www.canoo.com

Page generated 14-02-2008 by Steven Krauwer