ELSNET-list archive

Category:   E-Announce
Subject:   Semantic Similarity Experiment
Email:   nseco_(on)_dei.uc.pt
Date received:   17 Jul 2007

[Apologies for cross-postings]
[Please distribute to potentially interested parties]

As part of a joint research project, we are asking fellow researchers to contribute about 10 minutes of their time and take part in an experiment that (we hope) will help us gather a large dataset of similarity ratings for pairs of words.

Participation is quite simple, so if you are interested please read the section HOW TO PARTICIPATE. If you want to learn more about the experiment please read the section INTRODUCTION.

Thanks in advance,
Giuseppe Pirrò & Nuno Seco

----------------------------------------------------------------------------

Introduction:

Semantic similarity plays an important role in Information Retrieval, Natural Language Processing, Ontology Mapping and other related fields of research. In particular, researchers have developed a variety of semantic similarity and relatedness measures by exploiting information found in lexical resources such as WordNet. Current similarity metrics based on WordNet can be classified into one of the following categories:

  - Edge-Counting measures, which are based on the number of links relating the two concepts being compared.
  - Information Content measures, which are based on the idea that the similarity of two concepts is related to the amount of information they have in common.
  - Feature-Based measures, which exploit the features (e.g., descriptions in natural language) of a term while usually ignoring its location in the taxonomy.
  - Hybrid measures, which combine ideas from the previous categories.

To evaluate the suitability of the various similarity measures, they are usually compared against human judgements by calculating correlation values. A typical reference for evaluation is the set of results from the Rubenstein and Goodenough (R&G) experiment. In 1965, R&G obtained "synonymy judgments" from 51 human subjects on 65 pairs of words. The pairs ranged from "highly synonymous" (gem-jewel) to "semantically unrelated" (noon-string).
Subjects were asked to rate them on a scale of 0.0 to 4.0 according to their "similarity of meaning", ignoring any other observed semantic relationships.

Although other similar experiments have been carried out since the R&G experiment, we are not aware of similarity experiments aimed at showing how robust the different measures are when compared across different versions of WordNet. With this objective in mind, we want to collect human similarity estimates on the whole Rubenstein and Goodenough dataset and subsequently compare the outputs of existing similarity measures against them. We chose the R&G dataset because others have worked on it, which permits direct comparison of results obtained by different experiments. Moreover, we want to show the suitability of an Information Content metric that relies solely on the WordNet taxonomy, without relying on an external collection of texts.

How to participate:

To participate in the similarity experiment, point your browser to:

http://grid.deis.unical.it/similarity/

Then click on the register link to register; you will immediately receive a password via email. After logging in, indicate a similarity value for every word pair by using the slider provided for each pair. The estimated time required is about 10 minutes, including the time for registering.

The results of the experiment and the data will be published as soon as we have collected a significant number of ratings.

_______________________________________________
Elsnet-list mailing list
Elsnet-list_(on)_elsnet.org
http://mailman.elsnet.org/mailman/listinfo/elsnet-list
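
For readers unfamiliar with the evaluation mentioned in the introduction (comparing a similarity measure against human judgements by calculating correlation values), the following is a minimal sketch in Python. It assumes the Pearson correlation coefficient is the statistic of interest, which the announcement does not specify. All numbers are placeholders invented for illustration: they are not R&G data and not results of this experiment. The pairs word_a/word_b and word_c/word_d and the "measure" scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values invented for illustration only (NOT real data).
# Mean human rating per word pair on the 0.0-4.0 scale:
human_ratings = {
    ("gem", "jewel"): 3.9,
    ("word_a", "word_b"): 2.5,
    ("word_c", "word_d"): 1.0,
    ("noon", "string"): 0.1,
}
# Scores from some hypothetical similarity measure, here in the range 0-1:
measure_scores = {
    ("gem", "jewel"): 0.95,
    ("word_a", "word_b"): 0.40,
    ("word_c", "word_d"): 0.35,
    ("noon", "string"): 0.05,
}

# Align both score lists on the same ordering of word pairs.
pairs = sorted(human_ratings)
r = pearson(
    [human_ratings[p] for p in pairs],
    [measure_scores[p] for p in pairs],
)
print(f"Pearson r between measure and human ratings: {r:.3f}")
```

A value of r close to 1 would indicate that the measure orders and spaces the word pairs much as the human subjects did. The two scales do not need to match, since the Pearson coefficient is unaffected by a positive linear rescaling of either list.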

Page generated 14-02-2008 by Steven Krauwer