A semantic comparison of clustering algorithms for the evaluation of web-based similarity measures

FRANZONI, Valentina; MILANI, Alfredo
2016

Abstract

The Internet explosion and the massive diffusion of mobile devices have led to the creation of a worldwide collaborative system, used daily by millions of users through search engines and application interfaces. New paradigms make it possible to compute the similarity of terms using only the statistical information returned by a query, or using additional features; old algorithms and measures have also been applied to new domains and scopes to efficiently find word clusters on the Web. This raises the problem of evaluating such techniques and algorithms in new domains, which remains an open field of experimentation. In this paper, preliminary tests were carried out on several semantic proximity measures (average confidence, NGD, PMI, χ², PMING Distance), and several of the clustering algorithms most widely used in the literature (e.g., k-means, Expectation-Maximization, spectral clustering) were compared in order to evaluate these measures. The suitability of the considered measures and methods for computing semantic proximity was verified against the state of the art, and problems were identified by comparing the measurement results to a ground truth provided by models of contextualized knowledge, clustering, and human perception of semantic relations, whose data have already been studied in the literature.
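To illustrate how similarity can be computed "using only the statistical information returned by a query", the following sketch derives four of the listed measures from search-engine hit counts alone: f(x) and f(y) for each term, f(x,y) for their co-occurrence, and an assumed total index size n. NGD follows the standard Cilibrasi and Vitányi formulation and χ² is Pearson's statistic on the 2x2 co-occurrence table; the base-2 logarithm for PMI and the definition of average confidence as the mean of the two association-rule confidences are assumptions, and the paper's exact variants may differ. The PMING Distance is omitted because its definition is not given here, and all counts in the usage example are hypothetical.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from hit counts (0 = always co-occur)."""
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def pmi(fx: float, fy: float, fxy: float, n: float) -> float:
    """Pointwise mutual information log2(p(x,y) / (p(x) p(y))), with p = count / n.

    Base 2 is an assumed convention; any base changes only the scale.
    """
    return math.log2((fxy * n) / (fx * fy))


def chi_squared(fx: float, fy: float, fxy: float, n: float) -> float:
    """Pearson chi-squared statistic of the 2x2 co-occurrence contingency table."""
    a = fxy                # pages containing both x and y
    b = fx - fxy           # pages containing x but not y
    c = fy - fxy           # pages containing y but not x
    d = n - fx - fy + fxy  # pages containing neither term
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def average_confidence(fx: float, fy: float, fxy: float) -> float:
    """Assumed definition: mean of the confidences f(x,y)/f(x) and f(x,y)/f(y)."""
    return (fxy / fx + fxy / fy) / 2


if __name__ == "__main__":
    # Hypothetical hit counts for a term pair (x, y) in an index of n pages.
    fx, fy, fxy, n = 1000, 500, 100, 1_000_000
    print("NGD:", ngd(fx, fy, fxy, n))
    print("PMI:", pmi(fx, fy, fxy, n))
    print("chi2:", chi_squared(fx, fy, fxy, n))
    print("avg confidence:", average_confidence(fx, fy, fxy))
```

Note that the measures point in different directions: NGD is a distance (lower means closer), while PMI, χ², and average confidence grow with association, so a clustering algorithm fed with these values must convert them to a consistent similarity or distance scale first.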
ISBN 9783319420912

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11391/1399027
Citations
  • PubMed Central: not available
  • Scopus: 18
  • Web of Science (ISI): 14