Developing a learner dictionary of collocations: description and evaluation of a multi-method approach

Perri, Damiano; Gervasi, Osvaldo
2026

Abstract

This study describes and evaluates a multi-method approach to identifying and extracting collocations for a collocation dictionary aimed at learners of Italian. The approach integrates part-of-speech tagging and dependency parsing to extract six syntactic relations from a reference corpus of Italian. The initial set of candidates was progressively reduced using frequency, dispersion, and association measures. The resulting set was then evaluated by comparing it with existing collocation dictionaries and by gathering expert judgments on which collocations should be included. Combining these two evaluations further refined the list. The effect of the statistical measures on expert judgments was also investigated: dispersion and association measures positively influenced human evaluations, while higher frequency often correlated with negative ratings. This triangulation of corpus-based and statistical methods, human judgments, and comparison with existing dictionaries captures collocations that are widely used across genres and suitable for inclusion in a learner dictionary, offering a useful tool for learners while contributing to corpus-based collocation research.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11391/1610494