Finite mixture modelling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provide clustering information. This enables the selection of a more parsimonious model, yielding more efficient estimates, a clearer interpretation and, often, improved clustering partitions. This paper describes the R package clustvarsel which performs subset selection for model-based clustering. An improved version of the citet{Raftery:Dean:2006} methodology is implemented in the new release of the package to find the (locally) optimal subset of variables with group/cluster information in a dataset. Search over the solution space is performed using either a stepwise greedy search or a headlong algorithm. Adjustments for speeding up these algorithms are discussed, as well as a parallel implementation of the stepwise search. Usage of the package is presented through the discussion of several data examples.

clustvarsel: A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R

SCRUCCA, Luca
;
2018

Abstract

Finite mixture modelling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provide clustering information. This enables the selection of a more parsimonious model, yielding more efficient estimates, a clearer interpretation and, often, improved clustering partitions. This paper describes the R package clustvarsel which performs subset selection for model-based clustering. An improved version of the citet{Raftery:Dean:2006} methodology is implemented in the new release of the package to find the (locally) optimal subset of variables with group/cluster information in a dataset. Search over the solution space is performed using either a stepwise greedy search or a headlong algorithm. Adjustments for speeding up these algorithms are discussed, as well as a parallel implementation of the stepwise search. Usage of the package is presented through the discussion of several data examples.
2018
File in questo prodotto:
File Dimensione Formato  
Scrucca Raftery 2018 clustvarsel A Package Implementing Variable Selection for Gaussian Model-Based Clustering in R.pdf

accesso aperto

Descrizione: Articolo principale
Tipologia di allegato: PDF-editoriale
Licenza: Creative commons
Dimensione 748.44 kB
Formato Adobe PDF
748.44 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11391/1403938
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 40
  • ???jsp.display-item.citation.isi??? 36
social impact