Dimension reduction methods play an important role in multivariate statistical analysis, in particular with high-dimensional data. Linear methods can be seen as a linear mapping from the original feature space to a dimension reduction subspace. The aim is to transform the data so that the essential structure is more easily understood. However, highly correlated variables provide redundant information, whereas some other feature may be irrelevant, and we would like to identify and then discard both of them while pursuing dimension reduction. Here we propose a greedy search algorithm, which avoids the search over all possible subsets, for ranking subsets of variables based on their ability to explain variation in the dimension reduction variates.
Subset selection in dimension reduction methods
SCRUCCA, Luca
2006
Abstract
Dimension reduction methods play an important role in multivariate statistical analysis, in particular with high-dimensional data. Linear methods can be seen as a linear mapping from the original feature space to a dimension reduction subspace. The aim is to transform the data so that the essential structure is more easily understood. However, highly correlated variables provide redundant information, whereas some other feature may be irrelevant, and we would like to identify and then discard both of them while pursuing dimension reduction. Here we propose a greedy search algorithm, which avoids the search over all possible subsets, for ranking subsets of variables based on their ability to explain variation in the dimension reduction variates.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.