Subspace discriminant index to expedite exploration of multi-class omics data
Tortorella S.; Servili M.; Cruciani G.
2020
Abstract
Omics datasets, which comprehensively characterize biological samples at the molecular level, are continuously increasing in both complexity and dimensionality. This creates a need for tools that improve data interpretability and expedite the extraction of relevant biochemical information. Here we introduce the subspace discriminant index (SDI) for multi-component models, which points to the most promising components for exploring pre-defined groups of observations and can also be used to compare several modeling variants in terms of discriminative power. The SDI is especially useful during the initial exploration of a dataset, when informed decisions must be made on, e.g., pre-processing or modeling variants for further analysis. The versatility and efficiency of the proposed index are demonstrated in two real-world omics case studies, including a highly complex multi-class problem. The code for computing the SDI is freely available in the MATLAB MEDA toolbox and is linked in the present manuscript. By boosting interpretation capabilities, the SDI represents a significant addition to the chemometric toolbox.