Internal and External Validation Strategies for the Evaluation of Long-Term Effects in NIR Calibration Models

Sileoni, Valeria; Marconi, Ombretta; Perretti, Giuseppe Italo Francesco; Fantozzi, Paolo
2011

Abstract

This study presents some practical aspects of building a calibration set over the long term. A calibration model that predicts the Kolbach index of brewing malt was defined, and four different validation and resampling schemes were applied to determine its real predictive power. The results showed that a single performance criterion may not be sufficient and can lead to over- or underestimation of model quality. A simple leave-one-sample-out cross-validation (CV) was compared with two more challenging leave-N-samples-out CVs in which the resampling was repeated 200 times; the comparison showed that the prediction error is itself uncertain and that its value changes with the type and number of validation samples. Two kinds of test-set validation were then applied, using data blocks defined by the year of sample collection; they showed that long-term effects on NIR calibrations must be considered and that the number of factors should be selected conservatively. The conclusion is that the prediction error should be reported modestly, because it depends on the type of validation used to estimate it, and that long-term effects must be taken into account.
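The three families of validation named in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic, a straight-line least-squares fit stands in for the NIR calibration model, and the years, sample counts, drift term, and leave-out size of 20 are all hypothetical. Only the 200 repetitions are taken from the abstract.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def split_rmse(data, test_idx):
    """Fit on the samples outside test_idx; return the RMSE on those inside."""
    test_idx = set(test_idx)
    train = [d for i, d in enumerate(data) if i not in test_idx]
    test = [d for i, d in enumerate(data) if i in test_idx]
    a, b = fit_line([d[0] for d in train], [d[1] for d in train])
    sq_err = [(a + b * d[0] - d[1]) ** 2 for d in test]
    return (sum(sq_err) / len(sq_err)) ** 0.5


def loo_rmsecv(data):
    """Leave-one-sample-out CV: pool the squared errors of all left-out samples."""
    errs = [split_rmse(data, [i]) for i in range(len(data))]
    return (sum(e ** 2 for e in errs) / len(errs)) ** 0.5


def repeated_leave_n_out(data, n_out, repeats=200, seed=0):
    """Leave-N-samples-out, repeated; returns one error value per resampling."""
    rng = random.Random(seed)
    return [split_rmse(data, rng.sample(range(len(data)), n_out))
            for _ in range(repeats)]


def leave_year_out(data):
    """Test-set validation with each collection year held out as a block."""
    years = sorted({d[2] for d in data})
    return {yr: split_rmse(data, [i for i, d in enumerate(data) if d[2] == yr])
            for yr in years}


# Synthetic samples (x, y, year) with a small hypothetical year-to-year drift.
rng = random.Random(42)
data = []
for k, year in enumerate([2006, 2007, 2008, 2009]):
    for _ in range(30):
        x = rng.uniform(0, 10)
        y = 35 + 1.0 * x + 0.4 * k + rng.gauss(0, 0.5)
        data.append((x, y, year))

loo = loo_rmsecv(data)
lno = repeated_leave_n_out(data, n_out=20)
by_year = leave_year_out(data)

print(f"leave-one-out error: {loo:.3f}")
print(f"leave-20-out error over 200 resamplings: "
      f"min {min(lno):.3f}, median {statistics.median(lno):.3f}, max {max(lno):.3f}")
for yr, err in by_year.items():
    print(f"year {yr} held out: {err:.3f}")
```

The repeated resampling returns a spread of error values instead of a single number, and the year-block errors show how a model fitted on some years behaves on another, which is the kind of comparison the abstract describes.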
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11391/225089
Citations
  • PubMed Central 2
  • Scopus 20
  • Web of Science (ISI) 18