Reliability of Comparative Molecular Field Analysis Models: Effects of Data Scaling and Variable Selection Using a Set of Human Synovial Fluid Phospholipase A2 Inhibitors
CRUCIANI, Gabriele
1997
Abstract
The effects of data pretreatment, data scaling, and variable selection on three-dimensional quantitative structure-activity relationships derived by comparative molecular field analysis (CoMFA) using the GRID energy function were studied in detail for a set of inhibitors of the human synovial fluid phospholipase A2 (HSF-PLA2). The models were evaluated for predictive power and for their ability to map the receptor binding site by (a) comparing predicted and experimental activities using cross-validation and external validation sets and (b) comparing the regions of space selected in the CoMFA models with a crystal structure of an HSF-PLA2/inhibitor complex, with optimized comparative binding energy analysis (COMBINE) models (Ortiz et al., 1995), and with structure-activity relationships derived previously for different sets of compounds. The main findings are as follows. (1) Data scaling and dielectric modeling strongly influence CoMFA results; unscaled data and a uniform dielectric constant of 4 are well suited to GRID-CoMFA studies of the present compound set. (2) The GOLPE and Q2-GRS variable selection methods select variables in roughly the same regions of Cartesian space, but they produce different models in chemometric space and differ in their sensitivity to data scaling and pretreatment and in their tendency to overfit. (3) CoMFA models are consistent with COMBINE models in that they identify approximately the same intermolecular interactions as relevant for activity. Our study supports the qualitative receptor-mapping ability of CoMFA models and the validity of variable selection when it is applied with care, and it provides guidelines for evaluating the quality of CoMFA models.
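The abstract judges models partly by cross-validated predictive power and reports that data scaling changes CoMFA results. As a minimal sketch, assuming a small descriptor matrix `X` (rows = compounds, columns = field values at grid points) and one activity vector `y`, the Python below computes a leave-one-out q² = 1 − PRESS/SS for a NIPALS PLS1 model, with either mean-centring only ("unscaled") or autoscaling. All function names are hypothetical; this is not GOLPE, Q2-GRS, or the authors' code, and it performs no variable selection.

```python
import math


def column_stats(X):
    """Column means and standard deviations (sd of 1.0 for constant columns)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
        sds.append(math.sqrt(var) if var > 0 else 1.0)
    return means, sds


def pls1_fit(X, y, n_comp, autoscale=False):
    """NIPALS PLS1 for a single response (hypothetical helper, illustration only)."""
    x_mean, x_sd = column_stats(X)
    if not autoscale:
        x_sd = [1.0] * len(x_mean)  # "unscaled": mean-centring only
    y_mean = sum(y) / len(y)
    E = [[(v - m) / s for v, m, s in zip(row, x_mean, x_sd)] for row in X]
    f = [v - y_mean for v in y]
    W, P, Q = [], [], []
    for _ in range(n_comp):
        # Weight vector: covariance of each (deflated) column with the response.
        w = [sum(E[i][j] * f[i] for i in range(len(E))) for j in range(len(E[0]))]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in E]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(len(E))) / tt for j in range(len(w))]
        q = sum(a * b for a, b in zip(f, t)) / tt
        # Deflate X and y by the extracted component.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        W.append(w)
        P.append(p)
        Q.append(q)
    return {"x_mean": x_mean, "x_sd": x_sd, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def pls1_predict(model, x):
    """Predict one compound by applying the same centring, scaling and deflation."""
    e = [(v - m) / s for v, m, s in zip(x, model["x_mean"], model["x_sd"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["W"], model["P"], model["Q"]):
        t = sum(a * b for a, b in zip(e, w))
        y_hat += q * t
        e = [ei - t * pi for ei, pi in zip(e, p)]
    return y_hat


def loo_q2(X, y, n_comp, autoscale=False):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_comp, autoscale)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    y_mean = sum(y) / len(y)
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss
```

With hypothetical toy data, comparing `loo_q2(X, y, 1)` against `loo_q2(X, y, 1, autoscale=True)` shows how the same compounds can give different cross-validated models depending only on scaling; with as many components as independent columns, PLS reduces to ordinary least squares and scaling no longer matters.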