Abstract. One of the most fascinating areas of study in the current economic and financial world is the forecasting of credit risk and the ability to predict a company's insolvency. Meanwhile, one major challenge in constructing predictive failure models is variable selection. Standard selection methods exist alongside new approaches. In addition, the huge availability of data often implies limitations due to processing time, and new high-performance procedures provide tools that can take advantage of parallel processing. In the present paper, different variable selection techniques were explored in the context of applying logistic regression for binary data to a balanced data set including only firms that are active or in bankruptcy. Models deriving from stepwise selection, the Least Absolute Shrinkage and Selection Operator (LASSO) and an unsupervised method, based on the maximum data variance explained, were compared. Then a non-parametric approach was considered, and the variables selected by a single decision tree and by a forest of trees were compared and discussed.
Variable Selection in Binary Logistic Regression for Modelling Bankruptcy Risk
Pierri Francesca
2023