Forecasting Probability of Bankruptcy from unbalanced data

PIERRI, Francesca; STANGHELLINI, Elena
2015

Abstract

When analysing the determinants of bankruptcy among small and medium enterprises, one of the most common problems is unbalanced data: the event under study very often occurs in only a small percentage of cases. The aim of this paper is to explore three statistical methods for coping with unbalanced data and to identify which has the greatest predictive capability for the bankruptcy event. The dataset comprises all firms active in Tuscany in 2006, each with a five-year series of balance sheet indicators. Bankruptcy is defined by each firm's legal status as of May 2010. We focused on indicators previously identified as predictors of bankruptcy (Pierri 2013; Pierri, Burchi and Stanghellini 2013) and tested the same model using three methods: logistic regression for matched case-control studies, logistic regression on a randomly drawn balanced sample, and logistic regression on a sample balanced by ROSE (Random OverSampling Examples; Menardi and Torelli 2014). We built a training sample to develop the models and a hold-out sample on which to compare their discriminatory ability through ROC curves.
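The abstract compares logistic regression fitted on differently balanced training samples and evaluated through ROC curves on a hold-out sample. The pure-Python sketch below illustrates two of the three schemes: a random balanced sample (every bankrupt firm plus an equal-size random draw of non-bankrupt firms) and a ROSE-style smoothed bootstrap. It is not the authors' code. The Gaussian-kernel bandwidth rule, the gradient-descent fitting, the sample sizes, and the synthetic data with two hypothetical indicators are all illustrative assumptions. The matched case-control method, which calls for conditional logistic regression, is not shown. Discrimination on the hold-out sample is summarised by the area under the ROC curve (AUC), computed as the Mann-Whitney probability that a bankrupt firm receives a higher score than a non-bankrupt one.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Predicted probability; w[0] is the intercept.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Plain logistic regression fitted by batch gradient descent (illustrative)."""
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, t in zip(X, y):
            err = score(w, x) - t
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def random_balanced(X, y, rng):
    """Keep every bankrupt firm (y=1) and an equal-size random draw of the others."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    keep = pos + rng.sample(neg, len(pos))
    return [X[i] for i in keep], [y[i] for i in keep]


def rose_sample(X, y, n_out, rng):
    """ROSE-style smoothed bootstrap: choose a class with probability 1/2, draw one
    of its rows at random, and add Gaussian noise. The per-class, per-feature
    bandwidth below is an ASSUMED normal rule of thumb, not taken from the paper."""
    d = len(X[0])
    rows = {c: [x for x, t in zip(X, y) if t == c] for c in (0, 1)}
    bw = {}
    for c, r in rows.items():
        n = len(r)
        k = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        bw[c] = []
        for j in range(d):
            col = [v[j] for v in r]
            m = sum(col) / n
            bw[c].append(k * math.sqrt(sum((v - m) ** 2 for v in col) / max(n - 1, 1)))
    Xs, ys = [], []
    for _ in range(n_out):
        c = 1 if rng.random() < 0.5 else 0
        base = rng.choice(rows[c])
        Xs.append([v + rng.gauss(0.0, h) for v, h in zip(base, bw[c])])
        ys.append(c)
    return Xs, ys


def auc(scores, labels):
    """AUC as the Mann-Whitney probability P(score_bankrupt > score_other); ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical synthetic data: about 3% "bankrupt" firms, two made-up indicators.
rng = random.Random(42)
X, y = [], []
for _ in range(3000):
    t = 1 if rng.random() < 0.03 else 0
    X.append([rng.gauss(1.0 * t, 1.0), rng.gauss(1.0 * t, 1.0)])
    y.append(t)
X_train, y_train, X_hold, y_hold = X[:2000], y[:2000], X[2000:], y[2000:]

# Fit on each balanced training sample, then compare AUC on the untouched hold-out sample.
samples = {"random balanced": random_balanced(X_train, y_train, rng),
           "ROSE-style": rose_sample(X_train, y_train, 1000, rng)}
results = {}
for name, (Xb, yb) in samples.items():
    w = fit_logistic(Xb, yb)
    results[name] = auc([score(w, x) for x in X_hold], y_hold)
    print(name, round(results[name], 3))
```

Because the AUC depends only on how the scores rank firms, the intercept shift caused by artificial balancing does not affect this comparison. Predicted probabilities from a balanced sample would, however, need recalibration before being read as actual bankruptcy probabilities.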
ISBN 9788498444964


Use this identifier to cite or link to this document: https://hdl.handle.net/11391/1355471