A simulation study to investigate the behavior of the log-density ratio under normality

SCRUCCA, Luca
2004

Abstract

In a logistic regression model, the log-odds depend on the logarithm of the ratio of the conditional densities of the predictors given the response variable. This suggests that relevant statistical information can be extracted by studying the inverse problem, that is, the distribution of the predictors given the response. For binary responses, under certain parametric assumptions, it is possible to determine which terms are needed and how they should enter a logistic regression model. In the one-predictor case under normality, a known result shows that both a linear and a quadratic term are needed, with the quadratic term not required when the two conditional distributions have the same variance. However, even with unequal variances the quadratic component may be unnecessary if the linear term is sufficient to discriminate between the two groups, that is, if the two conditional distributions are far enough apart. A simulation study is presented which shows that the quadratic term is less likely to be useful when the ratio of variances is between 2/3 and 1.5; the same holds when the mean difference scaled by the variance ratio tends to be large. Graphically, if the conditional distributions of x|y for the two groups are well separated, a linear term should contain all the relevant statistical information available in the data. Conversely, if they overlap substantially and the variances are clearly unequal, the quadratic term is likely to be needed. Minor deviations from normality should not be a concern, particularly outside the range in which the empirical distributions overlap.
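
The known one-predictor result mentioned in the abstract can be illustrated with a minimal sketch. It assumes x|y=0 ~ N(mu0, sigma0^2) and x|y=1 ~ N(mu1, sigma1^2), and expands the log-density ratio as b0 + b1*x + b2*x^2. The function and variable names are hypothetical and are not taken from the paper; this is not the simulation code used in the study.

```python
import math


def log_density_ratio_coefs(mu0, sigma0, mu1, sigma1):
    """Return (b0, b1, b2) such that log f1(x)/f0(x) = b0 + b1*x + b2*x**2,
    where f0 and f1 are the normal densities of x given y=0 and y=1."""
    v0, v1 = sigma0 ** 2, sigma1 ** 2
    b0 = math.log(sigma0 / sigma1) + mu0 ** 2 / (2 * v0) - mu1 ** 2 / (2 * v1)
    b1 = mu1 / v1 - mu0 / v0
    b2 = 1 / (2 * v0) - 1 / (2 * v1)  # zero when the two variances are equal
    return b0, b1, b2


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - (x - mu) ** 2 / (2 * sigma ** 2))


# Equal variances: the quadratic coefficient vanishes, only a linear term remains.
print(log_density_ratio_coefs(0.0, 1.0, 2.0, 1.0))
# Unequal variances: a non-zero quadratic coefficient appears.
print(log_density_ratio_coefs(0.0, 1.0, 2.0, 2.0))
```

The logit of P(y=1|x) equals this log-density ratio plus the log prior odds, so only the intercept changes with the group proportions. The quadratic coefficient depends solely on the two variances, which is consistent with the abstract's statement that the quadratic term is not required when the variances are equal.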

Use this identifier to cite or link to this document: https://hdl.handle.net/11391/153129