In survey sampling, a random sample is drawn from a finite population in order to make inference on descriptive characteristics of some variables of interest. Nonresponse usually occurs and, as a consequence, the variables of interest are not observed for the entire selected sample, which results in missing data. We distinguish between two types of missing data: unit nonresponse, when a selected sample unit is not observed at all (because the unit is not found at home, is unable to provide the required information owing to illness or lack of knowledge, or simply refuses to collaborate), and item nonresponse, when an interviewed unit leaves some of the questions in the questionnaire unanswered. Addressing nonresponse is very important, since it is present in almost all surveys and, above all, can severely bias estimates if the responding units are systematically different from the nonresponding ones. Several techniques have been proposed in the literature to deal with nonresponse at the estimation stage. Typically, unit and item nonresponse are treated separately: unit nonresponse adjustments use methods based on response modeling or on calibration (see e.g. Cassel et al., 1983; Kim and Kim, 2007; Särndal and Lundström, 2005), while item nonresponse is usually addressed via imputation (single or multiple, see e.g. Rubin, 1987). Unit nonresponse is usually treated in a two-phase framework, in which the selected sample is the first-phase sample, while the set of respondents is considered a second-phase sample with unknown inclusion probabilities. These response probabilities are unknown individual characteristics defined for all units in the population; each measures the probability that a unit responds given that it was included in the sample. When auxiliary information is available for all units in the original sample, these probabilities can be estimated.
A common approach is to use a logistic model for the response indicator (see e.g. Kim and Kim, 2007). Note that the response probability is a measure of the propensity of a unit to participate in the survey and can therefore also be considered a latent variable. The use of latent variable models with covariates was proposed by Moustaki and Knott (2000) for weighting in the presence of item nonresponse. In this paper, we take a different perspective and use latent variable models to address non-ignorable unit nonresponse, even when auxiliary information is not available. Non-ignorable nonresponse is typical of surveys with sensitive questions (concerning drug abuse, sexual attitudes, politics, income, etc.). The proposed method develops weights for the respondents by first linking unit nonresponse to item nonresponse via a continuous latent variable. This latent variable is then used as a covariate for response probability estimation. In the words of Moustaki and Knott (2000), ‘weighting through latent variable modelling is expected to perform well under non-ignorable nonresponse where conditioning on observed covariates only is not enough.’ Moreover, in the absence of any covariate, we expect an estimator based on the proposed weighting system to reduce bias more effectively than the naive estimator computed without this adjustment. The paper is organized as follows. After a short introduction to latent variable models, the proposed methodology is illustrated. The properties of the proposed estimators are sketched and some results from a simulation study are presented, together with some concluding remarks.
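The standard response-modeling adjustment mentioned above (a logistic model for the response indicator, followed by weighting respondents by the inverse of their estimated response probability) can be sketched as follows. This is only an illustration under stated assumptions, not the latent variable method proposed in the paper: the data are simulated, the sample is assumed to be drawn with equal probabilities so that design weights cancel, a single auxiliary variable `x` is assumed known for every sampled unit, and all function names and parameter values are hypothetical.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, r, n_iter=25):
    """Newton-Raphson fit of P(r = 1 | x) = expit(b0 + b1 * x)."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ri in zip(x, r):
            p = expit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += ri - p            # score for the intercept
            g1 += (ri - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def simulate(n=5000, seed=1):
    """Simulated sample (assumption): response depends on x, and so does y."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 + xi + rng.gauss(0.0, 0.5) for xi in x]
    r = [1 if rng.random() < expit(0.5 + xi) else 0 for xi in x]
    return x, y, r


def compare_estimators(x, y, r):
    """Return the full-sample mean, the naive respondent mean, and the
    respondent mean weighted by inverse estimated response probabilities."""
    b0, b1 = fit_logistic(x, r)
    resp = [(xi, yi) for xi, yi, ri in zip(x, y, r) if ri == 1]
    weights = [1.0 / expit(b0 + b1 * xi) for xi, _ in resp]
    full = sum(y) / len(y)
    naive = sum(yi for _, yi in resp) / len(resp)
    adjusted = sum(w * yi for w, (_, yi) in zip(weights, resp)) / sum(weights)
    return full, naive, adjusted


if __name__ == "__main__":
    full, naive, adjusted = compare_estimators(*simulate())
    print(f"full sample: {full:.3f}  naive: {naive:.3f}  adjusted: {adjusted:.3f}")
```

In this sketch the response mechanism depends only on the observed `x`, so conditioning on `x` is enough to remove the bias of the naive respondent mean. Under non-ignorable nonresponse, or when no auxiliary variable is available, that is no longer the case, which is the situation the latent variable approach is intended to address.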
Adjusting for nonignorable nonresponse using a latent variable modeling approach
RANALLI, Maria Giovanna
2011