Analysing censored data in agricultural research: A review with examples and software tips

Onofri, Andrea; Piepho, Hans-Peter; Kozak, Marcin

doi:10.1111/aab.12477

Metric data are usually assessed on a continuous scale with good precision, but sometimes agricultural researchers cannot obtain precise measurements of a variable. Values of such a variable cannot then be expressed as real numbers (e.g., 1.51 or 2.56), but often can be represented by intervals into which the values fall (e.g., from 1 to 2 or from 2 to 3). In this situation, statisticians talk about censoring and censored data, as opposed to missing data, where no information is available at all. Traditionally, in agriculture and biology, three methods have been used to analyse such data: (a) when intervals are narrow, some form of imputation (e.g., mid-point imputation) is used to replace the interval, and traditional methods for continuous data are employed (such as analysis of variance [ANOVA] and regression); (b) for time-to-event data, the cumulative proportions of individuals that experienced the event of interest are analysed, instead of the individual observed times-to-event; (c) when intervals are wide and many individuals are sampled, non-parametric methods of data analysis are favoured, where counts are considered instead of the individual observed value for each sample element. In this paper, we show that these methods may be suboptimal: the first does not respect the process of data collection, the second leads to unreliable standard errors (SEs), and the third does not make full use of all the available information. As an alternative, methods of survival analysis for censored data can be useful, leading to reliable inferences and sound hypothesis testing. These methods are illustrated using three examples from plant and crop sciences.
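
To make the contrast between mid-point imputation and a likelihood for censored data concrete, the following is a minimal sketch, not the authors' code or one of their three examples. It assumes an exponential time-to-event model (chosen only because its likelihood is simple), uses hypothetical inspection data, and the names `interval_loglik`, `fit_rate` and `midpoint_rate` are illustrative. Each observation is an interval (lower, upper] in which the event occurred; an upper bound of infinity marks a right-censored unit.

```python
import math

def interval_loglik(rate, intervals):
    """Exponential log-likelihood for interval-censored times.

    An event in (lower, upper] contributes log(S(lower) - S(upper)),
    with S(t) = exp(-rate * t). A right-censored unit (upper = inf)
    contributes log(S(lower)).
    """
    total = 0.0
    for lower, upper in intervals:
        total += -rate * lower
        if not math.isinf(upper):
            # log(1 - exp(-rate * width)), written with log1p for stability
            total += math.log1p(-math.exp(-rate * (upper - lower)))
    return total

def fit_rate(intervals, lo=1e-6, hi=10.0, tol=1e-10):
    """Maximise the log-likelihood over the rate by golden-section search.

    The search bounds are an assumption; they must bracket the maximum.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if interval_loglik(c, intervals) > interval_loglik(d, intervals):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2

def midpoint_rate(intervals):
    """Mid-point imputation: treat each interval's centre as an exact time."""
    events = sum(1 for _, upper in intervals if not math.isinf(upper))
    time_at_risk = sum(
        lower if math.isinf(upper) else (lower + upper) / 2
        for lower, upper in intervals
    )
    return events / time_at_risk

# Hypothetical data: inspections at times 1, 2 and 3; one unit never
# experienced the event during the study (right-censored at time 3).
intervals = [(0, 1), (1, 2), (1, 2), (2, 3), (3, math.inf)]
rate_hat = fit_rate(intervals)
rate_mid = midpoint_rate(intervals)
print(f"interval-censored ML rate: {rate_hat:.4f}")
print(f"mid-point imputation rate: {rate_mid:.4f}")
```

The two estimates differ because the likelihood version uses only what was actually observed (the event fell somewhere in the interval), whereas imputation pretends that the exact time is known. In practice, a dedicated survival-analysis library would normally be used instead of a hand-written optimiser.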
