Good fit, but still the wrong model? Understanding data generation matters more than likelihood-based model-fit statistics.

We read with interest the article describing methods for modelling count data with excess zeros relative to standard count distributions such as the Poisson[1]. This topic has been discussed extensively in the statistical and epidemiological literature[2,3]. Didactic messages from statisticians can lack an appreciation of the epidemiological context, and sadly this article shares that shortcoming. Its primary novelty is the context in which the issue is discussed: application to counts of Activities of Daily Living (ADL-s). It is laudable that methodological issues are tackled through practical examples, but our concern is that the novel context has been largely overlooked and an educational opportunity lost. The message is too narrow in scope and arguably misleading. The reader is told that the zero-inflated Negative Binomial is the most appropriate model for the ADL-s data, yet this is likely incorrect and could lead to inappropriate models being adopted in this domain.

The ADL-s count is bounded: it cannot exceed six, the number of activities assessed. The Binomial distribution deals with bounded data and extends to the Beta Binomial, which accommodates overdispersion. Both the Binomial and Beta Binomial distributions have zero-inflated extensions (e.g. the zero-inflated Binomial) to account for an excess of zeros[4]. We were surprised to see no mention of these alternatives, since the Poisson distribution has an infinite tail and predicts counts in excess of six, albeit with low probabilities. Allowing for overdispersion, the Negative Binomial has a longer tail for the same mean. Both distributions, whether zero-inflated or not, therefore have properties that are incongruent with the nature of the data in this context.

Furthermore, any count distribution assumes that an increment of one has the same meaning from one to two as from five to six; in zero-inflated extensions, only the increment from zero to one may have a different meaning. As each ADL is a distinct activity, each is likely to carry a different meaning.
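To make the bounded-support point concrete, the following minimal sketch (Python, standard library only) compares how much probability three candidate models place on impossible counts, i.e. more than six ADLs. It is an illustration under stated assumptions, not an analysis of the ADL-s data: the six-item bound follows the letter, but the common mean (1.5), the Negative Binomial dispersion (alpha = 1) and the zero-inflation parameters are hypothetical values, and the function names are our own.

```python
import math

# Assumption for illustration: six ADLs are scored, so a count lies in 0..6.
N_ADL = 6


def poisson_pmf(k, mu):
    """Poisson probability of observing count k with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def negbin_pmf(k, mu, alpha):
    """Negative Binomial (NB2) pmf: mean mu, variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    p = r / (r + mu)
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef) * p**r * (1.0 - p)**k


def zib_pmf(k, n, pi, pzero):
    """Zero-inflated Binomial pmf: structural zero with probability pzero,
    otherwise Binomial(n, pi). Counts above n get probability zero."""
    if k < 0 or k > n:
        return 0.0
    binom = math.comb(n, k) * pi**k * (1.0 - pi)**(n - k)
    extra_zero = pzero if k == 0 else 0.0
    return extra_zero + (1.0 - pzero) * binom


def tail_above(pmf, limit):
    """P(Y > limit) = 1 - P(Y <= limit), clipped at 0 to absorb rounding."""
    return max(0.0, 1.0 - sum(pmf(k) for k in range(limit + 1)))


if __name__ == "__main__":
    mu = 1.5  # hypothetical common mean for all three models
    models = {
        "Poisson": lambda k: poisson_pmf(k, mu),
        "NegBin": lambda k: negbin_pmf(k, mu, alpha=1.0),
        # (1 - 0.5) * 6 * 0.5 = 1.5, matching the other means
        "ZI-Binomial": lambda k: zib_pmf(k, N_ADL, pi=0.5, pzero=0.5),
    }
    for name, pmf in models.items():
        print(f"{name:12s} P(Y > {N_ADL}) = {tail_above(pmf, N_ADL):.4f}")
```

Under these assumed values, the Poisson and the Negative Binomial both place small but non-zero probability on counts above six (more so the Negative Binomial, whose tail is longer), whereas the zero-inflated Binomial places none. The sketch addresses only the support of the distributions; it says nothing about the unequal-increment argument, which points towards an ordinal model.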
Hence, as individuals' condition deteriorates, some activities may become a challenge before others. Consequently, an increment of one ADL may carry a different meaning at different points along the scale, and it is sensible to treat the outcome as ordinal[5], which also accommodates a distinct increment from zero to one. Good agreement between observed and predicted outcomes is necessary but not sufficient: models may differ negligibly in their predicted outcomes while differing substantially in parameterisation, and hence in interpretation. Likelihood-based model-fit criteria are only one facet of model development; contextual validity and interpretability must also carry weight, and researchers must appreciate the context in which the data are generated. For this reason, the best model for these data may not yet have been found. We have not analysed the dataset, nor do we feel the need to do so in order to propose the ordinal model; only if there were no compelling evidence that the increments differed would we instead propose the zero-inflated Binomial or Beta Binomial, for parsimony.

References
1. Zaninotto P, Falaschetti E. Comparison of methods for modelling a count outcome with excess zeros: application to Activities of Daily Living (ADL-s). J Epidemiol Community Health 2011; 65: 205-210.
2. Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992; 34: 1-14.
3. Gilthorpe MS, Frydenberg M, Cheng Y, Baelum V. Modelling count data with excessive zeros: the need for class prediction in zero-inflated models and the issue of data generation in choosing between zero-inflated and generic mixture models for dental caries data. Stat Med 2009; 28: 3539-3553.
4. Vieira AMC, Hinde JP, Demetrio CGB. Zero-inflated proportion data models applied to a biological control assay. J Appl Stat 2000; 27: 373-389.
5. Lall R, Campbell MJ, Walters SJ, Morgan K; MRC CFAS Co-operative. A review of ordinal regression models applied on health-related quality of life assessments. Stat Methods Med Res 2002; 11: 49-67.