Multiple imputation: review of theory, implementation and software

Ofer Harel; Xiao-Hua Zhou

doi:10.1002/sim.2787

Stat Med. 2007 Jul 20;26(16):3057-77. doi: 10.1002/sim.2787.

Authors

Ofer Harel¹, Xiao-Hua Zhou

Affiliation

¹ Department of Statistics, University of Connecticut, 215 Glenbrook Road Unit 4120 Storrs, CT 06269-4120, USA. oharel@stat.uconn.edu

PMID: 17256804
DOI: 10.1002/sim.2787

Abstract

Missing data is a common complication in data analysis. In many medical settings, missing data can cause difficulties in estimation, precision and inference. Multiple imputation (MI) (Multiple Imputation for Nonresponse in Surveys. Wiley: New York, 1987) is a simulation-based approach for dealing with incomplete data. Although there are many different methods for dealing with incomplete data, MI has become one of the leading methods. Since the late 1980s we have observed a steady increase in the use and publication of MI-related research. This tutorial does not attempt to cover all the material concerning MI; rather, it provides an overview that brings together the theory behind MI and its implementation, and discusses the growing possibilities for using MI with commercial and free software. We illustrate some of the major points using an example from an Alzheimer disease (AD) study. In this AD study, clinical data are available for all subjects, whereas postmortem data are available only for the subset of subjects who died and underwent an autopsy. Analysis of incomplete data requires making unverifiable assumptions. These assumptions are discussed in detail in the text. Relevant S-Plus code is provided.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Bias*
Diagnostic Tests, Routine / statistics & numerical data
Models, Statistical
Models, Theoretical*
Sensitivity and Specificity
Software*
United States
