Review: a gentle introduction to imputation of missing values.
J Clin Epidemiol. 2006 Oct; 59(10):1087-91.

Abstract

In most situations, simple techniques for handling missing data (such as complete case analysis, overall mean imputation, and the missing-indicator method) produce biased results, whereas imputation techniques yield valid results without complicating the analysis once the imputations are carried out. Imputation techniques are based on the idea that any subject in a study sample can be replaced by a new randomly chosen subject from the same source population. Imputation of missing data on a variable is replacing that missing by a value that is drawn from an estimate of the distribution of this variable. In single imputation, only one estimate is used. In multiple imputation, various estimates are used, reflecting the uncertainty in the estimation of this distribution. Under the general conditions of so-called missing at random and missing completely at random, both single and multiple imputations result in unbiased estimates of study associations. But single imputation results in too small estimated standard errors, whereas multiple imputation results in correctly estimated standard errors and confidence intervals. In this article we explain why all this is the case, and use a simple simulation study to demonstrate our explanations. We also explain and illustrate why two frequently used methods to handle missing data, i.e., overall mean imputation and the missing-indicator method, almost always result in biased estimates.
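The claims in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation study; it is a minimal example under assumptions of our own: a linear model with true slope 1, outcome values missing at random with a probability that depends only on the fully observed predictor, and a bootstrap approximation to multiple imputation pooled with Rubin's rules. The function names (`ols`, `stochastic_impute`, `simulate`) and all parameter values are hypothetical.

```python
import numpy as np


def ols(x, y):
    """Fit y = a + b*x; return (intercept, slope, slope_se, residual_sd)."""
    n = len(x)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    slope = np.sum(xc * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    resid_sd = np.sqrt(np.sum(resid ** 2) / (n - 2))
    return intercept, slope, resid_sd / np.sqrt(sxx), resid_sd


def stochastic_impute(x, y, missing, rng, bootstrap=False):
    """Replace missing y by draws from an estimated distribution of y given x.

    With bootstrap=True the imputation model is refitted on a resample of the
    observed cases, so repeated calls reflect uncertainty in the estimate
    (an assumed, approximate stand-in for proper multiple imputation).
    """
    obs = np.flatnonzero(~missing)
    if bootstrap:
        obs = rng.choice(obs, size=obs.size, replace=True)
    a, b, _, sd = ols(x[obs], y[obs])
    y_imp = y.copy()
    n_missing = int(missing.sum())
    y_imp[missing] = a + b * x[missing] + rng.normal(0.0, sd, size=n_missing)
    return y_imp


def simulate(n=5000, m=20, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = x + rng.normal(size=n)  # assumed true slope of 1

    # Missing at random: the chance that y is missing depends only on x.
    missing = rng.random(n) < 1.0 / (1.0 + np.exp(-x))

    # Overall mean imputation: every missing y gets the observed mean.
    y_mean = y.copy()
    y_mean[missing] = y[~missing].mean()
    slope_mean = ols(x, y_mean)[1]

    # Single imputation: one draw per missing value, analysed as if observed.
    y_single = stochastic_impute(x, y, missing, rng)
    _, slope_single, se_single, _ = ols(x, y_single)

    # Multiple imputation: m completed data sets, pooled with Rubin's rules.
    slopes, variances = [], []
    for _ in range(m):
        y_m = stochastic_impute(x, y, missing, rng, bootstrap=True)
        _, b, se, _ = ols(x, y_m)
        slopes.append(b)
        variances.append(se ** 2)
    within = np.mean(variances)
    between = np.var(slopes, ddof=1)
    total = within + (1.0 + 1.0 / m) * between

    return {
        "slope_mean_imputation": slope_mean,
        "slope_single": slope_single,
        "se_single": se_single,
        "slope_multiple": float(np.mean(slopes)),
        "se_multiple": float(np.sqrt(total)),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

In this setup the slope after overall mean imputation should fall well below 1, while both imputation approaches should recover a slope near 1. The pooled standard error from multiple imputation should exceed the naive standard error from single imputation, because it adds the between-imputation variance that single imputation ignores.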

Authors

A Rogier T Donders, Geert J M G van der Heijden, Theo Stijnen, Karel G M Moons

Affiliation (first author): Center for Biostatistics, Utrecht University, Utrecht, The Netherlands. R.Donders@geo.uu.nl

Pub Type(s)

Journal Article
Research Support, Non-U.S. Gov't
Review

Language

eng

PubMed ID

16980149

Citation

Donders, A Rogier T., et al. "Review: a Gentle Introduction to Imputation of Missing Values." Journal of Clinical Epidemiology, vol. 59, no. 10, 2006, pp. 1087-91.
Donders AR, van der Heijden GJ, Stijnen T, et al. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol. 2006;59(10):1087-91.
Donders, A. R., van der Heijden, G. J., Stijnen, T., & Moons, K. G. (2006). Review: a gentle introduction to imputation of missing values. Journal of Clinical Epidemiology, 59(10), 1087-91.
Donders AR, et al. Review: a Gentle Introduction to Imputation of Missing Values. J Clin Epidemiol. 2006;59(10):1087-91. PubMed PMID: 16980149.
TY - JOUR
T1 - Review: a gentle introduction to imputation of missing values.
AU - Donders,A Rogier T,
AU - van der Heijden,Geert J M G,
AU - Stijnen,Theo,
AU - Moons,Karel G M,
Y1 - 2006/07/11/
PY - 2004/05/12/received
PY - 2006/01/05/revised
PY - 2006/01/10/accepted
PY - 2006/9/19/pubmed
PY - 2006/11/9/medline
PY - 2006/9/19/entrez
SP - 1087
EP - 91
JF - Journal of clinical epidemiology
JO - J Clin Epidemiol
VL - 59
IS - 10
SN - 0895-4356
UR - https://www.unboundmedicine.com/medline/citation/16980149/Review:_a_gentle_introduction_to_imputation_of_missing_values_
DB - PRIME
DP - Unbound Medicine
ER -