Background Most epidemiological studies have missing information, leading to reduced power and potential bias. Exposure-outcome associations will generally be biased if the outcome variable is missing not at random (MNAR). Linkage to administrative data containing a proxy for the outcome allows assessment of MNAR. We used data from the Avon Longitudinal Study of Parents and Children (ALSPAC) and simulations to examine bias in the association between infant breastfeeding and IQ at 15 years, using linked school attainment data as a proxy for IQ.
Methods ALSPAC: Subjects were those who enrolled in 1990–91 and were alive at one year (n = 13,795), of whom 36% had IQ measured at 15. Of those with missing IQ, 79% had data on attainment at age 16 obtained through linkage to the National Pupil Database. Breastfeeding information was collected via questionnaire at 1, 6 and 15 months. Potential confounders and factors predictive of non-response were collected during pregnancy. We estimated the association between duration of breastfeeding and IQ using a complete case analysis, multiple imputation (MI), and MI including linked attainment data.
Simulations: We varied the strength of association between the outcome and the linked proxy, the proportion of missing data, and the extent to which the outcome was MNAR.
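The kind of simulation described above can be illustrated with a minimal sketch. It is not the study's code: the function names, the logistic missingness model, the effect size and every parameter value are assumptions chosen for illustration. The imputation step is also simplified: it repeats stochastic regression imputation without drawing the imputation-model parameters, so it stands in for full MI only as far as point estimates are concerned.

```python
import numpy as np


def fit_ols(design, y):
    """Return least-squares coefficients for y regressed on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate(proxy_corr, n=100_000, beta=0.3, mnar_slope=2.0,
             intercept=-1.0, n_imputations=5, seed=0):
    """Compare complete-case and proxy-informed estimates under MNAR.

    All defaults are hypothetical values, not taken from the study.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n).astype(float)   # binary exposure
    y = beta * x + rng.standard_normal(n)          # outcome, in SD units
    noise = rng.standard_normal(n)

    # Proxy with (approximately) the requested correlation with the outcome.
    y_std = (y - y.mean()) / y.std()
    z = proxy_corr * y_std + np.sqrt(1.0 - proxy_corr**2) * noise

    # MNAR: the chance of being observed depends on the outcome itself.
    p_obs = 1.0 / (1.0 + np.exp(-(intercept + mnar_slope * y)))
    observed = rng.random(n) < p_obs
    missing = ~observed

    ones = np.ones(n)
    analysis_design = np.column_stack([ones, x])

    # Complete case analysis: drop everyone with a missing outcome.
    cc = fit_ols(analysis_design[observed], y[observed])[1]

    # Imputation model y ~ x + z, fitted in the complete cases.
    imp_design = np.column_stack([ones, x, z])
    coef = fit_ols(imp_design[observed], y[observed])
    resid = y[observed] - imp_design[observed] @ coef
    resid_sd = resid.std(ddof=imp_design.shape[1])

    # Impute several times and average the exposure coefficient.
    estimates = []
    for _ in range(n_imputations):
        y_completed = y.copy()
        y_completed[missing] = (imp_design[missing] @ coef
                                + rng.normal(0.0, resid_sd, missing.sum()))
        estimates.append(fit_ols(analysis_design, y_completed)[1])

    return {
        "complete_case": float(cc),
        "with_proxy": float(np.mean(estimates)),
        "frac_observed": float(observed.mean()),
    }


TRUE_BETA = 0.3
weak = simulate(proxy_corr=0.3, beta=TRUE_BETA)
strong = simulate(proxy_corr=0.9, beta=TRUE_BETA)

for label, res in (("weak proxy (r = 0.3)", weak),
                   ("strong proxy (r = 0.9)", strong)):
    print(f"{label}: complete case = {res['complete_case']:.3f}, "
          f"with proxy = {res['with_proxy']:.3f}, true = {TRUE_BETA}")
```

Because the seed fixes the exposure, outcome and missingness pattern, the two runs share the same complete-case estimate and differ only in how informative the proxy is. Changing `intercept` and `mnar_slope` alters the proportion of missing data and the strength of the MNAR mechanism respectively.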
Results IQ measured at 15 in ALSPAC was MNAR – individuals with higher attainment were less likely to have missing IQ, even after adjusting for socio-demographic factors. The correlation between IQ and the main attainment variable was 0.59. Both complete case analysis and MI underestimated the association between breastfeeding and IQ compared with MI informed by linkage: the mean difference in IQ comparing those breastfed for at least 6 months with those breastfed for less than one month was 4.2 (95% CI 3.4, 5.0) using MI informed by linkage but 3.5 (95% CI 2.5, 4.4) in the complete case analysis. In simulations, including the linked proxy reduced bias and increased precision in all cases, although improvements were small when the correlation between the outcome and its proxy was low (0.5).
Conclusion Linkage to administrative data containing a proxy for the outcome variable allows the MNAR assumption to be tested and more efficient analyses to be performed. Key limiting factors are the strength of association between the outcome and its proxy and coverage of the linked data; in our case, where the correlation was modest and linked data were not available for all individuals, some bias may remain.
- missing data
- data linkage