Background Each year, numerous studies analyse longitudinal data within a lifecourse context, relating later-life health status (e.g. blood pressure) to repeated measures of early-life experiences (e.g. body mass) using standard multiple linear or logistic regression. Although more sophisticated methods are available, some have been shown to be problematic, so confusion remains over which analytical strategy is most appropriate. Standard multiple regression nevertheless suffers textbook errors in this lifecourse context, and these errors are perpetuated despite previous warnings. We revisit these problems with a simulation study to give clear guidance on what happens when the basic dos and don’ts of medical statistics are ignored.
Methods We simulated a lifecourse dataset comprising repeated measures of z-score body mass at regular intervals following birth, using a multivariate normal distribution (9 outcomes) with a correlation between birth weight (BW) and adult weight (AW) of 0.1 and correlations between adjacent measures derived as (0.1)^(1/8). We simulated an adult z-score systolic blood pressure (SBP) with a correlation between SBP and BW of -0.1, a correlation between SBP and AW of 0.2, and correlations between SBP and each intervening body mass measure interpolated linearly between -0.1 and 0.2. We conducted a series of basic regression analyses, akin to those frequently seen in lifecourse research. Each simulation contained 5000 subjects (assumed to be of one sex) and was repeated 10,000 times to obtain 95% empirical credible intervals.
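The simulation set-up described above can be sketched in Python with NumPy. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, and the correlation between non-adjacent body mass measures is assumed to decay as rho^|i - j| (an AR(1)-type structure), which is consistent with the stated adjacent correlation of (0.1)^(1/8) and BW-AW correlation of 0.1 but is not spelled out in the text.

```python
import numpy as np

N_MEASURES = 9      # body mass z-scores, from birth weight (BW) to adult weight (AW)
N_SUBJECTS = 5000   # subjects per simulated dataset, as stated in the Methods


def build_correlation_matrix(n=N_MEASURES, r_bw_aw=0.1, r_sbp_bw=-0.1, r_sbp_aw=0.2):
    """Correlation matrix for n body mass measures plus SBP (last row/column)."""
    rho = r_bw_aw ** (1.0 / (n - 1))           # adjacent correlation, (0.1)^(1/8)
    idx = np.arange(n)
    # ASSUMPTION: AR(1)-type decay, corr(i, j) = rho^|i - j|
    body = rho ** np.abs(idx[:, None] - idx[None, :])
    # SBP correlations run linearly from -0.1 (BW) to 0.2 (AW)
    sbp = np.linspace(r_sbp_bw, r_sbp_aw, n)
    corr = np.eye(n + 1)
    corr[:n, :n] = body
    corr[:n, n] = sbp
    corr[n, :n] = sbp
    return corr


def simulate_dataset(corr, n_subjects=N_SUBJECTS, seed=0):
    """Draw one dataset of z-scores; columns 0..8 are body mass, column 9 is SBP."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(corr.shape[0])
    return rng.multivariate_normal(mean, corr, size=n_subjects)


corr = build_correlation_matrix()
data = simulate_dataset(corr)
```

Repeating `simulate_dataset` with different seeds, fitting the regressions of interest to each draw, and taking the 2.5th and 97.5th percentiles of the estimates would give empirical intervals in the spirit of those described.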
Results As previously identified, analyses with BW as the exposure saw point estimates move away from the null when mediators were inappropriately included, yielding an exaggerated interpretation of the impact of BW, and this persisted irrespective of which mediators were chosen. Analyses with AW as the exposure saw the AW effect change according to which prior body mass measures were included as confounders: early-life confounders moved estimates away from the null whilst later-life confounders attenuated them towards the null. This indicates some confusion around the extent and correct interpretation of confounding. Whilst large standard errors potentially mask these anomalies, they are manifest more for mediators or confounders more distal to the exposure.
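The direction of these shifts can be checked without simulation by computing population standardised regression coefficients directly from the stated correlations. The sketch below is illustrative only: the helper name is hypothetical, and correlations between non-adjacent body mass measures are assumed to follow rho^|i - j| with rho = (0.1)^(1/8), an assumption not stated explicitly in the text.

```python
import numpy as np

n = 9                                   # body mass measures, BW (index 0) to AW (index 8)
rho = 0.1 ** (1 / (n - 1))              # adjacent correlation, (0.1)^(1/8)
idx = np.arange(n)
# ASSUMPTION: AR(1)-type decay between body mass measures
body = rho ** np.abs(idx[:, None] - idx[None, :])
sbp = np.linspace(-0.1, 0.2, n)         # correlations of each measure with SBP


def standardised_betas(predictors):
    """Population standardised coefficients for SBP regressed on the given measures."""
    p = list(predictors)
    # Solve R_xx * beta = r_xy for the chosen predictors
    return np.linalg.solve(body[np.ix_(p, p)], sbp[p])


BW, AW = 0, n - 1

bw_crude = standardised_betas([BW])[0]            # -0.1
bw_adj_aw = standardised_betas([BW, AW])[0]       # -0.12 / 0.99, further from the null
aw_crude = standardised_betas([AW])[0]            # 0.2
aw_adj_bw = standardised_betas([AW, BW])[0]       # 0.21 / 0.99, further from the null
aw_adj_late = standardised_betas([AW, AW - 1])[0] # below 0.2, closer to the null
```

Under these assumptions, adjusting the BW coefficient for AW moves it from -0.1 to about -0.121, while the AW coefficient rises to about 0.212 when adjusted for BW and falls below 0.2 when adjusted for the measure immediately preceding AW, in line with the pattern reported.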
Conclusion The common practice of analysing a multitude of longitudinal measures within a lifecourse context in a multiple regression model can lead to confused estimates for the main exposure. More careful consideration of the use of multiple regression is required, and the distinction between genuine confounders and mediators needs to be more widely understood.