Abstract
Background Many epidemiological studies examine whether change in a variable from baseline (Y0) to follow-up (Y1; thus ΔY = Y1 − Y0) can be predicted by (and is therefore potentially caused by) a second variable also measured at baseline (X0). Although such analyses appear straightforward, they may produce a statistical artefact known as indirect mathematical coupling (MC). This problem is not widely recognised, so the epidemiological literature may contain many spurious inferences. The aim of this study was to demonstrate how MC arises, so that it can be avoided in future research.
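Why this happens can be seen from the population least-squares slope of ΔY regressed on X0. The derivation below is a minimal sketch that uses only the definition ΔY = Y1 − Y0 and the linearity of covariance; it makes no assumption about which scenario generated the data.

```latex
\beta_{\Delta Y \mid X_0}
  = \frac{\operatorname{cov}(X_0,\, Y_1 - Y_0)}{\operatorname{var}(X_0)}
  = \underbrace{\frac{\operatorname{cov}(X_0, Y_1)}{\operatorname{var}(X_0)}}_{\text{slope of } Y_1 \text{ on } X_0}
  \;-\;
  \underbrace{\frac{\operatorname{cov}(X_0, Y_0)}{\operatorname{var}(X_0)}}_{\text{slope of } Y_0 \text{ on } X_0}
```

If X0 has no relationship with Y1, the first term is zero, yet the slope still equals −cov(X0, Y0)/var(X0), which is non-zero whenever X0 and Y0 are correlated, whatever the reason for that correlation. Y0 enters ΔY by construction, so any association between X0 and Y0 is carried into the apparent association between X0 and change.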
Methods We consider four scenarios in which there is no true causal relationship between a baseline exposure (X0) and the outcome at follow-up (Y1): (i) there is no relationship (causal or correlational) between the baseline exposure (X0) and the baseline outcome (Y0); (ii) the baseline exposure (X0) is a cause of the baseline outcome (Y0); (iii) the baseline outcome (Y0) is a cause of the baseline exposure (X0); and (iv) there is no causal relationship between the baseline exposure (X0) and the baseline outcome (Y0), but the two variables are correlated because of another (unmeasured) factor that is a partial cause of both. To demonstrate how MC might affect the analysis of change (ΔY = Y1 − Y0) in each scenario, we chose blood insulin level as the outcome (Y) and body mass index (BMI) as the exposure (X), and simulated data at baseline and at one-year follow-up for 1,000 individuals, using population values of blood insulin and BMI for healthy adolescents obtained from the literature.
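The four scenarios can be sketched as a small, standard-library-only Python simulation. This is not the authors' code: the means, standard deviations and the correlation R between X0 and Y0 below are hypothetical placeholders, not the literature values for healthy adolescents used in the study, and all variables are generated as correlated normal deviates. As a further simplifying assumption, Y1 is drawn independently of X0 and Y0, so X0 has no relationship with Y1 in any scenario. The helper names `simulate` and `ols_slope` are illustrative.

```python
import math
import random

# Hypothetical placeholder parameters, NOT the literature values used in the study.
MU_BMI, SD_BMI = 21.0, 3.0   # BMI (kg/m^2)
MU_INS, SD_INS = 10.0, 4.0   # blood insulin (arbitrary units)
R = 0.5                      # assumed corr(X0, Y0) in scenarios (ii)-(iv)


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(scenario, n=1000, seed=1):
    """Return lists (x0, y0, y1) for scenario 'i', 'ii', 'iii' or 'iv'."""
    rng = random.Random(seed)
    s = math.sqrt(1 - R ** 2)
    x0, y0, y1 = [], [], []
    for _ in range(n):
        if scenario == "i":      # X0 and Y0 unrelated
            zx, zy = rng.gauss(0, 1), rng.gauss(0, 1)
        elif scenario == "ii":   # X0 causes Y0
            zx = rng.gauss(0, 1)
            zy = R * zx + s * rng.gauss(0, 1)
        elif scenario == "iii":  # Y0 causes X0
            zy = rng.gauss(0, 1)
            zx = R * zy + s * rng.gauss(0, 1)
        elif scenario == "iv":   # unmeasured U is a common cause of X0 and Y0
            u = rng.gauss(0, 1)
            a = math.sqrt(R)     # loading chosen so that corr(X0, Y0) = a * a = R
            zx = a * u + math.sqrt(1 - R) * rng.gauss(0, 1)
            zy = a * u + math.sqrt(1 - R) * rng.gauss(0, 1)
        else:
            raise ValueError(f"unknown scenario: {scenario}")
        x0.append(MU_BMI + SD_BMI * zx)
        y0.append(MU_INS + SD_INS * zy)
        # Assumption: Y1 is independent of X0 and Y0 in every scenario.
        y1.append(MU_INS + SD_INS * rng.gauss(0, 1))
    return x0, y0, y1


for sc in ("i", "ii", "iii", "iv"):
    x0, y0, y1 = simulate(sc)
    dy = [b - a for a, b in zip(y0, y1)]  # change score: Y1 - Y0
    print(sc, "slope Y1~X0:", round(ols_slope(x0, y1), 3),
          "slope dY~X0:", round(ols_slope(x0, dy), 3))
```

With these placeholder values the slope of Y1 on X0 should be close to zero in every scenario, while the slope of ΔY on X0 should be close to zero only in scenario (i) and close to −R × SD_INS / SD_BMI ≈ −0.67 in scenarios (ii) to (iv). Scenarios (ii) and (iii) differ only in the order in which the correlated variables are generated, so with normal data they yield the same joint distribution and the same spurious slope.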
Results Under scenario (i), the regression model correctly estimates the relationship between BMI at baseline (X0) and change in blood insulin (ΔY = Y1 − Y0) as zero. However, under scenarios (ii)–(iv), where there is either a causal ((ii) or (iii)) or a correlational ((iv)) relationship between BMI at baseline (X0) and blood insulin at baseline (Y0), the regression model incorrectly estimates the relationship between BMI at baseline (X0) and change in blood insulin (ΔY = Y1 − Y0) as non-zero. Even with modest sample sizes, these spurious coefficients would be statistically significant.
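How quickly a spurious coefficient becomes statistically significant can be sketched with a large-sample approximation. Assume, as a simplification not taken from the study, that Y1 is independent of X0 and Y0, that Y0 and Y1 have equal variance σ², and that r is the correlation between X0 and Y0. The slope of ΔY on X0 is then −rσ/σ_X, the residual variance of ΔY is σ²(2 − r²), and the t-statistic is approximately −r√n / √(2 − r²). The function names below are hypothetical, and the p-value uses a normal approximation to the t distribution.

```python
import math


def approx_t(r, n):
    """Approximate t-statistic for the slope of (Y1 - Y0) regressed on X0.

    Assumes Y1 is independent of X0 and Y0, var(Y0) == var(Y1), and
    corr(X0, Y0) == r. Large-sample approximation (uses n rather than n - 2).
    """
    return -r * math.sqrt(n) / math.sqrt(2 - r ** 2)


def approx_two_sided_p(t):
    """Two-sided p-value, 2 * (1 - Phi(|t|)), via the standard normal."""
    return math.erfc(abs(t) / math.sqrt(2))


for r in (0.0, 0.1, 0.3):
    for n in (100, 1000):
        t = approx_t(r, n)
        print(f"r={r:.1f} n={n:5d} t={t:6.2f} p={approx_two_sided_p(t):.4f}")
```

Under these assumptions, a correlation of only 0.1 between X0 and Y0 with n = 1,000 gives t ≈ −2.24 (two-sided p ≈ 0.025), even though X0 has no relationship with Y1.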
Conclusion Even when a baseline exposure has no true relationship with the change in an outcome variable (from baseline to follow-up), a spurious (and potentially statistically significant) relationship will be observed if there is a relationship (causal or correlational) between the exposure at baseline and the outcome at baseline.