OP112 Use of outcome ‘change-scores’ in observational data are a potential source of inferential bias
  1. PWG Tennant (1,2,3)
  2. KF Arnold (1,2)
  3. GTH Ellison (1,2)
  4. J Textor (4)
  5. SC Gadd (1,5)
  6. L Berrie (1,2)
  7. J Ellis (2)
  8. MS Gilthorpe (1,2,3)

  1. Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
  2. School of Medicine, University of Leeds, Leeds, UK
  3. The Alan Turing Institute, London, UK
  4. Department of Tumour Immunology, Radboud University Medical Center, Nijmegen, The Netherlands
  5. School of Geography, University of Leeds, Leeds, UK


Background Studies of change are a cornerstone of research in the health sciences. Robust analyses of change are, however, extremely challenging, especially in observational data.

In simple exposure-outcome scenarios, one common approach is to create and analyse an outcome ‘change-score’ by subtracting the baseline outcome from the follow-up outcome. Tens of thousands of articles have adopted this approach. Unfortunately, it fails to capture the (desired) modifiable component of the outcome variable that occurred after baseline. On the contrary, it retains sign-reversed information from the baseline outcome, which can create extremely misleading associations.
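
The sign-reversed baseline information can be made explicit with a short derivation. The notation is an assumption introduced here for illustration, since the abstract defines no symbols: $X_0$ is the baseline exposure, $Y_0$ and $Y_1$ are the baseline and follow-up outcomes, $\Delta Y$ is the change-score, and $\beta_{A \sim B}$ is the slope from an unadjusted linear regression of $A$ on $B$.

```latex
\begin{align*}
\Delta Y &= Y_1 - Y_0 \\
\beta_{\Delta Y \sim X_0}
  &= \frac{\operatorname{Cov}(Y_1 - Y_0,\, X_0)}{\operatorname{Var}(X_0)}
   = \frac{\operatorname{Cov}(Y_1, X_0)}{\operatorname{Var}(X_0)}
   - \frac{\operatorname{Cov}(Y_0, X_0)}{\operatorname{Var}(X_0)} \\
  &= \beta_{Y_1 \sim X_0} - \beta_{Y_0 \sim X_0}
\end{align*}
```

Under this notation, the change-score slope is the follow-up slope minus the baseline slope, so the two analyses coincide only when $X_0$ and $Y_0$ are uncorrelated at baseline.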

Using directed acyclic graphs (DAGs) and illustrative simulations, this study explains why outcome change-scores do not capture the true causal quantity of interest and demonstrates the extent of disagreement between robust analyses and change-score analyses in various circumstances.

Methods DAGs with deterministic nodes are used to explain why change-scores do not capture the (desired) modifiable component of the outcome that occurs after baseline. The implications are then illustrated in simulated data, by analysing outcome change-scores with respect to a baseline exposure under several causal scenarios.

Data were simulated using the DAGitty R package (version 0.2-2) to match three broad scenarios, with the baseline outcome as 1) a competing exposure, 2) a confounder, and 3) a mediator for the total causal effect of the exposure on the follow-up outcome. Means, standard deviations, and distributions were informed by data from the US National Health and Nutrition Examination Survey for 2009–2014. The association between the baseline exposure and the outcome change-score was estimated by linear regression, and the coefficients were compared with the known truth and with coefficients obtained from robust analyses.
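
The three scenarios can be sketched in a few lines of simulation. This is a minimal illustration in Python with NumPy, not the authors' DAGitty code: the function names (`ols_slope`, `simulate`), the standard-normal variables, and the path coefficients `a`, `b`, `c` are all hypothetical choices made here, and none are NHANES-informed values.

```python
import numpy as np


def ols_slope(y, x, *covariates):
    """OLS coefficient on x from regressing y on x (plus covariates), with intercept."""
    design = np.column_stack([np.ones_like(x), x, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate(scenario, n=200_000, seed=0):
    """Simulate one causal scenario; all path coefficients are hypothetical."""
    rng = np.random.default_rng(seed)
    e_x, e_0, e_1 = rng.standard_normal((3, n))
    # a: path linking x and y0, b: direct x -> y1, c: y0 -> y1 (assumed values)
    a, b, c = 0.5, 0.1, 0.6

    if scenario == "competing_exposure":  # x and y0 independent at baseline
        x, y0 = e_x, e_0
        truth, adjust = b, ()
    elif scenario == "confounder":  # y0 -> x and y0 -> y1
        y0 = e_0
        x = a * y0 + e_x
        truth, adjust = b, (y0,)  # robust analysis adjusts for y0
    elif scenario == "mediator":  # x -> y0 -> y1
        x = e_x
        y0 = a * x + e_0
        truth, adjust = b + a * c, ()  # total effect: do not adjust for y0
    else:
        raise ValueError(f"unknown scenario: {scenario}")

    y1 = b * x + c * y0 + e_1
    return {
        "truth": truth,
        "change_score": ols_slope(y1 - y0, x),  # naive change-score analysis
        "robust": ols_slope(y1, x, *adjust),  # follow-up outcome analysis
    }


if __name__ == "__main__":
    for name in ("competing_exposure", "confounder", "mediator"):
        result = simulate(name)
        print(name, {k: round(float(v), 3) for k, v in result.items()})
```

With these assumed coefficients, the change-score slope should track the truth only in the competing-exposure scenario, while the regression of the follow-up outcome (adjusted for the baseline outcome only where it is a confounder) should recover the truth in all three.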

Results Naïve regression analyses of the outcome change-score (insulin) with respect to the baseline exposure (waist circumference) produced biased causal inferences in all scenarios except where the exposure and outcome were uncorrelated at baseline (as in a randomised experiment). When the baseline outcome (insulin) confounded the effect of the baseline exposure (waist circumference) on the follow-up outcome, the naïve regression estimate remained confounded. When the baseline outcome (insulin) mediated the effect of the baseline exposure (waist circumference) on the follow-up outcome, the naïve regression estimate had the opposite sign to the total causal effect.
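
The sign reversal in the mediator scenario can be reproduced with a small worked example. The linear structural equations and the path coefficients $a$, $b$, $c$ below are hypothetical and chosen only for illustration (they are not estimates from the study); $X_0$ is the baseline exposure, $Y_0$ and $Y_1$ are the baseline and follow-up outcomes, and the errors $\varepsilon_0$, $\varepsilon_1$ are assumed independent of $X_0$ and of each other.

```latex
\begin{align*}
Y_0 &= a X_0 + \varepsilon_0, \qquad
Y_1 = b X_0 + c\, Y_0 + \varepsilon_1 \\
\text{total effect of } X_0 \text{ on } Y_1 &= b + ac \\
\text{change-score slope } \beta_{(Y_1 - Y_0) \sim X_0} &= (b + ac) - a \\
a = 0.5,\ b = 0.1,\ c = 0.6:\quad
b + ac &= 0.4, \qquad (b + ac) - a = -0.1
\end{align*}
```

Under these assumptions the change-score slope is negative although the total effect is positive, which happens whenever $a(1 - c) > b$.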

Conclusion Analyses of change-scores should be avoided in observational health research, as they can produce extremely misleading coefficients. Previous observational studies that have naïvely analysed and interpreted change-score variables should be viewed with extreme caution and any recommendations revisited.

  • Causal inference
  • analyses of change
  • observational data
