OP78 A causal inference perspective on compositional data and collider ‘bias’
  1. KF Arnold (1,2),
  2. L Berrie (1,2),
  3. PWG Tennant (1,2,3),
  4. MS Gilthorpe (1,2,3)

  1. Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
  2. School of Medicine, University of Leeds, Leeds, UK
  3. The Alan Turing Institute, London, UK


Background Compositional data (CD) comprise the parts of some whole, for which all parts sum to that whole; the whole may vary across individual units of analysis or remain fixed. Such data are common in many contexts, where interest often lies in understanding the effect of a particular part on a subsequent outcome. Many of the inherent challenges of analysing CD have been discussed previously, but not within a formal causal framework using directed acyclic graphs (DAGs). We use DAGs to consider the specific issue of collider bias as it pertains to CD.

Methods We demonstrate how to depict CD using DAGs, and identify two distinct effect estimands in the generic case: (1) the ‘unbiased’ (total) effect, and (2) the ‘collider biased’ effect. We consider each effect in the context of three specific example scenarios involving CD with variable or fixed totals: (1) the relationship between the economically active population and area-level gross domestic product (GDP) (variable total); (2) the relationship between fat consumption and body weight (variable total); and (3) the relationship between time spent sedentary and body weight (fixed total). For each scenario, we consider the distinct interpretation of each effect, and the resulting implications for related analyses.

Results In scenario (1), the ‘unbiased’ effect represents the average change in GDP that results from adding economically active individuals to the area whilst doing nothing to the population of economically inactive individuals, whereas the ‘collider biased’ effect represents the average change that results from swapping economically inactive individuals for economically active ones. In scenario (2), the ‘unbiased’ effect represents the average change in weight that results from adding fat to an individual’s diet irrespective of other macronutrient consumption, whilst the ‘collider biased’ effect represents the average change that results from swapping ‘other’ macronutrient consumption for fat consumption. In scenario (3), only the ‘collider biased’ effect is estimable and causally meaningful; it represents the average change in weight that results from swapping time spent physically active for time spent sedentary.

Conclusion For CD with variable totals, both effects may be estimable and causally meaningful, depending upon the specific question of interest. Researchers should be clear about which effect is being sought and estimated, since the two may be radically different quantities. For CD with fixed totals, only the ‘collider biased’ effect has any meaning. Careful attention must be paid to sensibly interpreting the relative effects that characterise this type of data.
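Why only the substitution effect is available with a fixed total can be seen in a second sketch, loosely modelled on scenario (3). Everything numerical here is an assumption for illustration: a 16-hour waking day (`WAKING_HOURS`), a linear model for weight, and the coefficients `b_sed` and `b_act`. Because active time is exactly the fixed total minus sedentary time, the two parts cannot be varied independently, so a model containing both has a rank-deficient design and the slope for sedentary time alone reflects the swap, `b_sed - b_act`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
WAKING_HOURS = 16.0                    # hypothetical fixed total, hours/day

sedentary = rng.uniform(6.0, 14.0, n)  # hours/day spent sedentary
active = WAKING_HOURS - sedentary      # the remainder is physically active
b_sed, b_act = 0.8, -0.6               # assumed kg of body weight per hour/day
weight = 70.0 + b_sed * sedentary + b_act * active + rng.normal(0.0, 3.0, n)

# Intercept, sedentary and active are linearly dependent, so their separate
# coefficients are not identifiable: the design has rank 2, not 3.
X_full = np.column_stack([np.ones(n), sedentary, active])
design_rank = np.linalg.matrix_rank(X_full)

# Regressing on sedentary time alone estimates the effect of swapping
# one hour of active time for one hour of sedentary time.
X_sed = np.column_stack([np.ones(n), sedentary])
coef, *_ = np.linalg.lstsq(X_sed, weight, rcond=None)
swap_effect = coef[1]

print(f"rank of design with both parts: {design_rank}")
print(f"slope for sedentary time:       {swap_effect:.2f}")
```

Under these assumed values the slope should be close to 1.4 kg per hour swapped, which is `b_sed - b_act`; no analysis of these data could separate `b_sed` from `b_act`.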

  • causal inference
  • compositional data
  • methodology
