Abstract
Background Compositional data comprise ‘parts’ of a ‘whole’ (or total), where the parts sum to the whole. In compositional data with fixed totals (e.g. hours within a day), only relative causal effects can be estimated because the effect of increasing one component (e.g. time spent physically active) cannot be distinguished from the effect of decreasing one or more other components (e.g. time spent sedentary).
Compositional data are not well understood, but their structure has recently been conceptualised using directed acyclic graphs (DAGs) with deterministic nodes. This work encourages the use of a simple, well-established approach, known as the isotemporal (‘leave-one-out’) model, for estimating relative causal effects in compositional data.
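To make the leave-one-out idea concrete, the following is a minimal sketch under an assumed linear outcome model. The symbols y, x_k, T and \beta_k are introduced here for illustration and do not appear in the original text. With K parts that sum to a fixed total T, substituting the omitted part shows why each remaining coefficient is a relative effect:

```latex
\begin{align*}
x_1 + x_2 + \dots + x_K &= T \\
\mathrm{E}[y \mid x] &= \beta_0 + \sum_{k=1}^{K} \beta_k x_k \\
  &= \beta_0 + \beta_K \Bigl(T - \sum_{k=1}^{K-1} x_k\Bigr) + \sum_{k=1}^{K-1} \beta_k x_k \\
  &= (\beta_0 + \beta_K T) + \sum_{k=1}^{K-1} (\beta_k - \beta_K)\, x_k
\end{align*}
```

Under this assumption, the coefficient on x_k in the leave-one-out model is \beta_k - \beta_K, the effect of reallocating one unit from the omitted part K to part k while the other parts are held constant. The individual \beta_k are not separately identifiable.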
However, the isotemporal model has been criticised as unsuitable in the presence of non-linear effects. Other, more technically demanding approaches, known as Compositional Data Analysis (CoDA) methods, are promoted instead.
This study is the first to use simulated data, where the ground truth is known, to investigate the performance of DAG-informed regression models for estimating causal effects in compositional data with fixed totals.
Methods Using the DagSim package in Python, we simulated compositional data with fixed totals, using the example of physical activity data, in which sleep, sedentary behaviour (SB), light physical activity (LPA), and moderate and vigorous physical activity (MVPA) sum to a fixed total of 24 hours. The time spent in each state was then simulated to contribute to levels of an outcome (fasting plasma glucose, FPG), either in a strictly linear manner or through non-linear relationships. We assessed the performance of the DAG-informed isotemporal approach by comparing model estimates with the known (simulated) true relative causal effect of each component on the outcome.
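The simulation design described above can be sketched roughly as follows. This is an illustration only: it uses plain NumPy rather than DagSim, and the Dirichlet concentrations, coefficient values, noise level, sample size, and the helper names `simulate_composition` and `fit_isotemporal` are all hypothetical choices, not values or functions from the study.

```python
import numpy as np

def simulate_composition(n, rng, total=24.0):
    """Draw n days of (sleep, SB, LPA, MVPA) hours that sum to `total`.

    The Dirichlet concentrations are hypothetical; they give mean
    durations of roughly 8, 9, 5 and 2 hours respectively.
    """
    alpha = np.array([8.0, 9.0, 5.0, 2.0])
    return total * rng.dirichlet(alpha, size=n)

def fit_isotemporal(parts, y, omit):
    """Leave-one-out OLS: regress y on every part except column `omit`."""
    keep = [k for k in range(parts.shape[1]) if k != omit]
    X = np.column_stack([np.ones(len(y)), parts[:, keep]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Map each retained column index to its slope (intercept dropped).
    return dict(zip(keep, coef[1:]))

rng = np.random.default_rng(0)
n = 20_000
parts = simulate_composition(n, rng)

# Hypothetical per-hour effects of sleep, SB, LPA, MVPA on FPG.
beta = np.array([-0.05, 0.02, -0.03, -0.20])
fpg = 5.0 + parts @ beta + rng.normal(0.0, 0.3, size=n)

# Omit SB (column 1): each slope estimates the effect of moving one
# hour from SB into that part, i.e. beta[k] - beta[1].
est = fit_isotemporal(parts, fpg, omit=1)
true_rel = {k: beta[k] - beta[1] for k in est}

for k in est:
    print(k, round(est[k], 3), round(true_rel[k], 3))
```

In this strictly linear setting, the leave-one-out slopes should sit close to the simulated relative effects, which is the comparison between model estimates and known truth that the Methods describe.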
Results Accurate relative causal effect estimates were obtained using the DAG-informed isotemporal approach, provided the models were parameterised correctly. When the model was not parameterised correctly, e.g. when linear terms were used to model non-linear relationships, the estimates were biased. In the literature, the isotemporal model is used almost exclusively with linear terms, which might explain the misconception that it is unsuitable for modelling compositional data.
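The contrast between correct and incorrect parameterisation can be illustrated with a small sketch. It assumes, purely for illustration, that MVPA affects the outcome through a quadratic term. The coefficients, the reference point `m0`, and the helper `ols` are hypothetical and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical composition: sleep, SB, LPA, MVPA hours summing to 24.
alpha = np.array([8.0, 9.0, 5.0, 2.0])
sleep, sb, lpa, mvpa = (24.0 * rng.dirichlet(alpha, size=n)).T

# Hypothetical outcome model with a non-linear (quadratic) MVPA effect.
b_sleep, b_sb, b_lpa, b_mvpa, c_mvpa2 = -0.05, 0.02, -0.03, -0.40, 0.05
fpg = (5.0 + b_sleep * sleep + b_sb * sb + b_lpa * lpa
       + b_mvpa * mvpa + c_mvpa2 * mvpa**2
       + rng.normal(0.0, 0.3, size=n))

def ols(columns, y):
    """Ordinary least squares with an intercept; returns coefficients."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# True effect of moving 1 hour from SB to MVPA, starting at m0 hours
# of MVPA, with sleep and LPA held fixed.
m0 = 0.5
true_effect = (b_mvpa - b_sb) + c_mvpa2 * ((m0 + 1) ** 2 - m0**2)

# Misspecified isotemporal model: SB omitted, linear terms only.
lin = ols([sleep, lpa, mvpa], fpg)
linear_effect = lin[3]

# Correctly parameterised isotemporal model: SB omitted, MVPA^2 added.
quad = ols([sleep, lpa, mvpa, mvpa**2], fpg)
quadratic_effect = quad[3] + quad[4] * ((m0 + 1) ** 2 - m0**2)

print("true:", round(true_effect, 3))
print("linear-only model:", round(linear_effect, 3))
print("quadratic model:", round(quadratic_effect, 3))
```

Under these assumptions, the linear-only model returns a single slope that cannot track an effect that varies with the starting level of MVPA, whereas the model that includes the quadratic term should recover the simulated reallocation effect at `m0`.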
Conclusion In compositional data with fixed totals, a simple DAG-informed isotemporal modelling approach recovers the true relative causal effect as long as any non-linear relationships are appropriately parameterised. This method is a viable alternative to the more technically challenging and specialist CoDA methods. The findings cannot be generalised to compositional data with varying totals, which require separate investigation.