P58 Assessing the utility of multilevel versus ecological analyses to obtain individual-level causal effect estimates
MS Gilthorpe¹, L Kakampakou², J Stokes³, A Hoehn³, M de Kamps⁴, W Lawniczak⁴, KF Arnold⁵, A Heppenstall⁶

  ¹ Obesity Institute, Leeds Beckett University, Leeds, UK
  ² Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
  ³ MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK
  ⁴ School of Computing, University of Leeds, Leeds, UK
  ⁵ EMEA, IQVIA, Leeds, UK
  ⁶ School of Social & Political Sciences, University of Glasgow, Glasgow, UK

Abstract

Background Government bodies, private enterprises, and researchers increasingly use ‘big data’ for monitoring, evaluating interventions, making predictions, and seeking causal understanding. Such data are often complex in structure (i.e., hierarchical), which creates a challenge: methods that work for a single homogeneous population can mislead when applied to data with substructure. Causal insights usually pertain to the individual, yet most datasets are aggregated because of concerns about sensitive personal information. It is therefore common to encounter simulation approaches, such as agent-based modelling (ABM), or ecological analyses that evaluate only marginal (i.e., cluster-level) information. Contemporary causal inference methods have yet to tackle the full complexities of multilevel data structure beyond longitudinal repeated measures. There is thus a gap in understanding and in methodological capability surrounding causal analysis of structured data, which this study examines.

Methods The study comprised three steps: 1) devise a hierarchical causal diagram that encodes a multilevel data-generating mechanism with prespecified cross-level causal relationships; 2) simulate multilevel data from the causal diagram and aggregate them to obtain cluster-level data; 3) contrast multilevel and ecological estimates of a simulated individual-level causal effect to assess the presence and extent of potential biases.
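
The three steps can be illustrated with a minimal sketch. It is not the authors' causal diagram or simulation algorithm, which the abstract does not specify. Every modelling choice here is an assumption made for illustration: a single cluster-level confounder `u` that affects both the individual exposure `x` and the outcome `y`, unit effect sizes, Gaussian noise, and a true individual-level effect `beta = 0.5`. A within-cluster (cluster-mean-centred) regression stands in for a full multilevel model, so that the example needs only the Python standard library.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n_clusters=200, n_per_cluster=50, beta=0.5, seed=1):
    """Step 2: simulate individuals nested in clusters.

    Assumed (hypothetical) data-generating mechanism:
        u_j  ~ N(0, 1)                      cluster-level confounder
        x_ij = u_j + noise                  individual exposure
        y_ij = beta * x_ij + u_j + noise    individual outcome
    """
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        u = rng.gauss(0, 1)
        xs = [u + rng.gauss(0, 1) for _ in range(n_per_cluster)]
        ys = [beta * x + u + rng.gauss(0, 1) for x in xs]
        clusters.append((xs, ys))
    return clusters


def ecological_estimate(clusters):
    """Step 3a: regress cluster mean outcome on cluster mean exposure."""
    x_bar = [sum(xs) / len(xs) for xs, _ in clusters]
    y_bar = [sum(ys) / len(ys) for _, ys in clusters]
    return ols_slope(x_bar, y_bar)


def within_cluster_estimate(clusters):
    """Step 3b: stand-in for a multilevel analysis of the full data.

    Centring x and y on their cluster means removes anything that is
    constant within a cluster, including the confounder u.
    """
    x_centred, y_centred = [], []
    for xs, ys in clusters:
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        x_centred.extend(x - mean_x for x in xs)
        y_centred.extend(y - mean_y for y in ys)
    return ols_slope(x_centred, y_centred)


if __name__ == "__main__":
    data = simulate()
    print("true individual-level effect:", 0.5)
    print("within-cluster estimate:     ", round(within_cluster_estimate(data), 3))
    print("ecological estimate:         ", round(ecological_estimate(data), 3))
```

Under these assumed parameters the within-cluster estimate should sit close to 0.5, whereas the ecological slope should be far larger, because the cluster means of the exposure are almost entirely driven by the confounder. A random-intercept model that omits the cluster mean of the exposure would not necessarily remove this bias, which is why the cross-level structure of the causal diagram matters when choosing the analysis.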

Results Unlike a multilevel analysis of the full data, ecological analyses of cluster-level data do not generally yield robust causal effect estimates. While it is known that ecological analyses are prone to the ‘ecological fallacy’ (i.e., attributing features of clusters to units within clusters may mislead), this study quantifies the resulting bias for the first time within a formal causal framework. An algorithm to simulate causally structured multilevel data is also demonstrated.

Conclusion Insights into the limitations of common analytical practices were made possible by simulating causally structured hierarchical data, demonstrating the value of causal diagrams in both simulation and causal analysis. Methodological challenges remain for robust causal evaluation of big data, but this study shows how to investigate them. The results reveal that robust causal inquiry requires individual-level data analysed with multilevel methods; ecological analyses do not generally provide sound causal effect estimates. If individual-level data are unavailable, synthetic data (informed by available marginal data) become necessary to answer causal questions. This study provides a tool to generate synthetic population data that reflect multilevel causal structures, which in turn will better inform the use of methods such as ABM. This study has enormous implications for the use of big data when seeking causal insights.

  • causal inference
  • simulation
  • multilevel modelling
