Miguel A. Hernán, Invited Commentary: Hypothetical Interventions to Define Causal Effects—Afterthought or Prerequisite?, American Journal of Epidemiology, Volume 162, Issue 7, 1 October 2005, Pages 618–620, https://doi.org/10.1093/aje/kwi255
“It is philosophy, not science.” Physicists are familiar with this criticism of string theory, a theory that provides a unified description of all forces operating in the universe (1). Unlike philosophical arguments, scientific theories or their predictions need to be confirmed empirically. String theory involves an elegant set of mathematical equations; unfortunately, it is unclear whether its predictions are, or will be, testable—not a small obstacle for a “theory of everything.”
Some of the string theorists' tribulations regarding untestable predictions are shared by epidemiologists and other researchers who use counterfactual theory to draw causal inferences from observational data. I do not mean that epidemiologists are limited by the practical impossibility of conducting certain subatomic experiments. Rather, I refer to a fundamental shortcoming of causal inference from observational data in certain settings: the absence of a well-defined causal effect. When the causal effect of interest is ill defined, the counterfactual theory of causal inference from observational data and the elegant statistical methods derived from it lead to predictions that are untestable. Note that this problem is separate from the concept of bias or noncomparability between the exposed and the unexposed—confounding, selection bias, measurement error, random variability—that threatens the validity of the estimates of causal effects in epidemiologic studies. Ideally, comparability could always be achieved by randomization coupled with smart design and adequate funding.
The study by Haight et al. (2) in this issue of the Journal highlights the need for a sharp definition of the causal effect of interest in epidemiologic research. The authors analyzed data from 1,655 older California residents followed for some time between 1995 and 2001. The goal of the study was to estimate the joint causal effect of body composition and physical activity—the exposures—on several measures of functional limitation in the elderly. To accomplish this goal, the authors strived to 1) ensure approximate comparability (exchangeability) across levels of exposure and 2) use an analytic method (inverse probability weighting) that preserves comparability.
First, Haight et al. (2) collected data on a number of variables that may confound the causal effect of the exposure on the outcome. These variables included presence of chronic conditions, self-rated health status, body mass index, living arrangements, walking speed, and so forth (3). Although exchangeability can never be guaranteed in an observational study, for the purposes of this commentary let us assume that the authors succeeded at achieving approximate exchangeability.
Second, the authors noticed that the use of standard statistical methods could lead to loss of the exchangeability so laboriously obtained by careful design and data collection. This (selection) bias may appear in longitudinal studies with time-varying exposures when there are time-varying confounders and these confounders are affected by prior exposure (or share common causes with exposure) (4). Both conditions are met in the study by Haight et al. (2) because time-varying confounders such as health status are also affected by the exposure. To overcome this problem, Haight et al. used inverse probability weighting to estimate the parameters of a marginal structural model. Robins (5) developed methods based on inverse probability weighting in the context of counterfactual theory for causal inference from complex longitudinal data. Under the assumption of no model misspecification, these methods do not introduce selection bias even if the time-dependent confounders are affected by prior exposure. For example, in studies of the effect of antiretroviral therapy on the risk of acquired immunodeficiency syndrome, estimates from marginal structural models are closer to those obtained from randomized trials, while estimates from standard statistical models are greatly attenuated (6, 7). The two articles by Haight et al. and Tager et al. (3) are innovative in that they describe the first actual analysis of marginal structural models applied to data with a continuous exposure using methods proposed by Robins in the original paper on marginal structural models (5). Although the precise timing of the measurement of confounders and exposures is somewhat unclear (i.e., do the confounder values refer to a preexposure period?), the authors may clarify these issues in subsequent articles.
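The weighting idea can be illustrated with a deliberately simplified sketch. It assumes a single (point) exposure A, one binary confounder L, and a binary outcome Y, rather than the time-varying exposures and confounders of Haight et al.; the counts and the function names `crude_risk` and `ipw_risk` are hypothetical and are not taken from either article. Each subject is weighted by the inverse of the probability of receiving the exposure actually received, given L, which creates a pseudo-population in which L no longer predicts exposure.

```python
from collections import defaultdict

# Hypothetical counts: (confounder L, exposure A, outcome Y) -> number of subjects.
# These numbers are invented for illustration only.
counts = {
    (0, 0, 1): 6,  (0, 0, 0): 54,
    (0, 1, 1): 4,  (0, 1, 0): 16,
    (1, 0, 1): 8,  (1, 0, 0): 12,
    (1, 1, 1): 30, (1, 1, 0): 30,
}


def crude_risk(a):
    """Unadjusted risk of Y=1 among subjects with exposure level a."""
    n = sum(c for (l, ai, y), c in counts.items() if ai == a)
    cases = sum(c for (l, ai, y), c in counts.items() if ai == a and y == 1)
    return cases / n


def ipw_risk(a):
    """Inverse-probability-weighted risk of Y=1 had everybody had exposure a.

    The weight for a subject with confounder l and exposure a is
    1 / P(A=a | L=l), estimated nonparametrically from the counts.
    """
    n_l = defaultdict(int)    # subjects per confounder level
    n_la = defaultdict(int)   # subjects per (confounder, exposure) cell
    for (l, ai, y), c in counts.items():
        n_l[l] += c
        n_la[(l, ai)] += c

    weighted_cases = 0.0
    weighted_total = 0.0
    for (l, ai, y), c in counts.items():
        if ai != a:
            continue
        weight = n_l[l] / n_la[(l, ai)]  # equals 1 / P(A=a | L=l)
        weighted_cases += weight * c * y
        weighted_total += weight * c
    return weighted_cases / weighted_total


crude_rd = crude_risk(1) - crude_risk(0)
ipw_rd = ipw_risk(1) - ipw_risk(0)
print("crude risk difference:", round(crude_rd, 3))
print("weighted risk difference:", round(ipw_rd, 3))
```

With these made-up counts the crude risk difference is 0.25, whereas the weighted risk difference is 0.10, which is also what standardizing over L would give. In the longitudinal setting the weights are products of such terms over time, a step this sketch omits.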
In short, Haight et al. (2) appropriately adjusted for the measured confounders to estimate the causal effect of exposure. But which causal effect did they estimate? To answer this question, let us briefly review the basics of marginal structural models. For dichotomous outcomes, these are models (e.g., logistic models) for the probability of the outcome if all individuals had a particular exposure history. The model is helpful in estimating causal effects because it allows one to compare the probability of the outcome if everybody had a certain exposure history with the probability of the outcome if everybody had another exposure history. We refer to the outcome of the model as counterfactual because it is defined under conditions contrary to fact; that is, in reality, not all subjects followed a given exposure history. Models for (functionals of) counterfactual outcomes are known as structural or causal models. For example, in the epidemiology of the human immunodeficiency virus, marginal structural models have been used to compare the rate of acquired immunodeficiency syndrome if all patients had received continuous antiretroviral therapy with the rate of acquired immunodeficiency syndrome if all patients had been kept untreated. Essentially, the use of these models is an attempt to replicate a randomized trial in which antiretroviral therapy is assigned at random and, in principle, the causal predictions from these observational human immunodeficiency virus studies can be confirmed (or rejected) by an adequately designed randomized experiment. Note the emphasis on the “in principle” part. Even though, in the real world, there are many obstacles to conducting such an experiment (ethical, financial, etc.), one can hypothetically ignore those obstacles and propose a protocol for a study that specifies the study population and the mechanism of exposure assignment.
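In symbols, and using notation that does not appear in the commentary itself, one common way to write a logistic marginal structural model is sketched below. Here $\bar a$ denotes an exposure history, $Y^{\bar a}$ the counterfactual outcome under that history, and $g(\bar a)$ an assumed summary of the history (for example, cumulative exposure); the specific form chosen by Haight et al. may differ.

```latex
\operatorname{logit}\,\Pr\!\left[Y^{\bar a}=1\right] = \beta_0 + \beta_1\, g(\bar a)
```

Under this sketch, the causal odds ratio comparing two exposure histories $\bar a$ and $\bar a'$ is $\exp\{\beta_1\,[g(\bar a)-g(\bar a')]\}$, a contrast between two counterfactual probabilities defined for the same population.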
Haight et al. (2) used a marginal structural model to model the probability of a counterfactual outcome (functional limitation or functioning history) under fixed exposure histories. One of the two exposures was physical activity. That is, the authors used a marginal structural model to compare the probability of the outcome had everybody been, say, highly active versus nonactive. Again, the validity of the authors' estimates can, in principle, be confirmed by a randomized experiment in which participants are randomly assigned to different levels of physical activity.
The second exposure is the ratio of lean body mass to fat mass, a measurement of body composition. This exposure is more problematic because the marginal structural model compared the probability of the outcome if everybody had, say, a high versus low lean-to-fat ratio. Confirming the model estimates would not be straightforward even if one had unlimited resources and no ethical constraints. Just think how you would design a randomized experiment in which participants are randomly assigned to either high or low lean-to-fat ratio. What kind of intervention would you specify in the study protocol? Extreme exercise? Starvation? Surgery? Genetic modification? All these interventions could effectively alter the lean-to-fat ratio, but it is improbable that all of them would lead to the same effect estimate when comparing the probability of the outcome had everybody had a high versus low lean-to-fat ratio. Two different procedures to affect the lean-to-fat ratio may lead to different causal effects even if, when counterfactually applied to a given individual, they would produce identical values of the lean-to-fat ratio. Thus, the counterfactual outcome under each exposure level is not well defined because the value of the counterfactual outcome depends on the actual intervention used to manipulate the exposure. The ambiguity of the causal effect is not a consequence of ethical constraints (the effect of cigarette smoking can be well defined even if some of the hypothetical interventions involved are harmful) or unfeasible interventions (the effect of long-term diet can be well defined even if some of the hypothetical interventions involved are impractical) but is much more fundamental.
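A toy calculation may make the point concrete. All numbers below are hypothetical, chosen only to illustrate the argument, and the names `risk_high`, `risk_low`, `risk_difference`, and `mixed_risk_difference` are illustrative. Three interventions bring everybody to the same high lean-to-fat ratio, yet each leads to a different risk of functional limitation, so "the effect of a high versus low ratio" is not a single number.

```python
# All numbers are hypothetical and serve only to illustrate the argument.
# Risk of functional limitation if everybody were moved to the SAME high
# lean-to-fat ratio by each intervention.
risk_high = {"exercise": 0.10, "diet": 0.18, "surgery": 0.22}

# Risk if everybody were kept at a low ratio (assumed here, for simplicity,
# not to depend on how the low ratio comes about).
risk_low = 0.25


def risk_difference(intervention):
    """Risk difference, high vs. low ratio, for one specific intervention."""
    return risk_high[intervention] - risk_low


def mixed_risk_difference(mix):
    """Risk difference when the high ratio is reached by a mix of interventions.

    `mix` maps each intervention to the proportion of the population
    receiving it; the proportions must sum to 1.
    """
    assert abs(sum(mix.values()) - 1.0) < 1e-9
    return sum(p * risk_difference(k) for k, p in mix.items())


for name in risk_high:
    print(name, round(risk_difference(name), 2))

print("half exercise, half diet:",
      round(mixed_risk_difference({"exercise": 0.5, "diet": 0.5}), 2))
```

The same target ratio yields risk differences of -0.15, -0.07, and -0.03 depending on the intervention, and a different answer again for any mixture of them. Without specifying the intervention, the contrast being estimated is ambiguous.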
Like string theorists, epidemiologists drawing causal inferences from observational data run the risk of using elegant mathematics (e.g., marginal structural models) whose predictions cannot be tested. However, it is a deeper problem for epidemiologists because ill-defined counterfactuals question the existence of the causal effect itself. What is the meaning of the English phrase “the effect of X on mortality” when X is replaced by body composition, low density lipoprotein cholesterol, CD4 cell count, forced expiratory volume in 1 second, or other physiologic parameters? The meaning depends on how one plans to manipulate, say, the CD4 cell count to compare the mortality under a low versus high CD4 count (e.g., use of antiretroviral therapy vs. physical filtering out of CD4 cells). The type of intervention matters so much that some authors have recommended restricting causal inference to the effects of specific interventions (8). On the other hand, a marginal structural model, like any other analytic method, is agnostic regarding this issue and produces an estimate anyway, even if such an estimate is simply a mathematical construct with no meaningful correspondence in the real world.
One naive response to the presence of ill-defined causal effects would be replacing counterfactual theory with some other theory of causal inference. This strategy is not helpful because counterfactual theory is logically, and mathematically, equivalent to other theories that apparently do not use counterfactual outcomes to define and estimate causal effects. For example, structural equations/directed acyclic graphs (9) and decision-analytic methods (10) can be rewritten in terms of counterfactuals. The lack of alternatives to counterfactuals should come as no surprise because counterfactual theory is simply the formalization of natural human causal reasoning. In other words, counterfactual theory is not the problem but the tool that allows one to identify the problem. Refer to Robins and Greenland (11) and to Greenland (12) for additional discussion of the implications of vague or ill-defined counterfactuals.
The crucial question is then this: What is the point of estimating a causal effect that is not well defined? The resulting relative risk estimate will not be helpful to either scientists, who will be unable to relate it to a mechanism, or policy makers, who will be unable to translate it into effective interventions. One commonly heard argument is that epidemiologic studies are about association, not causation. According to this proposition, epidemiologists should not worry too much about fishy causal concepts but rather focus their efforts on estimating correct associations. This is certainly a safer strategy but also a dangerous one because it can make much of epidemiology close to irrelevant for both scientists and policy makers.
The alternative to retreating into the associational haven is to take the causal bull by the horns, as Haight et al. (2) bravely did by acknowledging the ambiguity of the causal effect in their Discussion section. A proper definition of causal effect requires well-defined counterfactual outcomes, that is, a widely shared consensus about the relevant interventions. Next time you present an effect estimate of your favorite exposure, consider proposing and discussing a relevant intervention as well. Otherwise, be ready to defend why epidemiologic causal inference should be considered science rather than philosophy.
Conflict of interest: none declared.
References
1. Greene B. The elegant universe: superstrings, hidden dimensions, and the quest for the ultimate theory. New York, NY: Vintage,
2. Haight T, Tager I, Sternfeld B, et al. Effects of body composition and leisure-time physical activity on transitions in physical functioning in the elderly.
3. Tager IB, Haight T, Sternfeld B, et al. Effects of physical activity and body composition on functional limitation in the elderly: application of the marginal structural model.
4. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias.
5. Robins JM. Marginal structural models. In: 1997 Proceedings of the Section on Bayesian Statistical Science. Alexandria, VA: American Statistical Association,
6. Cole SR, Hernán MA, Robins JM, et al. Effect of highly active antiretroviral therapy on time to acquired immunodeficiency syndrome or death using marginal structural models.
7. Sterne JAC, Hernán MA, Ledergerber B, et al. Long-term effectiveness of potent antiretroviral therapy in preventing AIDS and death: the Swiss HIV Cohort Study.
9. Pearl J. Causality: models, reasoning, and inference. Cambridge, United Kingdom: Cambridge University Press,
10. Dawid AP. Causal inference without counterfactuals (with discussion).
11. Robins JM, Greenland S. Comment on “causal inference without counterfactuals.”