
The issue of confounding in epidemiological studies of ambient air pollution and pregnancy outcomes
  1. M J Strickland (1, 2)
  2. M Klein (1)
  3. L A Darrow (1)
  4. W D Flanders (3)
  5. A Correa (2)
  6. M Marcus (3)
  7. P E Tolbert (1)

  1. Department of Environmental and Occupational Health, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
  2. National Center on Birth Defects and Developmental Disabilities, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
  3. Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA

Correspondence to: Dr M J Strickland, Rollins School of Public Health, Department of Environmental and Occupational Health, 1518 Clifton Road NE, Atlanta, GA 30322, USA; mjstric{at}sph.emory.edu

Abstract

Background: Relationships between ambient air pollution levels during pregnancy and adverse pregnancy outcomes have been investigated using one of three analytic approaches: ambient pollution levels have been contrasted over space, time or both space and time. Although the three approaches share a common goal, to estimate the causal effects of pollution on pregnancy outcomes, they face different challenges with respect to confounding.

Methods: A framework based on counterfactual effect definitions is presented for examining issues related to confounding in spatial, temporal and spatial–temporal analyses of air pollution and pregnancy outcomes, and the implications of these issues for inference are discussed.

Results: In spatial analyses, risk factors that are spatially correlated with pollution levels are confounders; the primary challenges relate to the availability and validity of risk factor measurements. In temporal analyses, where smooth functions of time are commonly used to control for confounding, concerns relate to the adequacy of control and the possibility that abrupt changes in risk might be systematically related to pollution levels. Spatial–temporal approaches are subject to challenges faced in both spatial and temporal analyses.

Conclusion: Each approach faces different challenges with respect to the likely sources of confounding and the ability to control for that confounding because of differences in the type, availability, and quality of information required. Thoughtful consideration of these differences should help investigators select the analytic approach that best promotes the validity of their research.


Since the mid-1990s, investigators have become increasingly interested in studying the effects of air pollution on pregnancy outcomes. As detailed in several reviews,1–6 outcomes such as preterm delivery, low birth weight, intrauterine growth restriction, spontaneous abortion and congenital malformations have been associated with ambient air pollution levels.

Overwhelmingly, the studies conducted to date have focused on associations between ambient air pollution levels and pregnancy outcomes. Information about relationships between personal exposure to air pollution during pregnancy and risks of adverse pregnancy outcomes is limited; reported effects have been small.7 8 However, even if the dose effects were large, the expected value of a measure of association based on ambient pollution levels may be attenuated toward the null if the resulting measurement error is non-differential with respect to the outcome.9 10 This measurement error can be substantial. Longitudinal correlation coefficients between ambient pollution levels and personal exposure measurements range between 0.5 and 0.7 for particulate matter ⩽2.5 μm in aerodynamic diameter;11–14 correlations for gaseous pollutants tend to be weaker.11–14 The attenuation of the risk ratio that may occur because of this measurement error, coupled with the small effects observed in the personal exposure studies, suggests that the causal effects of ambient air pollution on adverse pregnancy outcomes, if they exist, are likely to be small.7 8
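The attenuation argument can be illustrated with a small simulation. The sketch below is ours, not the authors' method: it assumes a linear outcome model and classical additive error (proxy = personal exposure + independent noise), which may describe the ambient–personal relationship only loosely. Under those assumptions the fitted slope on the proxy shrinks by the reliability ratio, which equals the squared correlation between proxy and true exposure, so a correlation of 0.6 would leave about 36% of the true slope.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (model with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(beta=1.0, error_sd=1.0, n=100_000, seed=1):
    """Return (slope on true exposure, slope on error-prone proxy).

    Hypothetical assumption: proxy = true exposure + independent noise,
    and the noise is unrelated to the outcome (non-differential error).
    """
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]        # personal exposure, variance 1
    proxy = [x + rng.gauss(0.0, error_sd) for x in true_x]  # ambient-based proxy
    y = [beta * x + rng.gauss(0.0, 1.0) for x in true_x]    # outcome depends on true exposure only
    return ols_slope(true_x, y), ols_slope(proxy, y)


def expected_attenuation(error_sd):
    """Reliability ratio var(X) / var(proxy), equal to corr(X, proxy) ** 2."""
    return 1.0 / (1.0 + error_sd ** 2)


true_slope, proxy_slope = simulate_attenuation(error_sd=1.0)
print(f"slope on true exposure: {true_slope:.3f}")
print(f"slope on proxy: {proxy_slope:.3f} (expected about {expected_attenuation(1.0):.3f})")
```

To mimic a proxy–exposure correlation r, set error_sd to sqrt(1 / r**2 - 1).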

The possibility that an observed association might be due to confounding constantly threatens the validity of observational research; in studies of ambient air pollution and pregnancy outcomes this concern is particularly relevant because the effect sizes are likely to be small. We present a framework based on counterfactual effect definitions to examine issues related to confounding in spatial, temporal and spatial–temporal analyses of air pollution and pregnancy outcomes, and we discuss their implications for inference.

COUNTERFACTUAL DEFINITION OF A CAUSAL EFFECT

Our exposition focuses on a cohort of pregnant women with a shared exposure. One such cohort comprises women living in a particular area who conceive on a particular day. We assign all these women the same pollution metric, for example an average of ambient measurements during the first month of pregnancy. We refer to this cohort of women as the “target population”.15 Our goal is to estimate the causal effect of a change in ambient air pollution levels on the risk of an adverse pregnancy outcome in the target population.

We label the observed risk of the adverse pregnancy outcome in the target population, ignoring sampling error, as Risk, and the observed ambient air pollution level as Pollution. Inherent to the consideration of causality in this context is the question: “What would the risk in the target population have been if the pollution level had been Pollution* instead of its observed level?” We denote this “counterfactual” risk, which describes the risk in the target population under a hypothetical alternative condition that did not occur, as Risk*.15 A counterfactual definition for the causal effect of this difference in pollution levels (Pollution − Pollution*) is the difference in risks in the target population under the two exposure scenarios, that is, Risk − Risk*.15 16

Determining this causal effect requires knowledge of both Risk and Risk*. Because the counterfactual risk cannot be observed, data external to the target population are needed to estimate Risk*. For example, the observed risk for a different cohort of pregnant women exposed to ambient air pollution level Pollution* could be used to estimate Risk*. We refer to this second cohort of pregnant women as the “substitute population”, because the risk in this cohort is used to substitute for the parameter of interest.15 Confounding occurs when the risk in the substitute population imperfectly represents what the risk in the target population would have been under the hypothetical alternative pollution level.15 16
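These definitions can be written compactly. Writing Risk_sub for the observed risk in the substitute population (our notation, not the article's), the effect estimated from the substitute population differs from the causal effect by exactly the gap between the substitute risk and the counterfactual risk:

```latex
\begin{aligned}
\text{causal effect} &= \mathrm{Risk} - \mathrm{Risk}^{*} \\
\text{estimated effect} &= \mathrm{Risk} - \mathrm{Risk}_{\mathrm{sub}} \\
\text{estimated effect} - \text{causal effect} &= \mathrm{Risk}^{*} - \mathrm{Risk}_{\mathrm{sub}}
\end{aligned}
```

The estimate is free of confounding only when Risk_sub equals Risk*, that is, when the substitute population's risk is what the target population's risk would have been at Pollution*.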

A counterfactual framework to discuss confounding

In studies of ambient air pollution and pregnancy outcomes, women may be assigned pollution levels that vary over space, time or both space and time. Although all women are typically analysed together, we contend that these studies can be envisioned as a series of contrasts between a target population (or individual) and a substitute population (or individual).

Risks of an adverse pregnancy outcome for four mutually exclusive cohorts of pregnant women, defined with respect to location and time, are presented in table 1 (the first subscript indexes location and the second indexes time). Risk₁₁ and Risk₂₁ are risks for two cohorts at different locations at one point in time. Similarly, Risk₁₁ and Risk₁₂ are risks for two cohorts at the same location at different times. Risks for two cohorts that differ with respect to both location and time are denoted by Risk₁₁ and Risk₂₂.

Table 1 Risks of an adverse pregnancy outcome for four mutually exclusive cohorts of pregnant women

                Time 1     Time 2
  Location 1    Risk₁₁     Risk₁₂
  Location 2    Risk₂₁     Risk₂₂

Assume the cohort at Location 1 and Time 1 is the target population. Because ambient air pollution levels vary across space and time, any of the other cohorts in table 1 could be used as the substitute population, and Risk₂₁, Risk₁₂ or Risk₂₂ could be used to estimate the counterfactual risk in the target population. Unfortunately, none of these populations is likely to be a perfect substitute for the target population, so confounding will be a concern. Therefore, the choice of Risk₂₁, Risk₁₂ or Risk₂₂ as the estimate of the counterfactual risk should be based on the investigator’s ability to compensate, in the analysis, for differences between the two populations. For each scenario, we describe challenges commonly encountered when analytic techniques are used to account for differences between the target and the substitute populations, and we discuss how the presumably small effect of ambient air pollution on the pregnancy outcome influences interpretation of the effect estimate.
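To make the three contrasts concrete, the sketch below assumes a hypothetical additive risk model in which each cohort's risk is a location term plus a time term plus a pollution term; every number is invented for illustration. Taking the cohort at (Location 1, Time 1) as the target and the substitute's pollution level as Pollution*, the bias of each contrast equals the difference in location terms (spatial), the difference in time terms (temporal), or the sum of both (spatial–temporal).

```python
# Hypothetical additive risk model (illustration only; numbers are invented):
#   risk(location, time) = location term + time term + BETA * pollution
LOCATION_TERM = {1: 0.050, 2: 0.058}   # eg, area-level socioeconomic differences
TIME_TERM = {1: 0.000, 2: -0.004}      # eg, a secular change in prenatal care
POLLUTION = {(1, 1): 30.0, (2, 1): 20.0, (1, 2): 24.0, (2, 2): 16.0}
BETA = 0.0002                          # risk change per unit of pollution


def risk(location, time, pollution=None):
    """Risk for the (location, time) cohort; pass `pollution` for a counterfactual."""
    level = POLLUTION[(location, time)] if pollution is None else pollution
    return LOCATION_TERM[location] + TIME_TERM[time] + BETA * level


def contrast(substitute, target=(1, 1)):
    """Return (causal effect, estimate from the substitute, bias) for one contrast."""
    pollution_star = POLLUTION[substitute]               # Pollution* = substitute's level
    risk_star = risk(*target, pollution=pollution_star)  # counterfactual Risk*
    causal_effect = risk(*target) - risk_star
    estimate = risk(*target) - risk(*substitute)
    return causal_effect, estimate, estimate - causal_effect


for label, substitute in [("spatial", (2, 1)), ("temporal", (1, 2)),
                          ("spatial-temporal", (2, 2))]:
    effect, estimate, bias = contrast(substitute)
    print(f"{label:16s} causal={effect:+.4f} estimate={estimate:+.4f} bias={bias:+.4f}")
```

Real risk factors need not be additive or separable by location and time; the model is chosen only so that each source of bias can be read off directly.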

CONFOUNDING IN SPATIAL ANALYSES

A spatial analysis contrasts pollution levels between populations in different locations at a given point in time by using Risk₂₁ as the counterfactual risk estimate (table 1). Examples of spatial analyses include Vassilev et al,17 Wilhelm and Ritz18 and Huynh et al.19 In spatial analyses, confounding occurs when the risk in the substitute population imperfectly represents what the risk in the target population would have been under the hypothetical alternative pollution level, for example if socioeconomic status, which affects risk, differs between the two locations of interest. To validly estimate this causal effect, risk factors that differ between the target and substitute populations must be appropriately accounted for in the analysis. Practical challenges can arise when analytic methods are used to account for confounding.

One such challenge is residual confounding, which can occur if confounders are measured with error. In most investigations, birth certificates have been used as the primary data source for information about the outcome and potential confounders.1–6 Much information contained in birth certificates is subject to measurement error; the validity of US birth certificate data on tobacco use, alcohol use, prenatal care, maternal risk factors, pregnancy complications and delivery method is generally considered to be poor.20 21 Many risk factors for adverse pregnancy outcomes are unlikely to be uniformly distributed across space, as evidenced by the disparities in proximity to environmental hazards that exist across US urban populations according to factors such as race, socioeconomic status, education and health insurance status.22 23 Because risk factors may be spatially correlated with ambient air pollution levels, investigators should be concerned about residual confounding when variables on birth certificates (which might be measured with error) are relied upon to compensate for differences between target and substitute populations in an analysis. The impact of residual confounding has been quantified: for example, if the sensitivity and specificity of a dichotomous confounder are both 0.90, only 64% of the confounding is expected to be removed.24
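The 64% figure above comes from the cited source's own scenario. The sketch below lets one compute the corresponding fraction, on the risk-difference scale, under other hypothetical inputs. Its assumptions are ours: an additive risk model with no effect modification (so the fully adjusted difference equals beta), a binary confounder recorded with given sensitivity and specificity, and adjustment by standardizing stratum-specific differences to the target population's distribution of the recorded confounder.

```python
def recorded_stratum(p, risk_with, risk_without, se, sp, recorded_positive):
    """Share and risk of a stratum defined by the recorded (misclassified) confounder.

    p: true prevalence of the confounder in this population
    se, sp: sensitivity and specificity of the recorded confounder
    """
    if recorded_positive:
        with_c, without_c = se * p, (1 - sp) * (1 - p)
    else:
        with_c, without_c = (1 - se) * p, sp * (1 - p)
    share = with_c + without_c
    return share, (with_c * risk_with + without_c * risk_without) / share


def fraction_removed(p_target, p_substitute, base, beta, gamma, se, sp):
    """Fraction of confounding removed by adjusting for a misclassified confounder.

    Hypothetical model: risk = base + beta * high_pollution + gamma * confounder,
    where the target population has high pollution (1) and the substitute low (0).
    Requires p_target != p_substitute and gamma != 0 so that confounding exists.
    """
    def risk(high_pollution, confounder):
        return base + beta * high_pollution + gamma * confounder

    crude_target = p_target * risk(1, 1) + (1 - p_target) * risk(1, 0)
    crude_substitute = p_substitute * risk(0, 1) + (1 - p_substitute) * risk(0, 0)
    crude = crude_target - crude_substitute

    adjusted = 0.0
    for recorded_positive in (True, False):
        share, r_target = recorded_stratum(
            p_target, risk(1, 1), risk(1, 0), se, sp, recorded_positive)
        _, r_substitute = recorded_stratum(
            p_substitute, risk(0, 1), risk(0, 0), se, sp, recorded_positive)
        adjusted += share * (r_target - r_substitute)

    # Confounding present = crude - beta; confounding removed = crude - adjusted
    return (crude - adjusted) / (crude - beta)


print(f"{fraction_removed(0.6, 0.3, 0.05, 0.01, 0.04, se=0.9, sp=0.9):.2f}")
```

With perfect classification the function returns 1; with sensitivity and specificity of 0.5 the recorded variable carries no information and the function returns 0.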

Information on several potential confounders may not be available on birth certificates; of these, socioeconomic status is perhaps the greatest concern.23 Commonly available individual-level variables, such as race and education, are unlikely to capture the full construct of socioeconomic status.25 Controlling for neighbourhood-level socioeconomic status variables may not capture potentially important within-neighbourhood variation (eg, in many urban US neighbourhoods, houses located on highly trafficked roads tend to have lower resale values than similar houses on roads with less traffic).26 27 Other unmeasured risk factors could also vary across locations and potentially confound the estimated effect of air pollution in a spatial analysis. For example, health-conscious women may be more averse to living near visible environmental hazards such as automobile traffic, high-tension wires or waste sites. These women may be more likely to engage in other behaviours that would reduce their risk of an adverse pregnancy outcome (eg, exercise, diet, vitamin use, prenatal care).

Although residual confounding and weak uncontrolled confounding are common concerns in observational research, we believe they are particularly relevant in this setting, because accurate measurements of all confounders are usually not available, and because the causal effects of ambient air pollution, if they exist, are likely to be small. Although an elevated risk ratio is compatible with a true pollution effect, it is also compatible with an estimate biased away from the null because of residual confounding or weak uncontrolled confounding. It seems prudent to be concerned about confounding in this setting; the imperfect validity of data on US birth records is well known,20 and perfect control of socioeconomic status using birth certificate data seems unlikely.25 Consistency of results does not necessarily rule out confounding as a plausible explanation, because confounding can be similar across studies.
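How weak an uncontrolled confounder can be and still produce an elevated risk ratio can be gauged with the classic external-adjustment (bias factor) formula for a single binary confounder. The sketch assumes the confounder multiplies risk by the same factor in both exposure groups (no effect-measure modification); the prevalences are hypothetical.

```python
def confounding_bias_factor(rr_cd, p_exposed, p_unexposed):
    """Crude risk ratio divided by the risk ratio adjusted for one binary confounder.

    rr_cd: risk ratio linking the confounder to the outcome (assumed equal in
           both exposure groups)
    p_exposed, p_unexposed: confounder prevalence in the higher- and
           lower-pollution populations (hypothetical inputs)
    """
    return (p_exposed * (rr_cd - 1) + 1) / (p_unexposed * (rr_cd - 1) + 1)


true_rr = 1.0  # no pollution effect at all
observed_rr = true_rr * confounding_bias_factor(rr_cd=2.0, p_exposed=0.4, p_unexposed=0.3)
print(f"observed risk ratio: {observed_rr:.2f}")
```

A confounder that doubles risk and is present in 40% versus 30% of women yields a bias factor of 1.4/1.3, about 1.08, which is of the same order as the small effects discussed above.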

CONFOUNDING IN TEMPORAL ANALYSES

Temporal analyses contrast pollution levels over time between populations at a particular location by using Risk₁₂ as the counterfactual risk estimate (table 1). Examples include Gouveia et al,28 Sagiv et al29 and Hansen et al.30 As is true for spatial analyses, valid estimation of the causal effect requires proper control for risk factors that differ between the target and the substitute populations.

In temporal analyses, smoothing functions (eg, parametric splines and nonparametric smoothers) can be used to control for confounding by risk factors that change gradually over time. Many risk factors change smoothly over time, for example long-term trends in demographics, healthcare, smoking, use of assisted reproductive technologies and prenatal vitamin use. When a smooth function of time is included in a regression model, successful control of confounding will depend on how well that function of time serves as a proxy for unmeasured time-varying risk factors. In these analyses, concerns about confounding often centre on abrupt, unmeasured changes in risk which are not accounted for in the analysis. Residual confounding, which is also a concern, can occur when the smoothing function does not fully account for the long-term and seasonal variations in risk or if risk factors with short-term variability are measured with error.
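The role of a smooth function of time can be sketched with simulated daily data. In this hypothetical setup an unmeasured seasonal risk factor drives both pollution and a continuous outcome standing in for risk. A linear trend plus annual harmonics (Fourier terms) serves as the smooth function of time; that basis is our choice for brevity, whereas the splines and smoothers mentioned above are more common in practice.

```python
import numpy as np


def fit_pollution_coef(y, pollution, time_terms=None):
    """OLS coefficient on pollution, optionally adjusting for time terms."""
    cols = [np.ones_like(pollution), pollution]
    if time_terms is not None:
        cols.extend(time_terms)
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def smooth_time_terms(t, period=365.25, n_harmonics=2):
    """Hypothetical smooth function of time: linear trend plus annual harmonics."""
    terms = [t / t.max()]
    for k in range(1, n_harmonics + 1):
        angle = 2 * np.pi * k * t / period
        terms.extend([np.sin(angle), np.cos(angle)])
    return terms


def simulate(beta=0.02, n_days=6 * 365, seed=0):
    """Return (unadjusted, time-adjusted) pollution coefficients; all values invented."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days, dtype=float)
    season = np.sin(2 * np.pi * t / 365.25)               # unmeasured seasonal risk factor
    pollution = 20 + 8 * season + rng.normal(0, 4, n_days)
    outcome = (1.0 + beta * pollution + 0.5 * season      # seasonal confounding
               + 0.3 * t / n_days                         # slow long-term trend
               + rng.normal(0, 0.2, n_days))
    crude = fit_pollution_coef(outcome, pollution)
    adjusted = fit_pollution_coef(outcome, pollution, smooth_time_terms(t))
    return crude, adjusted


crude, adjusted = simulate()
print(f"true 0.020, unadjusted {crude:.3f}, adjusted for smooth time {adjusted:.3f}")
```

Here the smooth terms span the confounding exactly, so adjustment succeeds; if the seasonal risk pattern were sharper than the chosen basis, residual confounding would remain.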

To confound a temporal analysis that adequately controls for gradual trends, a risk factor with abrupt temporal variation must be systematically associated with air pollution levels. Otherwise, the risk factor will simply add to uncertainty and reduce power to detect an association. An infectious disease outbreak is a short-term change that could increase risks of adverse pregnancy outcomes. Some infections during pregnancy increase risks of specific outcomes (eg, cytomegalovirus infection is associated with intrauterine growth restriction and congenital anomalies),31 whereas others plausibly increase risk (eg, influenza and respiratory illness cause inflammation, and inflammation is a risk factor for preterm delivery).32 Many infectious diseases are seasonal;33 furthermore, within a particular season, outbreaks may be more likely to occur during bouts of cold weather, when people spend more time indoors and interpersonal contacts are increased.34 Since temperature impacts the concentrations of many air pollutants,35 such outbreaks could confound the association of interest. A disaster of any type could increase risks as well, for example because of changes in maternal stress; exposure to toxic agents; or disruptions in the availability of food, water or medical services. If short-term changes in risk, such as those resulting from an infection or disaster, are systematically associated with ambient air pollution levels, then the association of interest will be confounded. These risk factors can also confound spatial analyses, for example if the event disproportionately affected people in areas that had relatively high (or low) air pollution levels.
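The condition stated above, that an abrupt risk factor confounds only if it is systematically associated with pollution, can be checked by simulation. The sketch assumes gradual trends are already controlled and uses invented parameters: cold snaps raise pollution, and outbreaks of infection occur either on random days or mainly during cold snaps. Neither analysis adjusts for the outbreak.

```python
import numpy as np


def slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    return np.polyfit(x, y, 1)[0]


def simulate(beta=0.02, event_effect=0.4, n_days=10_000, seed=1):
    """Return pollution slopes under two hypothetical outbreak scenarios."""
    rng = np.random.default_rng(seed)
    cold = rng.random(n_days) < 0.10                       # hypothetical cold snaps
    pollution = 20 + 6 * cold + rng.normal(0, 4, n_days)   # cold days raise pollution here
    scenarios = {
        "independent": rng.random(n_days) < 0.10,             # outbreaks on random days
        "linked_to_cold": cold & (rng.random(n_days) < 0.8),  # outbreaks during cold snaps
    }
    results = {}
    for name, outbreak in scenarios.items():
        outcome = (1 + beta * pollution + event_effect * outbreak
                   + rng.normal(0, 0.2, n_days))
        results[name] = slope(pollution, outcome)
    return results


for name, estimate in simulate().items():
    print(f"{name:15s} estimate {estimate:.4f} (true 0.0200)")
```

The independent outbreaks only add noise, whereas outbreaks tied to cold weather push the pollution coefficient away from its true value.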

Irrespective of the analytic approach, most investigators have examined ambient air pollution levels averaged over a few weeks or months during pregnancy. If a smooth function of time is used to account for confounding by factors with seasonal and long-term variation, this control may intrude upon the gestational window of interest (eg, one pregnancy trimester is as long as one season). A smooth function of time that does not control for seasonal trends could be implemented; however, seasonal birth patterns may vary by socioeconomic status, as individuals occupying low socioeconomic positions tend to have fewer spring births.36 37 This is problematic, because air pollutants have seasonal variation, and the risks of several adverse pregnancy outcomes are related to socioeconomic status.38 Confounding by socioeconomic status may therefore be a concern for temporal analyses that do not control for season. Unfortunately, controlling for season may remove residual variability in pollution levels that is of interest. If so, the likely consequence is a loss of statistical power, which occurs because the target and substitute populations have similar pollution levels. This trade-off is characteristic of temporal analyses; the most satisfying substitute population (ie, one which is similar to the target population with respect to location and season) is likely to be similar to the target population with respect to the ambient pollution level as well.
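The power trade-off can be quantified through the share of pollution variance that survives seasonal adjustment. Holding the residual variance of the outcome fixed, the standard error of the pollution coefficient scales with one over the square root of that share. The sketch assumes a hypothetical pollution series with a strong annual cycle and uses annual harmonics as the seasonal control.

```python
import numpy as np


def residual_variance_share(pollution, t, period=365.25, n_harmonics=2):
    """Share of pollution variance left after regressing on seasonal harmonics."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        angle = 2 * np.pi * k * t / period
        cols.extend([np.sin(angle), np.cos(angle)])
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, pollution, rcond=None)
    residuals = pollution - X @ coef
    return residuals.var() / pollution.var()


def simulate_pollution(n_days=20 * 365, seed=2):
    """Hypothetical series: annual cycle (amplitude 8) plus day-to-day noise (SD 4)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days, dtype=float)
    pollution = 20 + 8 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 4, n_days)
    return t, pollution


t, pollution = simulate_pollution()
share = residual_variance_share(pollution, t)
print(f"variance share kept after seasonal control: {share:.2f}")
print(f"approximate standard-error inflation: {1 / np.sqrt(share):.2f}x")
```

In this invented series about two thirds of the pollution variance is seasonal, so seasonal control keeps roughly one third and inflates the standard error by a factor near 1.7; in real data the outcome's residual variance may also fall, partly offsetting the loss.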

CONFOUNDING IN SPATIAL–TEMPORAL ANALYSES

Among published studies of relationships between ambient air pollution and adverse pregnancy outcomes, the most common analytic approach has been spatial–temporal.1–6 Examples include Bobak,39 Maisonet et al,40 Slama et al41 and Ritz et al.42 In these analyses, target and substitute populations differ with respect to location, time or both location and time. Spatial–temporal analyses are appealing because they utilise variation in air pollution levels over both space and time, thereby offering the potential for improved statistical power relative to either a spatial analysis or a temporal analysis. A major disadvantage of spatial–temporal analyses, however, is that all of the previously described concerns about confounding for both spatial analyses and temporal analyses pertain to spatial–temporal analyses. Consequently, the potential for confounding in a spatial–temporal analysis is greater than in either a spatial or temporal analysis.

DISCUSSION

We have described issues related to confounding for analyses of ambient air pollution and adverse pregnancy outcomes. No analytic approach precludes confounding, and in practice it is impossible to know if a particular association (or lack thereof) is confounded. Compensating for differences between target and substitute populations is difficult, and given that the true effects of ambient air pollution on the risks of adverse pregnancy outcomes, if they exist, are likely to be small, concerns that study results might be confounded should be anticipated. Although we cited specific studies for each analytic approach, we have refrained from highlighting the plausibility of confounding in any particular study, as our goal is not to critique but rather to describe the conceptual issues that relate to confounding in epidemiological studies of air pollution and adverse pregnancy outcomes.

In temporal analyses, smooth functions of time are commonly used to control for confounding by risk factors with gradual trends. Although residual confounding is a concern, smoothing functions seem well suited to account for these trends. Ideally, risk factors with abrupt temporal variation should be measured and controlled for; if these fluctuations are associated with abrupt changes in air pollution levels, then the association of interest will be confounded. Seasonal control may entail a trade-off between controlling for potential confounding and statistical power, and investigators should consider this when planning the analysis.

Spatial analyses usually rely on measured risk factors to compensate for differences between the target and substitute populations. Confounding can occur if risk factors that are correlated with pollution levels are unmeasured or measured with error. Many studies have relied on birth certificates, which contain information on a limited number of risk factors, some of which are likely measured with error. Supplementary data collection may be useful; in one recent study, investigators collected additional information on risk factors for preterm birth.42 The authors did not find that these risk factors confounded associations between ambient air pollution and preterm birth.42 However, even with additional information, full control for the effects of many plausible confounders, such as socioeconomic status or maternal health consciousness, may be difficult.

As an alternative to measured risk factors, a smooth function of location could be used to control for confounding in spatial analyses.43 As in temporal analyses, the adequacy of this approach depends on the smoothness of the variation in risk; residual confounding might be present if risks change abruptly from one location to the next. Contemplation of the smoothness of the variation in risk, and of the likelihood that abrupt changes in risk are correlated with ambient pollution levels, is useful for assessing the plausibility of confounding. In our opinion, ambient air pollution levels are generally more likely to be correlated with abrupt spatial changes in risk than abrupt temporal changes in risk. Whereas overall differences between US urban neighbourhoods might be well characterised using a smooth function of location, it would be challenging to capture potentially important within-neighbourhood differences, such as abrupt changes in socioeconomic status according to residential proximity to traffic.26 27 Conversely, for temporal analyses, it is more difficult to envision plausible scenarios in which abrupt temporal changes in risk would be systematically associated with ambient air pollution levels (apart from a natural or manmade disaster). For example, influenza outbreaks might cause short-term increases in risk. Provided that season is appropriately controlled for in the analysis, there is little reason to suspect that the outbreaks would be systematically correlated with ambient air pollution levels.

The three analytic approaches we have described share a common goal, to estimate the causal effects of pollution on pregnancy outcomes, and share a common need, to adequately control for confounding. Each approach faces different challenges with respect to the likely sources of confounding and the ability to control for that confounding due to differences in the type, availability and quality of information required. Thoughtful consideration of these differences should help investigators select the analytic approach that best promotes the validity of their research.

What is already known on this subject

  • There is a growing body of evidence suggesting that ambient air pollutants may harm the developing fetus.

  • In previous studies, analyses have been based on contrasts in air pollution levels over space, time, or both space and time.

  • Although the aim of these studies is to estimate the causal effects of pollution on pregnancy outcomes, adequately controlling for confounding presents a challenge.

What this study adds

  • This paper presents a framework based on counterfactual effect definitions to examine issues related to confounding in spatial, temporal and spatial–temporal analyses of ambient air pollution and adverse pregnancy outcomes.

  • This framework may help investigators to select the analytic approach that best promotes the validity of their research.

Acknowledgments

We thank Katherine Hoggatt for her helpful comments.

REFERENCES

Footnotes

  • Competing interests: None.

  • Funding: National Institute of Environmental Health Sciences grant R01-ES012967-01A1 and Health Resources and Services Administration grant T03MC07651.

  • Disclaimer: The findings and conclusions in this report are those of the author(s) and do not necessarily represent the official position of the Centers for Disease Control and Prevention.