
Using natural experimental studies to guide public health action: turning the evidence-based medicine paradigm on its head
David Ogilvie (1), Jean Adams (1), Adrian Bauman (2), Edward W. Gregg (3), Jenna Panter (1), Karen R. Siegel (4), Nicholas J. Wareham (1), Martin White (1)

  1. MRC Epidemiology Unit and Centre for Diet and Activity Research (CEDAR), University of Cambridge, Cambridge, UK
  2. Charles Perkins Centre and Prevention Research Collaboration, University of Sydney, Sydney, New South Wales, Australia
  3. School of Public Health, Imperial College, London, UK
  4. National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, Georgia, USA

Correspondence to Dr David Ogilvie, MRC Epidemiology Unit and Centre for Diet and Activity Research (CEDAR), University of Cambridge, Cambridge, UK; david.ogilvie{at}


Despite smaller effect sizes, interventions delivered at population level to prevent non-communicable diseases generally have greater reach, impact and equity than those delivered to high-risk groups. Nevertheless, how to shift population behaviour patterns in this way remains one of the greatest uncertainties for research and policy. Evidence about behaviour change interventions that are easier to evaluate tends to overshadow that for population-wide and system-wide approaches that generate and sustain healthier behaviours. Population health interventions are often implemented as natural experiments, which makes their evaluation more complex and unpredictable than a typical randomised controlled trial (RCT). We discuss the growing importance of evaluating natural experiments and their distinctive contribution to the evidence for public health policy. We contrast the established evidence-based practice pathway, in which RCTs generate ‘definitive’ evidence for particular interventions, with a practice-based evidence pathway in which evaluation can help adjust the compass bearing of existing policy. We propose that intervention studies should focus on reducing critical uncertainties, that non-randomised study designs should be embraced rather than tolerated and that a more nuanced approach to appraising the utility of diverse types of evidence is required. The complex evidence needed to guide public health action is not necessarily the same as that which is needed to provide an unbiased effect size estimate. The practice-based evidence pathway is neither inferior nor merely the best available when all else fails. It is often the only way to generate meaningful evidence to address critical questions about investing in population health interventions.

  • evaluation
  • natural experimental studies
  • non-randomised studies
  • practice-based evidence
  • public health policy

This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See:



Governments around the world are committed to tackling the growing burden of non-communicable diseases.1 Unhealthy patterns of behaviours such as smoking, diet, alcohol consumption and physical activity contribute substantially to disease risk and life expectancy, particularly in middle-income and high-income countries.2 The populations of lower-income countries are increasingly at risk of similar behaviour patterns as their economies develop. For example, insufficient physical activity is already estimated to account for nearly 10% of global premature mortality.3 The energy imbalance and metabolic health of people in lower-income countries are liable to worsen with the increasing availability of high-energy ultraprocessed foods and mechanisation of labour and transport that characterise an aspirational ‘Western’ lifestyle.

The aetiological associations between behaviours and many chronic disease outcomes are sufficiently well established to justify efforts to ameliorate behavioural risk factors. However, how population behaviour patterns might most effectively be shifted remains one of the greatest uncertainties for public health research and policy. To date, effort has largely been directed at developing and evaluating interventions to change the behaviours of individuals at higher risk, sometimes successfully. However, it is doubtful that merely scaling up these approaches to reach more and more people would be affordable, effective or equitable as a global disease prevention strategy.2 4–6 A more authentic and sustainable population-based strategy would complement the current focus on effective primary and secondary prevention—targeting individuals at higher risk—with more primordial prevention (table 1) that addresses the environments and policies that shape the circumstances in which we live. This vision has deep roots in the histories of public health and medicine.7 However, we remain largely ignorant about how to achieve it. This ignorance reflects the challenges in evaluating primordial prevention strategies of this kind, interpreting the findings and translating them into action. These challenges are reflected in the conclusions of recent systematic reviews, as exemplified in table 2.

Table 1

Glossary of selected terms

Supplementary data

Table 2

Key findings of recent examples of systematic reviews

In this paper, we discuss the growing importance of evaluating natural experiments in primordial prevention and their distinctive contribution to generating evidence for public health policy. We identify some of the obstacles to this type of research and suggest greater effort and investment in this area to ensure that research more effectively supports public health action.

What do we need to know?

We should reflect on the extent to which our research is aligned with the societal processes it is intended to inform.8 To that end, we must distinguish between ‘behaviour change’ as a population goal on the one hand, that is, the outcome we ultimately wish to achieve; and on the other, ‘behaviour change’ as an intervention strategy or moral imperative, that is, a way of framing both the problem (people are making poor behavioural choices) and the solution (they need to make different choices). An approach too narrowly focused on people and their behaviour or lifestyle as the problem, and thereby on interventions that often seek ‘to persuade the poor to change their behaviour’,9 is not compatible with a social ecological understanding of the causes of ill health that are not amenable to individual control.2 10

The evidence available to guide policy has long been subject to an ‘evaluative bias’ in favour of behavioural interventions targeting people at higher risk because such interventions are generally easier to evaluate and, in particular, easier to randomise.11 This type of evidence tends to overshadow that for strategies that act on whole populations by targeting critical leverage points in the systems that generate and sustain less healthy behaviour patterns.2 12 This implies a need to direct greater policy and research attention to where the underlying problems are located: not merely among individuals at higher risk, nor even among groups of more deprived individuals, but in the more fundamental causes of ‘dis-ease’ (sic)13 in communities—for example, in the unhealthy environments created as a consequence of the structural conditions of the planning, transport and welfare systems and the housing and labour markets.10

Why do we not know?

The lack of evidence for effective primordial prevention strategies may be traced to one of three types of obstacle.

The first is a set of political obstacles encountered by researchers who are willing but unable to produce the evidence. Researchers seeking to evaluate environmental or policy interventions—such as improving access to green space or taxing particular foods—depend on governments or other agencies to implement evaluable strategies.6 Because these interventions often entail greater political cost or risk than those focused on individual choice, they tend to be introduced less often. When such policies do find favour, demonstration projects and similar initiatives are often introduced quickly, without time to establish rigorous evaluation studies.14 Even if an intervention is both promising and evaluable in principle,15 an agile evaluative response may depend on more rapid and flexible sources of funding than have traditionally been available.

The second is a set of cultural obstacles in research, manifested by a research community that is able in principle to produce the evidence, but rendered somewhat unwilling by circumstances. Nearly two decades ago, it was pointed out that only a small fraction of UK public health research expenditure was directed towards ‘solutions’.16 Today, observational epidemiology and the development and evaluation of targeted behaviour change interventions remain easier and more secure routes to ‘doing something’, achieving funding, producing publications and career progression.12 17 A research community that, quite understandably, ‘follows the money’ in this way may therefore be distorting the agenda in research (and, consequently, in policy).

The third is a set of practical obstacles. Primordial preventive strategies are generally implemented as ‘natural’ or ‘quasi-’ experiments rather than ‘true’ experiments (table 1).18 Evaluating these strategies thereby makes for a more complex and unpredictable undertaking than, for example, a typical clinical trial—which is not without enormous potential challenges of its own. This calls for a particularly flexible and nuanced approach to natural experimental study design and analysis, along with sufficient capability and capacity to deliver this.6 No wonder, then, that it seems more common to see papers calling for this type of research than to see papers reporting it.

Two complementary modes of evidence generation

Much has been written about the bench-to-bedside translational medicine pathway linking basic science with clinical practice. That concept has also strongly influenced thinking about evidence to support public health action, for example, in discourse that refers to institutionalising ‘proven’, ‘evidence-based’ interventions.19 It envisages a largely unidirectional pipeline in which researchers, informed by observational studies, develop interventions, subject them to feasibility and pilot testing, and then evaluate them in definitive randomised controlled trials (RCTs). These are often conducted in settings and groups of people unlike those in which a public health intervention might ultimately be applied. When a systematic review of multiple trials concludes that an intervention is effective, that intervention is regarded as ‘proven’ and may be recommended by a body such as the National Institute for Health and Care Excellence (NICE) for more widespread implementation, subject to broader contextual considerations such as affordability and political acceptability. This is shown in the upper pathway of figure 1. In this pathway, the purpose of evaluation can be seen as indicating whether a traffic light holding back an ‘unproven’ intervention should be turned from red to green.

Figure 1

Two complementary modes of evidence generation.

However, this implied linear, rational way in which new knowledge is converted into ‘evidence-based practice’ has limited empirical support or practical utility for ‘upstream’ public health interventions.19 Even in comparatively well-resourced healthcare systems, major preventive initiatives such as cervical screening and routine health checks for over-40s have been introduced without evidence of effectiveness from RCTs to support them.4 20 This applies even more outside healthcare, where actions influencing complex systems of wider determinants of health such as food supply, income and urban planning are being taken all the time. These actions occur for a variety of reasons that may or may not be ostensibly concerned with health, and with or without evidence to support them or meaningful evaluation to learn from them.2 6 For example, when a new neighbourhood is built to accommodate unmet need for housing, planners make decisions about the mix of land uses, amenities provided (such as schools and parks) and street network layout. Each of these decisions constitutes an intervention that may influence physical activity and the risk of chronic disease among the people who live there.21 It is, however, unlikely to be realistic for governments to take no action until ‘sufficient’ evidence of all such effects has been cumulated and synthesised from multiple intervention studies.22 23

Public health research is concerned with ‘generating discoveries and new knowledge within the public health field itself’.8 This implies an opportunity and need to complement the evidence-based practice pathway, described above, with innovative solutions—generated in and for the real world by policymakers and practitioners—that can also be rigorously evaluated to produce ‘practice-based evidence’ (as shown in the lower pathway of figure 1).5 In this latter pathway, the purpose of evaluation can be seen as more akin to adjusting the compass bearing followed by existing policy rather than enabling a binary decision to proceed or not with the widespread implementation of a particular intervention. Whereas the former pathway depends on multiple instances of evidence of effectiveness to justify action, the latter depends more on multiple instances of action from which to develop at least preliminary evidence of effectiveness. This may in turn support taking further evaluable action and the consequent, cumulative reduction of uncertainty about its effects.

The application of each step of the practice-based evidence pathway, and how it differs from the converse pathway, can be illustrated with a worked example based on the published protocol for the ongoing evaluation of the UK soft drinks industry levy,24 a fiscal measure intended to reduce consumption (table 3). This particular intervention had not been implemented before, so the case for action was not based on established evidence of effectiveness as such. Rather, it rested on a plausible25 case for effectiveness based on a combination of observational epidemiology, simulation modelling and a limited set of evaluations of related interventions in other parts of the world. Without a political decision to ‘do something’ based on such ‘non-randomised’ evidence, it would be impossible ever to generate stronger evidence about effectiveness. Thoughtful assessment of the evaluability of the intervention revealed the complexity of its theorised mechanisms and potential outcomes, along with the unfeasibility of imposing an RCT design on this particular fiscal policy measure in this particular context. This dictated a natural experimental evaluation using a combination of the most appropriate methods (eg, interrupted time series analysis) to systematically rule out alternative explanations for observed effects and demonstrate credible causal pathways leading to those effects.25 In contrast to the reliance on successful randomisation for causal attribution in an RCT, this study relies on integrating the findings of multiple quantitative and qualitative components for deriving robust inferences. These will contribute to subsequent evidence synthesis, as much in terms of validating (potentially generalisable) overall intervention theory as in terms of producing (context-specific) effect size estimates for meta-analysis.6 26 Such findings can be used to adjust existing policies in this area, or to inform future actions around the world, to optimise their health outcomes.
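To make the idea of an interrupted time series analysis more concrete, the following is a minimal sketch of its simplest form, a segmented regression that fits one straight line to the period before an intervention and another to the period after, then compares them at the intervention date. It is not the method specified in the levy evaluation protocol. The function names and the monthly purchase figures are hypothetical and invented for illustration, and the sketch deliberately ignores seasonality, autocorrelation and control series, all of which a real evaluation would need to address.

```python
from typing import Dict, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def segmented_fit(series: List[float], start: int) -> Dict[str, float]:
    """Estimate level and slope changes at an intervention.

    `start` is the index of the first post-intervention observation
    (hypothetical parameter name). Each segment needs at least two
    observations so that a line can be fitted.
    """
    pre_t = [float(t) for t in range(start)]
    post_t = [float(t) for t in range(start, len(series))]
    pre_intercept, pre_slope = fit_line(pre_t, series[:start])
    post_intercept, post_slope = fit_line(post_t, series[start:])

    # Counterfactual: the pre-intervention trend projected to the start date.
    expected_at_start = pre_intercept + pre_slope * start
    # Fitted post-intervention value at the same time point.
    fitted_at_start = post_intercept + post_slope * start

    return {
        "level_change": fitted_at_start - expected_at_start,
        "slope_change": post_slope - pre_slope,
    }


# Made-up, noise-free monthly data: a gentle downward trend for 12 months,
# then an immediate drop and a steeper decline for a further 12 months.
pre = [100.0 - 0.5 * t for t in range(12)]
post = [102.0 - 1.0 * t for t in range(12, 24)]
result = segmented_fit(pre + post, start=12)
print(result)
```

Fitting the two segments separately is equivalent to a single regression with terms for time, a post-intervention indicator and their interaction. A causal reading of the level and slope changes still depends on ruling out other events that coincided with the intervention, which is why the pathway described above combines several quantitative and qualitative components.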

Table 3

Natural experimental evaluation of the UK treasury soft drinks industry levy

Towards a good enough evidence base for public health action

How, then, might effort and investment in developing the published evidence base more effectively support the kinds of policy intervention required for primordial prevention? Our analysis suggests three main implications; examples of potential actions arising from these are given in table 4.

Table 4

Examples of potential actions to help develop the evidence base

Intervention studies should focus on reducing critical uncertainties

Studies of interventions to change upstream determinants of disease risk, such as population dietary or commuting patterns, are sometimes criticised because they have not followed participants to ultimate, ‘hard’ physiological or clinical endpoints. However, there is no reason to expect that all parts of a putative causal chain should be directly proved within a single study. It has been argued that to inform action, public health intervention research should be guided as much by a decision-theoretical approach (table 1) as by the narrower, but more familiar, statistical hypothesis-testing approach.27 This implies that evaluation should focus on reducing the most critical uncertainties28 about what should be done—that is, the ways in which various intervention strategies influence population behaviour patterns—just as we judge smoking cessation services not on their direct impact on heart disease or lung cancer, but on whether they help people quit smoking.29 A case for (or against) action can be gradually cumulated using the iterative exchange of data and theory between empirical observational and intervention studies in a variety of contexts, simulation modelling of more distal or long-term health impacts and other sources of evidence.30

Non-randomised study designs should be embraced rather than tolerated

Although natural experimental study designs have important theoretical underpinnings in common with the RCT, their worth does not reside solely in the extent to which they emulate an RCT design. Dunning proposes three criteria for assessing the utility of natural experimental studies.31 The first is that the allocation of an intervention can be treated ‘as if’ it were random, although not within a planned RCT. Although randomisation eliminates important sources of potential confounding, an expectation that intervention studies should entail a comparable allocation process (such as a lottery) would further entrench existing evaluative biases because many interventions relevant to public health are never likely to fulfil this criterion.22 This may be because randomisation is impractical (eg, new transport infrastructure is built in particular places for particular reasons) or politically unpalatable (if, eg, it is seen as withholding a service from certain areas or groups).12 Furthermore, intervention studies that ‘fail’ this criterion may pass with flying colours on Dunning’s other two criteria for utility. One of these relates to the relevance of the intervention to current, real-world policy questions. A key advantage of natural experimental studies is that they ‘do not interfere in the natural data generation process’,32 and thereby largely avoid the problems of ‘artificial and less directly informative’ inferences from effects observed in experimental studies in more controlled settings.33 The other criterion relates to the plausibility of the causal inference.25 Here again, a natural experimental study may be ‘more likely to generate causal evidence that applies to intervention implementation in real life’,34 particularly if it elicits evidence of how an intervention achieves its effects.27

Of course, this may appear to sit uneasily within a research funding system based on a biomedical paradigm that privileges the RCT above all other methods for establishing effectiveness.35 But randomisation does not necessarily hold the key to unlocking questions about public health action.25 Nor does the proliferation of epidemiological studies that link environmental exposures with health behaviours in a statistically robust way but are incapable of testing whether altering the former influences the latter.21 36 If a given method or study design is chosen for its alignment with the applied research question and executed in a rigorous and transparent way, it is likely to contribute important evidence even though (and perhaps because) it falls into the implicitly disparaging category of ‘non-randomised’ studies.35

A more thoughtful approach to appraising the utility of evidence

This is not to deny that many non-randomised studies do have major limitations and are reported in ways that lack rigour or transparency. For example, systematic reviews of studies linking changes in the built environment with changes in diet, physical activity and adiposity have noted multiple potential sources of bias and that ‘studies with weaker designs were more likely to report associations in the positive direction’.23 37 In addition to all the issues that complicate the practice and interpretation of trials, in a natural experimental study close attention needs to be paid to understanding exactly what exposure to an intervention consists of; how an intervention comes to be assigned to some people, groups or areas and not others; finding a valid basis for estimating the counterfactual, such as by using a meaningful control group or a graded measure of intervention exposure; selecting and interpreting the adjustment for appropriate covariates to minimise the risk of confounding; and interpreting complex patterns within the outcomes, which may include divergent and potentially inequitable responses between subgroups, dose-response relationships and comparisons with multiple controls.18 38 39
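One common way of using a control group to estimate the counterfactual is a difference-in-differences comparison. The paragraph above does not name this technique, so the sketch below is offered only as an illustration, with a hypothetical function name and invented figures for weekly minutes of physical activity in an exposed area and a control area. It rests on the untestable assumption that, without the intervention, both areas would have followed parallel trends.

```python
from statistics import mean
from typing import Sequence


def difference_in_differences(
    exposed_before: Sequence[float],
    exposed_after: Sequence[float],
    control_before: Sequence[float],
    control_after: Sequence[float],
) -> float:
    """Change in the exposed group minus change in the control group."""
    change_exposed = mean(exposed_after) - mean(exposed_before)
    # The control group's change stands in for what would have happened
    # to the exposed group without the intervention (parallel trends).
    change_control = mean(control_after) - mean(control_before)
    return change_exposed - change_control


# Invented data: both areas improve, but the exposed area improves more.
estimate = difference_in_differences(
    exposed_before=[30, 32, 34],
    exposed_after=[38, 40, 42],
    control_before=[31, 33, 35],
    control_after=[34, 36, 38],
)
print(estimate)
```

A simple before-and-after comparison in the exposed area alone would attribute the whole of its improvement to the intervention, whereas this estimate subtracts the improvement also seen in the control area. It does not address the other issues listed above, such as how exposure is defined, adjustment for covariates or divergent responses between subgroups.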

We have well-established, and continually developing, catechisms for assessing the internal validity of intervention studies, and groups of studies, in health research.40 However, we lack clear consensus on the relative importance or interpretation of different aspects of internal validity in natural experimental studies, and therefore on how to make constructive use of an evidence base that fits poorly into existing appraisal systems.6 For example, current tools for assessing risk of bias appear predicated on a preference for studies that resemble an RCT as closely as possible.22 23 They tend to downplay or ignore the importance of ‘greater qualitative appraisal (and) theoretical and statistical knowledge’,32 and of what different quantitative and qualitative components of single or multiple studies might contribute in combination to a growing body of overall, more generalisable causal inference.31 33 In particular, we lack consensus on ‘how good is good enough’—which partly depends, of course, on the answer to the question ‘good enough for what?’ The complex evidence needed to guide public health action is not necessarily the same as that which is needed to provide an unbiased estimate of an effect size.


We are more likely to halt the rise in the global prevalence of non-communicable diseases by taking and evaluating new, more ambitious or radical actions to address the underlying causes than by merely applying existing preventive approaches—even if these are effective—with greater intensity. Even apparently simple questions about effectiveness in this arena cannot be answered without action—although based on the best available evidence at the time—that necessarily precedes evaluation. The practice-based evidence pathway can be regarded as an essential, and currently under-resourced and undervalued, complement to the more established evidence-based practice pathway. It is neither inferior nor merely the best available when all else fails. On the contrary, it is often the only way to generate meaningful evidence to address critical questions about investing in population health interventions.

The two pathways for generating evidence described in this paper do not represent mutually exclusive approaches. Some policy and practice innovations could and should be evaluated in RCTs, and many more would benefit from more planned evaluation using a wider range of study designs. Nevertheless, the public health research community and those who fund and publish their work have key roles to play in supporting the development and credibility of researchers in this field, and the more thoughtful conduct, appraisal and synthesis of natural experimental studies, to populate critical missing pieces of the evidence base to support more effective public health action.

What is already known on this subject

  • There are well-established associations between behaviour and chronic disease, which justify government efforts to reduce behavioural risk factors. However, the question of how population behaviour patterns might most effectively be shifted remains one of the greatest uncertainties for research and policy. This reflects the substantial challenges of evaluating population preventive strategies, interpreting the findings and translating them into action. Greater effort and investment in this area may help ensure that research more effectively supports public health action.

What this study adds

  • We discuss the growing importance of evaluating natural experiments and their distinctive contribution to the evidence for public health policy. We contrast the established evidence-based practice pathway, in which randomised controlled trials generate ‘definitive’ evidence for particular interventions, with a practice-based evidence pathway in which evaluation can help adjust the compass bearing of existing policy. We propose that intervention studies should focus on reducing critical uncertainties, that non-randomised study designs should be embraced rather than tolerated and that a more nuanced approach to appraising the utility of diverse types of evidence is required.



  • Contributors DO conceived of the original idea and drafted the initial manuscript. JA, AB, EWG, JP, KRS, NJW and MW provided critical feedback and contributed to the final version of the manuscript. DO is the guarantor.

  • Funding DO, JP and NJW are supported by the Medical Research Council (Unit Programme numbers MC_UU_12015/6 and MC_UU_12015/1). The paper was initially developed in the course of a visiting appointment as Thought Leader in Residence at the School of Public Health at the University of Sydney, for which the intellectual environment and financial support provided by the Prevention Research Collaboration is gratefully acknowledged. It was further developed under the auspices of the Centre for Diet and Activity Research (CEDAR), a UKCRC Public Health Research Centre of Excellence at the University of Cambridge, for which funding from the British Heart Foundation, Economic and Social Research Council, Medical Research Council, National Institute for Health Research and the Wellcome Trust, under the auspices of the United Kingdom Clinical Research Collaboration, is gratefully acknowledged; and through the authors’ collaboration in organising a workshop on the evaluation of natural experiments of social and environmental interventions with potential impacts on population risk of diabetes and cardiometabolic disease in Atlanta on 7–8 March 2017, at which much of the content was presented and for which funding from the Centers for Disease Control and Prevention is gratefully acknowledged.

  • Disclaimer The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention or other funders mentioned.

  • Competing interests None declared.

  • Patient consent for publication Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data availability statement There are no data in this work.
