Abstract
Background There has been a recent increase in interest in alternatives to randomisation in the evaluation of public health interventions. We aim to identify specific scenarios in which randomised trials may not be possible and to describe, exemplify and assess alternative strategies.
Methods Non-systematic exploratory review.
Results In many scenarios barriers are surmountable so that randomised trials (including stepped-wedge and crossover trials) are possible. It is possible to rank alternative designs but context will also determine which choices are preferable. Evidence from non-randomised designs is more convincing when confounders are well-understood, measured and controlled; there is evidence for causal pathways linking intervention and outcomes and/or against other pathways explaining outcomes; and effect sizes are large.
Conclusion Non-randomised designs might provide adequate evidence to inform decisions when interventions are demonstrably feasible and acceptable, and where evidence suggests there is little potential for harm. However, such designs may not provide adequate evidence when intervention feasibility or acceptability is doubtful, and where existing evidence suggests benefits may be marginal and/or harms possible.
- Evaluation methods
- public health policy
- randomised trials
Evaluating the effects of public health interventions: barriers to randomised trials and alternative options
Randomised controlled trials (RCTs) are widely regarded as the ‘gold standard’ for estimating the causal effects of public health interventions on pre-defined outcomes in a defined population.1 2 However, there has been increasing interest in non-randomised evaluations of public health interventions where RCTs are considered unfeasible3–6 and guidelines for reporting these.7 8 It has been argued that for certain decisions it is reasonable to infer intervention effects from non-randomised studies and that dismissing designs other than the RCT might lead to a marginalisation of intervention types not amenable to RCTs.3–5 We organised a multi-disciplinary symposium in London in 2006 to review these issues and identify practical solutions. Our two papers summarise the arguments presented, drawing on examples from high- and low-income countries.
It is important to determine which categories of intervention are not amenable to RCTs so that decisions to use evidence from non-randomised designs are based not on a wholesale or haphazard ‘lowering of the bar’ regarding standards of evidence, but on a considered assessment of which interventions require this. It is also important to assess in detail the various alternatives to RCTs in order to determine their relative merits in different scenarios. Existing reviews have not aimed to describe comprehensively the diversity of alternative designs available, but instead have discussed overall levels of evidence3 4 or focused on particular alternative designs.6 We conclude by considering for what sorts of decisions non-randomised evaluations might provide adequate evidence and, conversely, when decisions require RCTs. This paper focuses on design while a companion paper considers analysis.
Key features of RCTs of public health interventions
RCTs are experiments that compare outcomes measured in prospective follow-up between those randomly allocated to receive an intervention and the control group randomly allocated to receive the comparison condition (generally currently accepted standard care). RCTs of public health interventions often allocate clusters of individuals (eg, villages or schools) to intervention or control. The strength of RCTs comes in their capacity to ensure a ‘fair comparison’ between intervention and control groups, which are ‘balanced’ and could be expected to experience comparable outcomes in the absence of intervention.
This paper will consider barriers and alternatives to three key features of RCTs: random allocation, control groups and prospective follow-up, only the first of which is unique to RCTs. RCTs also share with other evaluative designs additional features not discussed here. First, many RCTs hide treatment allocation from participants, providers and/or researchers to ensure measurement errors are non-differential by allocation. Although ‘blinding’ of providers and participants is sometimes impossible with public health interventions,9 this is not discussed further because in such scenarios all that can be done is to ensure as far as possible that standardised measurement procedures are used in all groups, and that blinding is maintained—for example, among those managing and analysing data. Second, RCTs aim to include sufficient individuals or clusters of individuals. This should provide the statistical power to maximise the chances of detecting true effects and the statistical significance to minimise the chances of apparent effects arising by chance. Adequate sample size is not discussed further because there are no obvious alternatives.
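To illustrate how clustering bears on sample size, the minimal sketch below inflates an individually randomised sample size by the usual design effect, DEFF = 1 + (m - 1) * ICC. The intra-cluster correlation, cluster size and base sample size are illustrative assumptions, not figures from any study discussed in this paper.

```python
# Sketch: inflating an individually randomised sample size for cluster
# randomisation via the design effect, DEFF = 1 + (m - 1) * ICC.
# The ICC, cluster size and base sample size are illustrative assumptions.
import math

def cluster_sample_size(n_individual: int, cluster_size: int, icc: float) -> int:
    """Total sample size required once clustering is accounted for."""
    deff = 1 + (cluster_size - 1) * icc          # design effect
    return math.ceil(n_individual * deff)

n_needed = cluster_sample_size(n_individual=800, cluster_size=50, icc=0.02)
print(n_needed)                                  # 800 * 1.98 = 1584 individuals
print(math.ceil(n_needed / 50), "clusters in total (both arms combined)")
```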
Barriers to key features of RCTs
Policy makers or providers may believe in the value of an intervention for certain individuals or groups, often regardless of its actual evidence base, and therefore oppose random allocation because it prevents them from deciding who receives the intervention. Similarly, clients may have preferences and oppose randomisation. This may arise particularly with interventions associated with ideological beliefs, such as some forms of psychotherapy and community development,10 11 and might explain the non-randomised evaluation of the ‘Healthy School and Drugs’ project12 (Box 1). Where there is demonstrable uncertainty about intervention benefits, that is, ‘equipoise’ regarding intervention effects, it may be possible to persuade providers that evaluation is required and that, while they should decide who the suitable candidates are, who within this pool receives the intervention will be determined randomly.13 Client choice has been accommodated in ‘preference trials’, which enable clients to opt for random allocation or to choose their preferred intervention, although it is debated which groups should then be compared in analysis.14
Box 1 Examples of non-randomised evaluations
Healthy School and Drugs project12
Intervention—(i) classroom drugs education, (ii) committee involving parents and teachers to coordinate drugs policy, (iii) new rules on substance use and (iv) support for pupils using drugs.
Barriers to RCT design—unclear, but perhaps potential participants were unwilling to undergo random allocation.
Evaluation design—non-randomised prospective concurrent control study with nine intervention and three control schools. Analysis adjusted for baseline differences in pupils' demographic factors (not reported which) and baseline substance-use knowledge and behaviour, but no account taken of clustering in analysis.
Outcomes—self-reported substance-use knowledge and behaviour.
Results—some apparent effects, particularly on alcohol use. There may have been potential for unmeasured confounding; for example, from inter-school differences in academic achievement, attitude to school, institutional management etc.
Integrated Management of Childhood Illness (IMCI), Tanzania16
Intervention—guidelines, training and improved systems for IMCI 1997–2002.
Barriers to RCT design—decisions to implement IMCI made before study; only some clusters had surveillance systems.
Evaluation design—non-randomised comparison of data from routine mortality surveillance, household surveys and health facility surveys in two intervention and two control districts (total population ∼1.2 million) with similar baseline child mortality rates; with process measures and checks for other potential influences (such as bed-net provision).
Outcome—health and survival of children aged less than 5 y.
Results—mortality rates lower in intervention clusters, but too few clusters to exclude chance as an explanation.
Childhood immunisation with pneumococcal conjugate vaccine, USA19
Intervention—national introduction of routine childhood immunisation with pneumococcal conjugate vaccine.
Barriers to RCT design—nationwide introduction of an intervention for which vaccine efficacy for the prevention of invasive pneumococcal disease had previously been established.
Evaluation design—time-series analysis comparing post-introduction trends with expected trends (based on admissions prior to vaccine introduction). Trends in dehydration admissions also examined to explore the alternative hypothesis that any apparent effects were merely the result of changes in the sampling of data on hospital admissions or of changes in healthcare coverage.
Outcome—monthly admissions for all-cause and pneumococcal pneumonia in the general population (using routine data from the Nationwide Inpatient Sample).
Results—relative decline in hospital admissions for relevant outcomes compared to predicted trends. No change in admissions for dehydration compared to predicted trends.
Mass-media family-planning intervention in Nepal20
Intervention—radio soap opera broadcast on the national radio, designed to promote the concept of a ‘well planned family’ and increase demand for family planning services.
Barriers to RCT design—national introduction of intervention.
Evaluation design—nationally representative, cross-sectional survey of ever-married women.
Exposure—woman recalls listening to the soap opera in the 6 months prior to interview.
Outcome—woman currently using a modern contraceptive.
Results—differed according to form of analysis (discussed further in companion paper).
Random allocation may also be opposed because providers or policy makers are so keen to demonstrate the effectiveness of an intervention that they want it implemented in the most promising contexts, whereas control units are not selected in this way, introducing bias. Again, where equipoise exists, evaluators might persuade intervention advocates that trial evidence should aid future advocacy.15
Randomisation might also be blocked where policy makers decide that pilots must target the neediest areas or individuals.2 In such scenarios, where providers assign interventions on the basis of a need score, intervention effect estimates might be derived from a ‘regression-discontinuity’ analysis that explores the shape of the association between the measure of need and the outcome of interest after introduction of the intervention. However, this approach makes several assumptions—notably that the shape of the association between the measure of need and the outcome is known.6 Finally, random allocation will also be impossible when decisions have already been made about where/to whom an intervention will be delivered—for example in the evaluation of the Integrated Management of Childhood Illness programme in Tanzania (see Box 1).16
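As a rough illustration of regression-discontinuity analysis, the sketch below simulates a sharp cutoff on a need score and estimates the intervention effect as the jump in outcome at that cutoff. All values and the linear model are illustrative assumptions, not drawn from any evaluation discussed here; as noted above, the validity of the estimate rests on correctly modelling the need-outcome association.

```python
# Sketch: a sharp regression-discontinuity analysis on simulated data.
# Units at or above a need-score cutoff receive the intervention; the jump
# in outcome at the cutoff estimates the intervention effect, assuming the
# need-outcome relationship is correctly specified (here, linear per side).
import numpy as np

rng = np.random.default_rng(0)
n, cutoff, true_effect = 500, 0.0, 2.0

need = rng.normal(size=n)                     # need score used for assignment
treated = (need >= cutoff).astype(float)      # sharp assignment rule
outcome = 1.0 + 0.8 * need + true_effect * treated + rng.normal(scale=1.0, size=n)

centred = need - cutoff
X = np.column_stack([np.ones(n), centred, treated, centred * treated])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"estimated effect at the cutoff: {coef[2]:.2f}")   # should be near 2.0
```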
In some cases, it may be impossible to have a control group, randomised or otherwise. This might arise where policy makers or practitioners believe an intervention is beneficial and no one in need should be denied it. Again, where equipoise genuinely exists, it should be ethical to undertake an RCT and persuade opponents of the value of this. However, grey areas exist: an intervention might be shown to be effective in one setting but uncertainty remains as to whether effects will translate to a new setting. This will depend on the complexity of the intervention and of the causal pathway from intervention to outcomes, dissimilarities in infrastructure and client characteristics3 17 and, for infectious diseases, population differences in transmission dynamics.
Advocates of an intervention may find stepped-wedge or crossover trials more acceptable. Stepped-wedge RCTs stagger the introduction of an intervention, randomising the order of receipt.18 In crossover RCTs, all participants receive the intervention for a period and the control condition for a period; randomisation determines the order. The latter is only useful in evaluating acute effects and in scenarios where it is acceptable to withdraw interventions after a period of delivery.
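A minimal sketch of how the order of receipt might be randomised in a stepped-wedge design is given below; the cluster names and the number of steps are hypothetical, chosen only for illustration.

```python
# Sketch: randomising the order in which clusters cross from control to
# intervention in a stepped-wedge design. Names and step count are illustrative.
import random

clusters = ["district_A", "district_B", "district_C",
            "district_D", "district_E", "district_F"]
steps = 3                                 # clusters cross over in three waves
random.seed(42)
random.shuffle(clusters)                  # random order of receipt

per_step = len(clusters) // steps
schedule = {step + 1: clusters[step * per_step:(step + 1) * per_step]
            for step in range(steps)}
for step, crossing in schedule.items():
    print(f"step {step}: {crossing} switch from control to intervention")
```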
Another scenario is where intervention effects on primary outcomes are known but effects on other secondary but important outcomes are not. This rendered unethical any control-group study of routine childhood pneumococcal conjugate immunisation on pneumonia in the general population (Box 1) because effects on pneumonia incidence, but not on the population burden of disease, were known.19 Where a social intervention's effects, for example on income13 or legal rights,11 are known but health effects are not, true equipoise may not exist, rendering control groups unethical.13 Judgements here depend on the importance of the known benefits.11
Control groups will also generally be impossible where an intervention is already delivered as standard across an entire area,9 such as with the Nepalese family planning intervention20 (Box 1), since policy makers will usually be unwilling to withdraw it even where equipoise remains.11 Where an intervention has yet to be delivered, control groups will also be impossible where it is legally, bureaucratically or practically necessary for delivery to be consistent across an entire state or nation—for example, with laws and regulations, welfare benefits or mass media.13 The possibility of ‘contamination’ is also sometimes cited as a reason not to rely on control groups21 although the effects of this can be reduced by employing a cluster design.22
Longitudinal follow-up of participants from pre-intervention baseline measures to post-intervention outcomes may be impossible when an evaluation begins only after an intervention has been delivered to a population or where policy makers or evaluation funders are reluctant to have lengthy periods of observation pre-implementation. Longitudinal follow-up may also be difficult when there are long gaps between intervention and manifestation of key outcomes, as is often the case with prevention23 or where outcomes are rare.21 Politicians may be uninterested in studies lasting longer than their probable period in office.
A final barrier to RCTs of public health interventions is lack of funding, particularly where trials aim to detect relatively small effects and require large samples. Sometimes it can be argued that only an RCT can adequately address a critical evidence gap. Our discussion section considers in what scenarios this is so. Funders may sometimes be prepared to fund smaller, cheaper ‘non-inferiority’ RCTs aiming to detect whether intervention benefits are equivalent to/better than current practice, although how best to analyse these is debated.24
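One common, though, as noted above, debated, way to judge non-inferiority is to check whether the lower confidence bound for the difference in a beneficial outcome (new minus standard) lies above a pre-specified margin. The sketch below illustrates only that approach; the event counts and margin are entirely hypothetical and not taken from this paper.

```python
# Sketch: non-inferiority assessed by comparing the lower 95% confidence
# bound of the risk difference (new minus standard, for a beneficial
# outcome) against a pre-specified margin. All numbers are illustrative.
import math

def non_inferior(events_new, n_new, events_std, n_std, margin, z=1.96):
    p_new, p_std = events_new / n_new, events_std / n_std
    diff = p_new - p_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    lower = diff - z * se
    return lower, lower > -margin             # non-inferior if bound above -margin

lower_bound, ok = non_inferior(events_new=168, n_new=400,
                               events_std=170, n_std=400, margin=0.10)
print(f"lower 95% bound on risk difference: {lower_bound:.3f}; non-inferior: {ok}")
```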
Alternatives to random allocation
When random allocation is not possible it may instead be possible to employ a control group with prospective matching and/or post hoc adjustment for potential confounders (table 1). The latter was used in the evaluation of the Dutch ‘Healthy School and Drugs’ intervention where a number of potential confounders such as attitude to smoking were examined.
The disadvantage of these options is that they cannot control for unmeasured or imperfectly measured confounders. Among other threats to internal validity, the ‘Healthy School and Drugs’ project did not adjust for pupils' attitudes to school despite evidence that this might be a confounder.25 Inadequate reporting of how potential confounders are identified has previously been identified as a deficit in many epidemiological studies7 and applies equally to evaluations. Comprehensive matching/adjustment on all potentially important confounders is likely to be difficult when evaluations rely on routine data from intervention and/or control groups and when it is necessary to adjust for cluster-level variables but only a small number of clusters have been enrolled, as was the case with the Integrated Management of Childhood Illness evaluation. Confounding can lead to underestimates of effects26 (for example, when intervention recipients' greater needs are not sufficiently considered) or, probably more commonly,27–29 overestimates (for example, where intervention recipients' lesser needs or greater uptake of the intervention are insufficiently considered). For example, non-randomised studies of the association between vitamin-A deficiency and mother-to-child HIV transmission reported associations, but RCTs of vitamin-A supplementation found no evidence of effect.30 Our second paper considers other recently proposed analytic strategies to minimise confounding.
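For illustration, the sketch below adjusts a simulated intervention-outcome association for two measured confounders using logistic regression in statsmodels. The variable names and data are hypothetical, clustering of pupils within schools is ignored for brevity, and, as emphasised above, no such adjustment can address confounders that were not measured.

```python
# Sketch: post hoc adjustment for measured confounders with logistic
# regression. Data and variable names are simulated/illustrative; the
# adjustment does nothing for unmeasured confounders, and clustering
# within schools is ignored here for brevity.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1000
baseline_use = rng.binomial(1, 0.3, n)           # baseline substance use
attitude = rng.normal(size=n)                    # attitude-to-school score
intervention = rng.binomial(1, 0.5, n)           # non-random in practice
logit = -1.0 + 0.8 * baseline_use - 0.3 * attitude - 0.4 * intervention
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))

df = pd.DataFrame(dict(outcome=outcome, intervention=intervention,
                       baseline_use=baseline_use, attitude=attitude))
model = smf.logit("outcome ~ intervention + baseline_use + attitude", data=df).fit()
print(model.params["intervention"])              # adjusted log-odds ratio
```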
Alternatives to control groups
When it is not possible to recruit a prospective control group, it may still be possible to compare outcomes in a study population with rates in the general population. Where rates change in the intervention population but not the general population, this provides some evidence for intervention effects although confounding and regression to the mean (see below) may introduce bias. This approach has been used in studies evaluating the effects of new roads on mortality and injuries.31
In the absence of external comparison, ‘before–after studies’ may be possible. Such studies are vulnerable to confounding from secular and maturational trends as well as contemporaneous influential events. The extent to which these undermine an evaluation depends on context: secular trends are less problematic where rates of an outcome among a population are stable and where intervention effects are large and specific, such as was the case for the evaluation of the introduction of administrative restrictions on the use of certain antibiotics in Canada.32 Evidence from before-after studies can be persuasive when assessing new behaviours, such as the use of new weaning food to promote child growth. However, improvements in growth or reductions in disease can only be attributed to the programme with confidence if data suggest other influences are unlikely. Problems may also arise from the incomparability of data collected before and after an intervention; for example, where the former relies on routine data while the latter involves evaluation surveys.
Estimating outcome rates in repeat cross-sectional surveys before and after an intervention (interrupted time-series study) may allow consideration of whether secular trends underlie observed changes. However, this may be expensive and also hampered by selection bias introduced by unmeasured changes in the composition of the sample over time. Such studies also cannot account for non-linear trends unless many pre-test measurements are taken, and may be insensitive to gradual changes such as might arise from anti-tobacco campaigns.22
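As a minimal illustration of segmented regression for an interrupted time series, the sketch below fits a level change and a slope change at the point of intervention to simulated monthly rates. All numbers are illustrative; a real analysis would also need to handle seasonality, autocorrelation and the changes in sample composition discussed above.

```python
# Sketch: segmented regression for an interrupted time series, estimating a
# level and slope change at the point of intervention from simulated data.
import numpy as np

rng = np.random.default_rng(2)
months = np.arange(48)                          # 24 months pre, 24 months post
post = (months >= 24).astype(float)             # indicator for post-intervention
time_since = np.where(post == 1, months - 24, 0)

# simulated monthly rate: gentle secular trend, level drop and slope change
rate = 50 - 0.1 * months - 6.0 * post - 0.2 * time_since + rng.normal(0, 1.5, 48)

X = np.column_stack([np.ones(48), months, post, time_since])
coef, *_ = np.linalg.lstsq(X, rate, rcond=None)
print(f"level change at intervention:  {coef[2]:.1f}")    # near -6
print(f"slope change after intervention: {coef[3]:.2f}")  # near -0.2
```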
None of the above strategies will address potential confounding from contemporaneous influential events (eg, a TV show with an HIV storyline). To strengthen the evidence, the study of pneumococcal conjugate vaccine analysed observed admission rates for a control condition (dehydration), which were very similar to the rates expected based on pre-intervention trends. This provided circumstantial evidence that the decline in pneumonia was not due to confounding from changes in healthcare coverage.19 Alternatively, changes over time in the same outcome could be examined in a location or age group not subject to the intervention but exposed to the other factors that might drive change. The pneumococcal conjugate vaccine study also considered changes in age-groups other than the infants who were the target of the intervention, showing reduced all-cause pneumonia admissions among adults aged 18–39 y (tentatively interpreted as evidence of a vaccine herd-effect among this group since they would include the parents of children directly exposed), but no evidence of declines among older adults. As with control of confounding, such strategies require knowledge of what other factors might influence outcomes. Process evaluations may also be useful in examining the plausibility of such influences.
A further problem, ‘regression to the mean’, can occur when participants are selected at baseline for their increased risk (eg, an HIV counselling intervention targets individuals concerned about their own risk), which then returns to a less-extreme level regardless of intervention.33 Whether regression to the mean underlies apparent outcomes can be assessed by using a baseline measure that is the mean of several pre-intervention measures or, if this is unavailable, assessing whether intervention effects are apparent among participants at different levels of baseline risk.
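The sketch below illustrates regression to the mean with simulated data: individuals enrolled because of a high (and noisy) baseline risk score show lower scores at follow-up even though no intervention was delivered. The numbers are purely illustrative.

```python
# Sketch: regression to the mean. People selected for a high baseline risk
# score show lower scores at follow-up with no intervention at all, because
# part of the extreme baseline value is random measurement noise.
import numpy as np

rng = np.random.default_rng(3)
true_risk = rng.normal(50, 10, 10_000)                 # stable underlying risk
baseline = true_risk + rng.normal(0, 10, 10_000)       # noisy baseline measure
follow_up = true_risk + rng.normal(0, 10, 10_000)      # noisy repeat measure

selected = baseline > 70                               # enrol 'high-risk' people
print(f"baseline mean among selected:  {baseline[selected].mean():.1f}")
print(f"follow-up mean among selected: {follow_up[selected].mean():.1f}")
# The drop is regression to the mean, not an intervention effect; averaging
# several pre-intervention measures as the baseline reduces the artefact.
```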
Alternatives to prospective follow-up
It may be possible to use intermediate outcomes which predict final outcomes, but such proxies may underestimate or, more often, overestimate longer-term effects (because of the greater likelihood of dilution34 rather than maintenance or multiplication35 of effects). Another alternative is to use retrospective measures of exposure—for example, in case-control studies of vaccines.36 Evaluations such as that of the Nepalese radio campaign employ a cross-sectional survey to compare outcomes between those reporting/not reporting exposure to the intervention. Because such studies rely on retrospective information they generally provide weak evidence about the temporal sequence of intervention and apparent outcomes. A further problem is that women reporting exposure to the intervention are likely to be a sub-set of those actually exposed and may differ in their response. Such studies are also vulnerable to confounding by unmeasured differences between those exposed/not exposed, a matter discussed further in our companion paper.
Other strategies
An additional way to assess whether outcomes arise from an intervention or other influences is to use process evaluation to determine whether there are plausible pathways linking intervention and outcome(s). While often used within RCTs,37 process evaluations are even more important as a means of triangulation within non-randomised studies. For example, a non-randomised study of a teenage pregnancy prevention intervention drew on quantitative and qualitative data on young people's participation and negotiation skills to explore the plausibility of pathways to sexual health outcomes, as well as undertaking sensitivity analyses to determine whether effects varied by exposure.38 Examining the plausibility of causal pathways is facilitated by interventions being explicitly theorised.39
Discussion
Based on discussion at a symposium involving individuals with experience of evaluating public health interventions, this paper has aimed to go beyond debates about the levels of evidence provided by RCT versus non-RCT studies in order to describe and exemplify practical solutions to problems encountered when choosing between designs. In summary, where genuine equipoise regarding health and other important outcomes exists, practical barriers are often surmountable so that RCTs (including stepped-wedge and cross-over trials) are possible. Where equipoise does not exist, or where delivery has already occurred or of necessity must be nationally consistent, RCTs and other prospective comparison designs may be unethical and/or unfeasible. In such cases, decisions should draw on other evidence as suggested for example in GRADE guidance.5
As we have seen, non-randomised studies can adopt strategies to reduce the possibility that other factors explain apparent intervention effects. Concurrent control groups are useful in minimising time-related confounding that hampers before-and-after studies and will be more convincing when evaluators take a comprehensive approach to identifying potential confounders. Process evaluations can check causal pathways and/or examine whether other factors might be influential. Checklists are useful for assessing the quality of non-randomised studies,7 8 but case-by-case assessment of context-specific threats to validity and how to address these is also required.
Whether non-randomised studies actually provide useful evidence to guide decisions also depends on the effect sizes estimated and the decision/intervention being considered. Evidence of an intervention's impact will be more convincing when reported effect sizes are large since it is less likely that confounding or other sources of error can completely explain large effects4 6 (although large effects arising from bias are not without precedent40). Evidence from non-randomised studies will also be more persuasive when results are consistent across studies5 9 (although such studies can sometimes consistently mislead, exemplified by non-random studies of vitamin-A deficiency and mother-to-child HIV transmission mentioned above30).
When an intervention is costly, difficult to deliver or unacceptable to some stakeholders, when existing research suggests benefits may be small, or when there is evidence or scope for harmful effects, there is a strong argument that only an RCT will provide adequate evidence and that barriers to undertaking one must be surmounted. However, where there is evidence that an intervention is cheap, relatively easy to deliver and acceptable, and that there is minimal potential for harm, there is a stronger case for accepting evidence from other designs.13 41 One relevant scenario here is confirmatory studies of the outcomes of an intervention translated to new settings with few changes, where previous RCTs report benefits and wider evidence suggests little scope for harm.5
Finally, further research is required on the effect of specific methods outlined above on the size and direction of bias in different areas of public health, as well as the use of STROBE and other guidelines to improve the reporting of non-randomised evaluations.
What is already known on this subject
Randomised trials are widely regarded as the ‘gold standard’ for estimating the causal effects of public health interventions.
It has been argued that for certain decisions it is reasonable to infer intervention effects from non-randomised studies, and that dismissing designs other than the randomised trial might lead to a marginalisation of certain intervention types.
However, existing reviews have not aimed to assess comprehensively the barriers and alternatives to randomised trials of public health interventions.
What this paper adds
An RCT will generally not be possible where there is not equipoise regarding an intervention's health or other important effects, or where it must for practical, legal or bureaucratic reasons be delivered consistently across a nation.
Evidence from non-randomised designs is more convincing when confounders are well-understood, measured and controlled, there is evidence for causal pathways linking intervention and outcomes and/or against other pathways explaining outcomes, and effect sizes are large.
Acknowledgments
We would like to thank Diana Elbourne and Ben Armstrong for their contributions to the development of this paper. We would also like to thank those who attended a symposium on evaluating public health interventions convened by the London School of Hygiene and Tropical Medicine on 6 November 2006 for contributing insights and thus informing the development of this paper.