In the 1970s Archie Cochrane noted that many healthcare procedures and forms of organisation lacked evidence of effectiveness and efficiency, and argued for improved methods of evaluation, moving from clinical opinion and observation to randomised controlled trials (RCTs). His arguments gradually became accepted in medicine, but there has been considerable resistance among policymakers and researchers to their application to social and public health interventions. This essay argues that opposition to RCTs in public health is often based on a false distinction between healthcare and community settings, and sometimes on a misunderstanding of the principles of RCTs in health care. It suggests that just as in medicine, good intentions and received wisdom are not a sufficient basis for making public policy and allocating public funds for social or health improvement.
- Archie Cochrane
- comm interven trials
- evid based med SI
- lack of evidence
- objections to trials
- randomised controlled trials
- social and public health interventions
- social inequalities
Archie Cochrane's book, ‘Effectiveness and efficiency: random reflections on health services’1 is well known for arguing the case for randomised controlled trials (RCTs) in health care. Antenatal care was one of the fields he singled out:
‘this service is basically a multiphasic screening procedure, which by some curious chance, has escaped the critical assessment to which most screening procedures have been subjected in the last few years and there seems no reason why the same approach that has proved so useful elsewhere should not be used here’. (Cochrane, p 66)1
I was particularly interested in this comment because at the time I was involved in evaluating a modified schedule of antenatal care, using a before and after case study approach.2 Our problem was that there was no clear counterfactual with which to compare the new system. The study, which cost a not insignificant amount, was therefore somewhat inconclusive, other than finding that the new system did not seem to be a complete disaster or kill mothers and babies. Opponents could use our findings to suggest the new system was worse, and supporters that it was better, than the old one. We noted that:
‘Random allocation to different schedules would have maximised comparability between experimental and control groups, and simultaneous comparisons would have avoided the contaminating effects of secular change on the outcome measures used.’ (Hall et al, p 115)2
The arguments against using an RCT were that it was unethical to experiment on pregnant women and their children, and that antenatal care was a complex matter and therefore inherently unsuitable for an RCT. Interestingly, it was considered ethically permissible to experiment in a non-randomised way, in the process withholding what was currently considered normal care from pregnant women, and scientifically permissible to introduce and try to evaluate a complex system of care without any controls.
Subsequently, Cochrane's arguments were highly influential among perinatal epidemiologists, and a number of RCTs of maternity care in the UK were undertaken, including:
Midwife or general practitioner-led care versus obstetrician-led care3–5
Traditional versus reduced schedules of antenatal visits6
Women holding or not holding their own obstetric records7
Perineal management (restrictive vs liberal use of episiotomy)8
Postnatal support for mothers in disadvantaged inner-city areas.11
Many of these trials were initially thought to be impossible, unethical and/or impractical, but they nevertheless happened. They had a number of important features. First, they did not insist on slavish adherence to a standardised protocol. The trials of midwife versus shared care did not involve women being banned from seeing obstetricians or midwives, but rather the comparison of two general policies or principles.4 Second, they involved multiple components and series of decisions, social interactions and behaviours. Third, many evaluated activities spanning many months. Fourth, they involved multiple outcomes as well as multiple inputs (eg, the antenatal care trials typically looked at antenatal admissions, non-attendances, numbers of antenatal visits, antenatal diagnoses, inductions of labour, satisfaction with care, etc). Fifth, they involved evaluations of processes and acceptability as well as outcomes. Sixth, randomisation meant that results were less likely to be biased by self-selection of patients or professionals, and were therefore more conclusive than previous case study approaches. However, these (fairly typical) characteristics of healthcare RCTs often seem to be misunderstood by opponents of public health RCTs.
Public health research
‘Although there is often evidence on the scientific justification for action and for some specific interventions, there is generally little evidence about the cost-effectiveness of public health and preventative policies or their practical implementation.’ (Wanless, p 5)14
Why so little evidence? First, in the UK many evaluations focus on inputs, throughputs and customer or professional satisfaction rather than on outcomes. Second, few policies or programmes are implemented in ways that facilitate robust evaluation of outcomes (eg, they often lack baseline data, comparison groups, clear objectives, and/or statistical power). Third, there is a general reluctance in the UK to subject social or public health policies to RCTs. There have been far fewer controlled studies in the UK than in the USA, which raises issues of generalisability across contexts (eg, the ‘nurse family partnership’ developed and extensively studied in the USA15 may not be relevant in the UK where deprived first-time mothers already have access to antenatal and postnatal support via the NHS). These issues together militate against the production of robust evidence about effectiveness and efficiency.
Recently in the UK, members of parliament criticised policymakers' approaches to evaluation:
‘All too often Governments rush in with insufficient thought, do not collect adequate data at the beginning about the health of the population which will be affected by the policies, do not have clear objectives, make numerous changes to the policy and its objectives and do not maintain the policy long enough to know whether it has worked.’ (House of Commons Health Committee, p 5)16
They were particularly critical of officials' reactions to suggestions that controlled trials should be used. Many of these responses suggested fundamental misunderstandings about the principles and practice of RCTs. For example, one senior civil servant rejected suggestions that a ‘healthy towns’ initiative be subjected to a controlled trial by saying:
‘it would challenge any academic to come up with a randomised town’ (House of Commons Health Committee, p 7)17
implying that randomisation means choosing one town at random and treating it as representative of all towns. In the next section I illustrate and discuss such common misconceptions, in particular that community trials are essentially different from healthcare trials.
Some objections to RCTs in public health
Communities differ whereas individuals do not
‘A community intervention with a matched community control is far more feasible (than an RCT) but still challenging because, unlike individuals, communities vary widely in characteristics related to exposure to risk.’(Moller, pp 2–3)18
‘It is unlikely that any complex intervention will work for everyone.’19
However, individuals also vary enormously in exposure to risk and response to interventions, which is why one needs sufficient sample sizes to capture variations in both experimental and comparison groups.
Communities and organisations are complex whereas individuals are not
‘Communities clearly differ. They also have attributes that are not reducible to those of individual members. These include cultures (eg, religious beliefs), structures (eg, employment patterns), and relationships (eg, contact between ethnic groups).’ (Pawson, p 52)20
However, individuals are also complex organisms, with characteristics that are greater than the sum of their component chemical parts. Also, all interventions in healthcare settings, even if of a highly standardised drug, involve social settings and social interactions, power dynamics, local cultures, motivations, behaviours, etc. This is certainly the case of perinatal trials mentioned above; the idea that a trial of midwife-led versus obstetrician-led antenatal care, involving interactions between pregnant women and healthcare professionals over several months, is not complex seems bizarre. Indeed as Oakley has suggested:
‘It can be argued that the greater the complexity of the setting into which an intervention is introduced, the more need there is to ensure that factors that may affect the outcomes of interest are equally distributed between intervention and control groups.’(Oakley et al, p 175)21
Social/public health interventions, unlike surgical or drug interventions, do not do harm
One reason for objections to RCTs of public health policies or programmes is the belief that, unlike surgical or pharmaceutical interventions, they are unlikely to do harm. This view privileges social and public health actions, and assumes that the plausibility of potential benefit is a sufficient basis for action.22
However, there are numerous examples of apparently plausible policies or programmes having no benefit or actually being harmful. For example, in the UK the risk of death from fire is associated with low socioeconomic status because of social differences in risk factors for fires and the ownership of smoke alarms, and the risk of death in a house fire is three times higher in homes without smoke alarms; so it is plausible that giving free fire alarms to deprived households might reduce excess fire deaths among them. However, an RCT found that giving out free alarms in a deprived community did not reduce injuries from fire, because few alarms had been installed or maintained. It concluded that issuing free smoke alarms may waste resources and be of little benefit unless alarm installation and maintenance is assured.23
The Scared Straight programme in the USA brings juvenile delinquents into prisons to meet life prisoners, who attempt to deter them from a life of crime. Criminologists and many stakeholders, including the general public, have been positive about the programme, which has prima facie plausibility. However, none of seven RCTs showed any benefit, and a meta-analysis showed that recidivism rates were higher among the experimental group.24 If there had not been RCTs of fire alarms and Scared Straight, we might continue to implement these programmes on the basis of their plausibility.
Community trials are impossible
It is often simply stated as a matter of agreed fact that RCTs are impossible in community settings. However, a wide range of community-based RCTs has already been undertaken or planned, including:
The Mexican universal health insurance programme25
The effects of hand washing on child health in squatter settlements in Karachi26
The deterrent effect of police raids on crack houses27
The effectiveness of toughened glassware in reducing injuries in bars28
School breakfast clubs29
Out-of-home day care for disadvantaged families30
Community-level interventions to address social and structural determinants of health in 40 areas in London.31
Public health trials are more difficult than healthcare trials
Some opponents of public health RCTs seem to assume that RCTs in health care are easy to set up and do not involve problematic ethical issues (in contrast to what is required for community-based trials). However, similar difficulties have been overcome in healthcare RCTs; for example, in trials of surgical versus medical treatment for temporal lobe epilepsy,32 arthroscopic surgery for osteoarthritis of the knee,33 and transplantation of embryonic dopamine neurons for severe Parkinson's disease34 (the latter two involving sham surgery in the control group). The triallists surmounted many of the difficulties often regarded as insurmountable in community evaluations.
RCTs require one to adhere strictly to protocol
One objection to public health RCTs is that they force one to stick rigidly to protocol:
‘A population outcome is the goal. Such a broad focus means that strict control of intervention, subject and analysis required for a true experiment or clinical trial is impossible. This sends shudders down the spines of those brought up in the empirical tradition.’ (Moller, p 2)18
However, healthcare RCTs do not require one to have such strict standardisation. The trial of perineal management did not insist that women in the experimental arm all had episiotomies and nobody in the control arm did,8 and the RCT of surgical versus medical treatment for epilepsy did not closely control what the surgical or medical treatments were.32 Intention-to-treat analysis of trial outcomes is recommended precisely because not everyone in the intervention group will actually receive the intervention or receive it in the same way, and not everyone in the control group will be deprived of the intervention.35
The term ‘controlled’ may lead to some confusion here. It is sometimes interpreted as meaning rigid fidelity to the programme, rather than comparison with a counterfactual (what would have happened had the intervention not taken place), which in practice is approximated by a comparison group.
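The intention-to-treat principle described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function names (`simulate_trial`, `intention_to_treat_difference`) and every parameter value (uptake, crossover, baseline risk, effect size) are hypothetical and are not taken from any of the trials cited. It shows a trial in which only some of those assigned to the intervention receive it, some controls obtain it anyway, and outcomes are still compared by assigned arm.

```python
import random
from statistics import mean


def simulate_trial(n_per_arm=2000, uptake=0.6, crossover=0.1,
                   base_risk=0.30, effect=-0.10, seed=1):
    """Simulate a two-arm trial with imperfect adherence.

    All parameter values are illustrative assumptions, not real trial data.
    """
    rng = random.Random(seed)
    records = []
    for arm in ("intervention", "control"):
        for _ in range(n_per_arm):
            # Not everyone assigned to the intervention receives it,
            # and some people in the control arm obtain it anyway.
            p_receive = uptake if arm == "intervention" else crossover
            received = rng.random() < p_receive
            risk = base_risk + (effect if received else 0.0)
            outcome = 1 if rng.random() < risk else 0
            records.append(
                {"assigned": arm, "received": received, "outcome": outcome}
            )
    return records


def intention_to_treat_difference(records):
    """Risk difference by ASSIGNED arm, ignoring what was actually received."""
    by_arm = {
        arm: [r["outcome"] for r in records if r["assigned"] == arm]
        for arm in ("intervention", "control")
    }
    return mean(by_arm["intervention"]) - mean(by_arm["control"])


if __name__ == "__main__":
    trial = simulate_trial()
    print("ITT risk difference:", intention_to_treat_difference(trial))
```

Because the comparison follows assignment rather than receipt, the estimate describes the effect of adopting the policy under realistic, imperfect delivery, which is usually the question a policymaker is asking.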
RCTs involve a single experimental and comparison unit
‘Even where matching populations have been found, a final comparison comes down to a single case with control design and a critical reviewer can easily dismiss results.’ (Moller, p 3)18
This seems to assume that public health RCTs would select only one intervention and one comparison group (see the earlier comment about ‘a randomised town’).17 However, as has been pointed out:
‘It is common to see reports of community intervention trials in which one intervention community is compared with one control community. This is equivalent to a clinical trial with one patient in each treatment group.’ (Hayes and Bennett, p 323)36
A key element of healthcare RCTs is that they involve sufficient numbers in both experimental and control groups to rule out the role of chance, and in many cases in public health it would similarly be possible to have sufficient numbers of experimental and control units (eg, schools, neighbourhoods or towns).36
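The point about randomising sufficient numbers of units can be sketched in code. The example below is a minimal illustration, not a trial design: the town labels, the function names and the intracluster correlation value are all hypothetical. It allocates whole units at random to two arms, and uses the standard design-effect approximation for equal-sized clusters, 1 + (m - 1) × ICC, to show why two large communities carry far less information than many small ones with the same total number of people.

```python
import random


def randomise_clusters(units, seed=None):
    """Randomly allocate whole units (eg, towns or schools) to two arms."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "intervention": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


def design_effect(cluster_size, icc):
    """Standard approximation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent individuals the clustered sample is 'worth'."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


if __name__ == "__main__":
    towns = [f"town_{i:02d}" for i in range(1, 21)]  # hypothetical labels
    print(randomise_clusters(towns, seed=2009))
    # Same 200 people, assumed ICC of 0.05: 2 clusters versus 40 clusters.
    print(effective_sample_size(2, 100, 0.05))
    print(effective_sample_size(40, 5, 0.05))
```

Randomisation here applies to the list of units as a whole; no single town is "random", which is the misconception in the civil servant's remark quoted earlier.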
RCTs are expensive, and if they do not demonstrate a positive difference are failures
‘There are examples of failed, expensive trials. For example no intervention effect was observed among heavy smokers, the primary target population of the COMMIT trial. Similarly CART demonstrated limited positive results, with most cancer related behaviour showing no intervention effects…’ (Sanson-Fisher, p 158)37
This seems to conflate the notion of a failed trial (ie, one that is badly designed or fails to recruit sufficient numbers) with a failed policy or intervention (ie, one that shows modest or no benefit or actual harm). Both the fire alarm23 and the Scared Straight trials24 illustrate that RCTs can be successful and cost effective even if they show reasonably conclusively that an intervention has no, modest, or adverse effects.
RCTs do not have long enough follow-up
‘In population health research many outcomes of interest are far into the future … the practical difficulties in maintaining prospective randomisation for prolonged periods across entire populations are substantial’ (Sanson-Fisher et al, p 157)37
However, there is no intrinsic reason for the issue of length of follow-up to be any different between an RCT and an uncontrolled evaluation. (The High Scope Perry RCT of early childhood intervention has now followed up the participants for 40 years).38 Such comments may be confusing the long-term follow-up of outcomes with the long-term maintenance of the exposure to which people or communities were randomly assigned.
RCTs have poor external validity
‘a disadvantage to using RCTs in population health research is the lack of generalisability, or low external validity’ (Sanson-Fisher et al, p 157)37
It is difficult to understand why non-RCTs in community settings, for example the numerous (and sometimes very expensive) non-randomised evaluations in the UK of area-based initiatives such as Sure Start39 or Health Action Zones40 should be regarded as any more generaliseable than RCTs.
RCTs are unethical
It is often seen as unethical to conduct policy RCTs because they withhold potential benefits from the control group.22 However, this assumes that well-intentioned policies will be beneficial, and that the direct or opportunity costs of implementing a policy are of no ethical concern. As was pointed out in the early days of healthcare RCTs, it seems perverse to see it as ethical to give or withhold programmes of unknown benefit to 100% of the population, but not to 50%.41 42 This is particularly the case when the intervention has to be rationed anyway (eg, not every community can have a Sure Start local programme or be a Health Action Zone).
As members of parliament have pointed out:
‘All the reforms we have discussed are experiments on the public and can be as damaging (in terms of unintended effects and opportunity cost) as unevaluated new drugs or surgical procedures. Such wanton large-scale experimentation is unethical, and needs to be superseded by a more rigorous culture of piloting, evaluating and using the results to inform policy.’ (House of Commons Health Committee, p 66)16
It appears that many objections to social and public health RCTs are based on false comparisons with healthcare RCTs, the latter being seen to be simple, and to have standardised exposures and outcomes. Many of these objections were raised earlier about healthcare RCTs, but have been overcome in the healthcare arena, often with considerable ingenuity, and with recognition of the complexity of humans and their contexts.
Do objections to social and public health RCTs matter? I believe they do, and that Cochrane would agree. For example, the government refused to allow a randomised evaluation of the Sure Start programme in England, which has led to considerable problems in interpreting the results. The researchers tried to find matching areas not receiving the intervention, and to control for any obvious socioeconomic and demographic differences between intervention and control areas. Nevertheless, differences between areas receiving and not receiving the intervention, rather than the intervention itself, may account for any observed differences in outcome.16 43 The results of an expensive evaluation of an expensive intervention can therefore be contestable rather than conclusive.
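The interpretive problem described above can be shown with a toy simulation. It is a sketch under loudly stated assumptions and is not a model of Sure Start or of any real evaluation: the `estimated_effect` function, the unmeasured ‘need’ variable and all numerical values are hypothetical. The programme is given a true effect of zero; when areas are selected according to an unmeasured characteristic that also drives the outcome, a simple comparison shows a large spurious difference, whereas random allocation does not.

```python
import random
from statistics import mean


def estimated_effect(n_areas=5000, true_effect=0.0, randomised=True, seed=1):
    """Difference in mean outcome between programme and comparison areas.

    'need' is a hypothetical, unmeasured area characteristic that raises the
    (adverse) outcome score. All numbers are illustrative assumptions.
    """
    rng = random.Random(seed)
    programme, comparison = [], []
    for _ in range(n_areas):
        need = rng.gauss(0, 1)
        if randomised:
            # Allocation by chance alone: unrelated to need.
            treated = rng.random() < 0.5
        else:
            # Targeted allocation: higher-need areas are more likely chosen.
            treated = need + rng.gauss(0, 1) > 0
        outcome = 2.0 * need + (true_effect if treated else 0.0) + rng.gauss(0, 1)
        (programme if treated else comparison).append(outcome)
    return mean(programme) - mean(comparison)


if __name__ == "__main__":
    print("Randomised estimate:    ", estimated_effect(randomised=True))
    print("Non-randomised estimate:", estimated_effect(randomised=False))
```

Statistical adjustment can remove differences in characteristics that were measured; the sketch concerns one that was not, which is the situation in which only randomisation gives protection.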
With rare exceptions, such as when a new universal policy such as banning smokeless fuel or indoor smoking is introduced and can be evaluated using interrupted time-series methods,44 RCTs are both more feasible than many objectors think, and more conclusive than non-randomised evaluations about the benefits and cost effectiveness of (usually publicly funded) policies and interventions.
So I think we should be braver and much more creative in arguing (whether with politicians, public health practitioners, research funders, potential recipients or ethics committees) for RCTs. As for the privileging of social and public health as not requiring robust randomised evidence, I believe that Cochrane's comments about psychiatry are equally applicable to public health:
‘I cannot agree that colleagues, however distinguished, intelligent and hard-working, and who obviously believe they are doing good, should have a blank cheque to encourage the use of (XX) without bothering to measure the benefit and cost of what they are doing.’ (Cochrane, p 59)1
This is an abridged version of the Cochrane lecture given at the Society for Social Medicine annual conference in 2009. The author is grateful to Lyndal Bond, Iain Chalmers, Matt Egan, Mark Petticrew and Helen Roberts for their contributions to her thinking on this topic, in some cases over many years, and to the Society for Social Medicine committee for inviting her to give this lecture.
Funding This work is supported by the UK Medical Research Council, wbs U.1300.00.006.
Competing interests None declared.
Provenance and peer review Not commissioned; not externally peer reviewed.