The recent commitment to ensuring that rigorous evidence is available to guide medical practice and health policy making is commendable, and various approaches to assessing that evidence have emerged. The Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework has enjoyed particular popularity, providing a systematic and intelligible approach to ranking available research outcomes.1
The merits and limitations of the GRADE framework for systematically evaluating the quality of evidence underpinning clinical practice guidelines have recently been eloquently debated.2 3 We will not dwell on the methodological allegations that GRADE suffers from external and internal inconsistency, potential for bias and lack of validation, nor on the possibility that these apparent flaws result from maladroit operators rather than from deficiencies in the framework. Our concern is that the GRADE framework may have some unforeseen detrimental public health impacts unless modified.
The large-scale vigorous adoption of the framework across the global public health sector clearly demonstrates a laudable desire to unlock the previously impenetrable black box of policy formulation that resided in the hands of a limited number of “experts” and bureaucrats. In this regard, having the piercing spotlight of a framework, in which the evidence underpinning decisions is openly presented and transparently evaluated for robustness, should be broadly welcomed.
Our concern relates to the adequacy of the traditional hierarchy of research design used to categorise the “strength of evidence” when applied to preventive public health programmes.4 Although the hierarchy is well suited to the relatively narrow domain of therapeutic effectiveness, it performs less satisfactorily when broader evidence streams at population level need to be synthesised to inform decisions on public health programme strategies. This is particularly pertinent to environmental modification strategies and immunisation programmes, where the archetypal double-blinded randomised controlled trial (RCT) may be neither technically nor ethically feasible, and may not provide the true measure of population impact or public health benefit. In certain situations, the evidence from observational study designs or, heaven forbid, ecological analyses or opportunistic outbreak investigations, may provide a more adequate measure of a public health strategy's impact.
Immunisation is a particular case in point. High immunisation coverage against a specific pathogen often provides indirect benefits beyond those that can be ascertained through traditional RCTs, particularly population herd immunity and a reduced effective reproduction number of the targeted pathogen. The indirect effects on the cocirculation of other pathogens can also typically be ascertained with any certainty only through the use of observational epidemiological methods. However, such evidence is rated of inferior quality through frameworks such as GRADE.
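As a rough illustration of why this indirect benefit depends on population coverage rather than on individual protection alone, the following minimal Python sketch uses the textbook homogeneous-mixing approximation, in which the effective reproduction number is the basic reproduction number scaled by the fraction of the population left susceptible. The function names, the homogeneous-mixing and all-or-nothing vaccine assumptions, and the basic reproduction number of 15 are illustrative assumptions and are not taken from the sources cited here; the efficacy of 0.94 simply mirrors the protective effect reported for the measles trial discussed in this article.

```python
def effective_reproduction_number(r0: float, coverage: float, efficacy: float) -> float:
    """Approximate R_e as R_0 * (1 - coverage * efficacy).

    Assumes homogeneous mixing and an all-or-nothing vaccine
    (simplifying assumptions made for this sketch only).
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= efficacy <= 1.0):
        raise ValueError("coverage and efficacy must lie between 0 and 1")
    if r0 < 0.0:
        raise ValueError("r0 must be non-negative")
    return r0 * (1.0 - coverage * efficacy)


def critical_coverage(r0: float, efficacy: float):
    """Coverage at which R_e falls to 1, or None if it cannot be reached.

    Returns 0.0 when the pathogen cannot sustain transmission anyway (r0 <= 1).
    """
    if r0 <= 1.0:
        return 0.0
    if efficacy <= 0.0:
        return None  # a vaccine with no effect can never bring R_e to 1
    needed = (1.0 - 1.0 / r0) / efficacy
    return needed if needed <= 1.0 else None


if __name__ == "__main__":
    R0_ILLUSTRATIVE = 15.0  # assumed value, for illustration only
    EFFICACY = 0.94         # mirrors the protective effect quoted in this article
    for coverage in (0.0, 0.5, 0.8, 0.95):
        r_e = effective_reproduction_number(R0_ILLUSTRATIVE, coverage, EFFICACY)
        print(f"coverage={coverage:.2f}  R_e={r_e:.2f}")
    print("critical coverage:", critical_coverage(R0_ILLUSTRATIVE, EFFICACY))
```

Under these assumptions the reduction in transmission is a property of the whole population, so an individually randomised trial that compares vaccinated and unvaccinated participants within the same community would not be expected to capture it.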
Often, ethics committees make their own assessment of the evidence and appropriately rule as unethical the RCTs required to achieve high ratings on GRADE. This is illustrated in a recent World Health Organization measles position paper, which relied, in an ethically responsible manner, on a 1968 quasi-RCT. That trial followed 21 653 children in the UK aged 10–24 months for 2 years and 9 months after vaccination and found a 94% protective effect of live, monovalent vaccine against measles, yet it was rated as providing only a “moderate level of scientific evidence” using GRADE.5
Unfortunately, uninformed comparisons of the GRADE scores of health interventions by governments deciding where to spend their limited health budgets may result in measles vaccination being deprioritised because it did not achieve a “high evidence score”. Similarly, antivaccination lobby groups may abuse such ratings to instil doubt and concern in the community, leading to tragic resurgences of preventable diseases.
The GRADE system addresses one evidence domain, the classical scientific evidence. For it to be of value in informing public health prevention programmes, additional epidemiological domains should be evaluated and rated alongside it, so that policy making draws on comprehensive public health evidence. We propose that these domains could include adaptations of those originally proposed by Bradford Hill for assessing causality,6 in particular: the consistency of evidence over time, in a variety of geographical locations and as gathered by different researchers; the specificity of the intervention in relation to its observed effects; the coherence of different sources of available evidence; and the gradient of effects, with the scale of population-level impact compatible with the degree of coverage.
GRADE and similar frameworks provide an explicit description of the quality of data supporting policy decisions. It is essential that such frameworks, which are well suited to advising on clinical therapeutic decisions, are not carelessly applied to complex policy making in preventive programmes, where non-RCT evidence may be the only, or the most appropriate and valid, data available. We propose that, when ranking the available evidence for these programmes, a GRADE-plus framework be applied that weights equally the quality of appropriate experimental and observational data.
Competing interests None.
Provenance and peer review Not commissioned; not externally peer reviewed.