Damned if you do, damned if you don't: subgroup analysis and equity
- Mark Petticrew (1)
- Peter Tugwell (2)
- Elizabeth Kristjansson (3)
- Sandy Oliver (4)
- Erin Ueffing (5)
- Vivian Welch (5)
- (1) Department of Social and Environmental Health Research, Faculty of Public Health and Policy, London School of Hygiene and Tropical Medicine, London, UK
- (2) Department of Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
- (3) School of Psychology, University of Ottawa, Ottawa, Ontario, Canada
- (4) Social Sciences Research Unit, Institute of Education, University of London, London, UK
- (5) Institute of Population Health, University of Ottawa, Ottawa, Ontario, Canada
- Correspondence to Mark Petticrew, Department of Social and Environmental Health Research, Faculty of Public Health and Policy, London School of Hygiene and Tropical Medicine, 15–17 Tavistock Place, London WC1H 9SH, UK;
Contributors MP wrote the first draft of the paper, and all authors contributed to subsequent redrafts of the paper and contributed examples. MP will act as guarantor.
- Accepted 7 April 2011
- Published Online First 6 June 2011
The final report from the WHO Commission on Social Determinants of Health recently noted: ‘For policy, however important an ethical imperative, values alone are insufficient. There needs to be evidence on what can be done and what is likely to work in practice to improve health and reduce health inequities.’ This is challenging, because understanding how to reduce health inequities between the poorest and better-off members of society may require greater use of subgroup analysis to explore the differential effects of public health interventions. However, while this may produce more policy-relevant evidence, the requisite subgroup analyses are often seen as tantamount to statistical malpractice. This paper considers some of the methodological problems with subgroup analysis and its applicability to questions of equity, using both clinical and public health examples. Finally, it suggests how policy needs for information on subgroups can be met while maintaining rigour.
The key, and often-rehearsed, problem is that we need to know more about what works to reduce health inequalities. This evidence on effectiveness is likely to come at least partly from subgroup analyses, which compare the effects of interventions across different population groups; however, intervention studies (both trials and observational studies) do not often present such data. Cochrane reviews illustrate this. The Cochrane/Campbell Health Equity group (http://equity.cochrane.org/en/index.html) has been using Evans and Brown's PROGRESS framework (place of residence, race/ethnicity, occupation, gender, religion/culture, education, socioeconomic status, social capital/networks) to identify these subgroups of interest in systematic reviews.1 This has recently been extended to ‘PROGRESS-Plus’ to incorporate several other key descriptors (eg, disability, sexual orientation, age).2 An analysis of data from 11 Cochrane review groups shows that fewer than 5% of reviews carried out a subgroup analysis across these PROGRESS factors, whereas nearly 10% carried out a subgroup analysis on one of the ‘Plus’ factors, mostly by analysing effects stratified by age. A further analysis of systematic reviews of trials of interventions for rheumatoid arthritis found that fewer than half of the reviews reported dimensions of PROGRESS, even when these had been reported in the primary studies, suggesting that opportunities to explore differential effects may be lost.3
Current evidence synthesis is therefore replete with lost opportunities to learn more about how to tackle inequities in health. The downside is that, even when such evidence of differential effects exists, it will derive in many cases from post-hoc subgroup analyses, and in epidemiological terms this represents weak and potentially misleading evidence: ‘Sub-group analyses pose problems in interpretation… it is reasonable to carry out a small number of subgroup analyses if these were specified in the protocol (author's emphasis) but on no account should the data be analysed in numerous different ways in the hope of discovering some significant comparison.’4
The Cochrane handbook for systematic reviewers also warns that ‘Subgroup analyses are observational by nature and are not based on randomised comparisons’.5 In this regard, of course, subgroup analyses are like systematic reviews, which are themselves observational studies. The handbook continues: ‘False negative and false positive significance tests increase in likelihood rapidly as more subgroup analyses are performed. If their findings are presented as definitive conclusions there is clearly a risk of patients being denied an effective intervention or treated with an ineffective (or even harmful) intervention. Subgroup analyses can also generate misleading recommendations.’ (Section 9.6.2)
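To make the handbook's point about multiplicity concrete, the short Python sketch that follows computes the probability of at least one false-positive subgroup finding when k independent subgroup tests are each run at α = 0.05 and no true subgroup differences exist. The independence assumption and the function names are ours, not the handbook's; the Bonferroni threshold is included only as one standard, conservative correction and is not a recommendation made in the source.

```python
def familywise_error_rate(k: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among k independent tests,
    assuming every null hypothesis (no real subgroup difference) is true."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise rate <= alpha."""
    return alpha / k


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(f"{k:>2} subgroup tests: P(at least one false positive) = "
              f"{familywise_error_rate(k):.2f}; "
              f"Bonferroni per-test alpha = {bonferroni_threshold(k):.4f}")
```

Under these assumptions, 10 subgroup tests carry roughly a 40% chance of at least one spurious ‘significant’ result. Real subgroup tests are rarely independent, so the exact figure will differ, but the rapid growth in risk with the number of analyses is the general pattern the handbook warns about.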
The handbook also distinguishes ‘qualitative interaction’ from ‘quantitative interaction’: ‘“Qualitative interaction” exists if the direction of effect is reversed, that is if an intervention is beneficial in one subgroup but is harmful in another. Qualitative interaction is rare. This may be used as an argument that the most appropriate result of a meta-analysis is the overall effect across all subgroups. Quantitative interaction exists when the size of the effect varies but not the direction, that is if an intervention is beneficial to different degrees in different subgroups’.6
Put baldly, we need to know more about the potential for interventions to reduce health inequity, but what we do know often derives from analyses that are generally viewed as weak at best, and misleading at worst. Researchers are thus criticised by policymakers for not doing enough subgroup analyses, and criticised by statisticians for doing too many: they are damned if they do, and damned if they don't.
‘Subgroups kill people’
Subgroup analyses may lead to both under- and overtreatment.7 In one example, Hernandez et al8 show that randomised controlled trials (RCTs) observed a statistically non-significant effect of aspirin in the primary and secondary prevention of coronary heart disease in women, based on small numbers; overinterpretation of this subgroup analysis may have led to women being undertreated for years, although we now know that aspirin is effective in women. Conversely, Fletcher9 cites several cases in which overemphasising effects in subgroups may lead to overestimation of the beneficial effects of interventions. One example comes from an RCT of compression therapy with and without venous surgery for the treatment of varicose leg ulcers, in which the surgeons were interested in whether the degree of reflux in the varicose veins had a bearing on the effects of surgery. Three subgroups were considered: patients with superficial reflux alone, those with additional segmental deep reflux and those with total deep reflux. The combination of surgery and compression reduced recurrence rates in patients with isolated superficial reflux, but the results were only marginally significant or non-significant in the two deep reflux subgroups. However, the fact that the result was statistically significant in one group and not in the other two did not mean that there was a real difference between the groups. The ulcer recurrence rate was actually similar for the three subgroups, and the non-significant results in the latter two subgroups more likely reflected their smaller sample sizes. Examples such as these support the contention that ‘Subgroups kill people’ (Rothwell,7 quoting the statistician Richard Peto).
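The point that a significant result in one subgroup and a non-significant one in another does not by itself show a real difference can be illustrated with a test of interaction. The Python sketch that follows uses hypothetical numbers (not the ulcer trial's data), assumes the subgroup estimates are log risk ratios with approximately normal sampling errors, and assumes the two subgroups are independent, so that the standard error of their difference is the square root of the sum of the squared standard errors.

```python
from __future__ import annotations

import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def interaction_test(est_a: float, se_a: float,
                     est_b: float, se_b: float) -> tuple[float, float, float]:
    """z-test of whether two independent subgroup estimates differ.
    Returns (difference, SE of difference, two-sided p-value)."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, se_diff, two_sided_p(diff / se_diff)


if __name__ == "__main__":
    # Hypothetical log risk ratios (negative = fewer recurrences).
    a_est, a_se = -0.50, 0.20   # larger subgroup, narrower SE
    b_est, b_se = -0.40, 0.30   # smaller subgroup, wider SE
    print("subgroup A p =", round(two_sided_p(a_est / a_se), 3))
    print("subgroup B p =", round(two_sided_p(b_est / b_se), 3))
    diff, se_diff, p_int = interaction_test(a_est, a_se, b_est, b_se)
    print("interaction: diff =", round(diff, 2),
          "SE =", round(se_diff, 2), "p =", round(p_int, 2))
```

With these made-up figures, subgroup A is ‘significant’ (p of about 0.01) and subgroup B is not (p of about 0.18), yet the test of interaction (p of about 0.78) gives no evidence that the effects differ between the subgroups.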
…and lack of subgroup analysis kills people
However, lack of knowledge of subgroup effects can also be harmful and wasteful of resources. For example, in the Hypertension Detection and Follow-up Program, the prevalence and control of hypertension varied with socioeconomic status (SES), measured by educational level, and there was a strong inverse relationship between SES and all-cause mortality: low SES was associated with higher 5-year all-cause mortality among participants receiving usual care in the trial.10 11 In the intervention arm, however, which received ‘stepped care’, the SES differences disappeared and there was no significant association between education and mortality, whereas the significant association persisted in the usual care group. In short, analysis of the data stratified by SES showed that stepped care eliminated the inverse relationship between educational level (as an indicator of SES) and mortality. Without such subgroup analysis, the significant reduction in mortality produced by stepped care in the socioeconomic groups at excess risk would have been overlooked.
Subgroup analysis is likely to become more common as reviewers in public health and related areas increasingly turn their attention to the relative effectiveness of interventions (such as social policies) in different socioeconomic groups, in an attempt to identify interventions with the potential to reduce income-related inequities in health.12 Subgroup analysis has also been used to identify the likely active ingredients of complex interventions13 and to examine methodological rigour in studies of policy interventions for addressing inequity.14 The role of subgroup analyses in identifying the harms of interventions has also been highlighted;15 for example, adverse effects may be identified in some subgroups of patients and not others. This suggests an important role for such analyses in assessing the unintended effects of well-meaning policies, and provides a further rationale for considering how existing methodological guidance applies to equity-focused systematic reviews.
Systematic reviewers have argued for some years for the importance of exploring moderator effects in systematic reviews.16 Limiting subgroup analyses to those that are prespecified is one proposed solution to the problem of type I (false-positive) error.17 However, there are other complexities; for example, many different types of subgroup effect may be relevant to a policymaker, such as:
When the overall intervention effect is statistically significant and the subgroup effects are in the same direction, whether or not they are individually statistically significant (ie, quantitative interaction);
When the overall intervention effect is not statistically significant, but one or more subgroup effects are. Here, the plausibility of a true subgroup effect will depend on the direction of the subgroup effect as well as on whether it was prespecified and on the theoretical plausibility of an effect;
When the subgroup effects (either significant or non-significant) are in different directions for different subgroups (qualitative interaction): for example, when the intervention is beneficial in men but harmful in women, or when it is ineffective in poorer populations but beneficial in better-off populations. Yusuf et al6 suggest that such qualitative interactions are uncommon and should be viewed ‘with scepticism’. However, such interactions may be more common for public health and health promotion interventions, because the pathways between intervention and outcome are often complex and interventions may affect different populations in different ways.
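The three patterns listed above can be sketched as a small, purely descriptive classifier. This is a hypothetical illustration: it assumes each estimate comes with a 95% confidence interval on a scale where 0 means no effect (such as a log risk ratio or a mean difference), treats ‘significant’ as a CI that excludes 0, and flags differing directions from point estimates alone. It is not a substitute for a formal test of interaction, and a formal test for qualitative interaction (such as that of Gail and Simon) would be needed before concluding that directions truly differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Estimate:
    """Hypothetical effect estimate with a 95% CI on a scale where 0 = no effect."""
    name: str
    point: float
    lower: float
    upper: float

    @property
    def significant(self) -> bool:
        # "Significant" here simply means the 95% CI excludes the null value 0.
        return self.lower > 0 or self.upper < 0


def classify_pattern(overall: Estimate, subgroups: list[Estimate]) -> str:
    """Map an overall estimate and its subgroup estimates onto the three
    patterns described in the text (plus a 'nothing significant' case)."""
    # Signs (+1 or -1) of the subgroup point estimates, ignoring exact zeros.
    signs = {(s.point > 0) - (s.point < 0) for s in subgroups if s.point != 0}
    if len(signs) > 1:
        return "directions differ across subgroups (possible qualitative interaction)"
    if overall.significant:
        return ("overall significant, subgroups share one direction "
                "(quantitative interaction at most)")
    if any(s.significant for s in subgroups):
        return "overall non-significant, but at least one subgroup significant"
    return "neither overall nor any subgroup effect significant"


if __name__ == "__main__":
    overall = Estimate("overall", -0.10, -0.30, 0.10)
    better_off = Estimate("better-off", -0.40, -0.70, -0.10)
    poorer = Estimate("poorer", 0.15, -0.15, 0.45)
    print(classify_pattern(overall, [better_off, poorer]))
```

Checking directions first mirrors the text's emphasis that qualitative interaction can occur whether or not the individual subgroup effects are significant.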
In meta-analyses of subgroup data, Thompson and Higgins18 also note the importance of calculating interaction effects (eg, comparing men and women) separately within trials, to avoid confounding between trials. How often reviews fail to calculate interaction effects is not known, but this is an important area for further research, because reviews using narrative methods commonly report differences between subgroups without formal tests of interaction. Sun et al19 have recently emphasised that the independence of the subgroup effect is an important aspect of its credibility, along with: whether the direction of the effect was specified a priori; whether (in the case of trials) the subgroup analysis is based on post-randomisation characteristics, which makes it much less reliable; and whether the interaction is consistent across closely related outcomes within the study, since such consistency makes a real subgroup effect more plausible.
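Thompson and Higgins's recommendation can be sketched as a two-step calculation: first estimate the interaction (here, the difference in log odds ratios between women and men) within each trial, then pool those within-trial differences across trials, rather than comparing pooled results drawn from different sets of trials. The Python sketch that follows assumes hypothetical trial data, independent subgroups within each trial and inverse-variance fixed-effect pooling; a random-effects model would be a reasonable alternative and is not shown.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class TrialSubgroups:
    """Hypothetical per-trial results: log odds ratios (treatment vs control)
    and their standard errors, reported separately for men and women."""
    trial: str
    logor_men: float
    se_men: float
    logor_women: float
    se_women: float


def within_trial_interaction(t: TrialSubgroups) -> tuple[float, float]:
    """Interaction within one trial: women's log OR minus men's, with its SE
    (assumes the two subgroups are independent)."""
    diff = t.logor_women - t.logor_men
    se = math.sqrt(t.se_men ** 2 + t.se_women ** 2)
    return diff, se


def pool_fixed_effect(estimates: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance fixed-effect pooling of (estimate, SE) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


if __name__ == "__main__":
    trials = [
        TrialSubgroups("A", -0.50, 0.20, -0.30, 0.20),
        TrialSubgroups("B", -0.20, 0.30, 0.00, 0.30),
        TrialSubgroups("C", -0.60, 0.25, -0.20, 0.25),
    ]
    interactions = [within_trial_interaction(t) for t in trials]
    pooled, se = pool_fixed_effect(interactions)
    z = pooled / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    print(f"pooled ratio of ORs (women vs men): {math.exp(pooled):.2f}, "
          f"z = {z:.2f}, p = {p:.2f}")
```

Because each difference is computed inside a single trial, trial-level features (setting, dose, follow-up) cancel out of the comparison, which is the confounding between trials that the recommendation aims to avoid.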
Do subgroups really matter?
It has been suggested that in some cases we should be more interested in overall effects than in subgroup effects, because the overall effect is a more reliable estimate. For example, in meta-analyses the results observed in subgroups may differ by chance from the overall effect, and the subgroup findings may not be confirmed by subsequent large trials. In this case the best estimate of the effect of the intervention in that subgroup will, paradoxically, come from discounting the results from that subgroup and using the results of the overall meta-analysis (sometimes known as Stein's paradox).20 Fletcher9 makes a similar point: when interpreting the results of subgroup analyses, a good working assumption is that the main result probably applies to everyone unless good evidence exists to the contrary.
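Stein's paradox can be illustrated with empirical Bayes shrinkage, in which each subgroup estimate is pulled towards the overall estimate by an amount that depends on its precision and on how much the subgroups genuinely vary. The Python sketch that follows shows one standard way to do this: a normal-normal model with the between-subgroup variance τ² estimated by the DerSimonian-Laird method of moments. It is a close relative of, not identical to, the James-Stein estimator, and the function names and inputs are illustrative assumptions.

```python
from __future__ import annotations

import math


def dersimonian_laird_tau2(y: list[float], se: list[float]) -> float:
    """Method-of-moments (DerSimonian-Laird) estimate of the between-subgroup
    variance; requires at least two subgroups."""
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw        # fixed-effect mean
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    denom = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(y) - 1)) / denom)


def shrink_subgroups(y: list[float], se: list[float]) -> tuple[float, float, list[float]]:
    """Empirical Bayes subgroup estimates: each keeps only the fraction
    tau2 / (tau2 + se_i**2) of its deviation from the overall mean."""
    tau2 = dersimonian_laird_tau2(y, se)
    w_star = [1.0 / (s ** 2 + tau2) for s in se]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    shrunk = [mu + (tau2 / (tau2 + s ** 2)) * (yi - mu) for yi, s in zip(y, se)]
    return mu, tau2, shrunk


if __name__ == "__main__":
    # Hypothetical subgroup log risk ratios and standard errors.
    y = [-0.60, -0.10, -0.35]
    se = [0.25, 0.20, 0.30]
    mu, tau2, shrunk = shrink_subgroups(y, se)
    print(f"overall = {mu:.3f}, tau^2 = {tau2:.3f}")
    for raw, s in zip(y, shrunk):
        print(f"raw {raw:+.2f} -> shrunk {s:+.3f}")
```

When the subgroup estimates are no more variable than chance would predict, τ² is estimated as 0 and every subgroup is shrunk all the way to the overall estimate, which corresponds to discounting the subgroup results as described above; when they vary more, the shrinkage is only partial.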
This may not be a universal law, however, and exactly the opposite may apply to the effects of interventions on health equity (particularly when comparing the effects on poorer vs better-off subgroups). It is entirely plausible that interventions—for example, health promotion interventions—may have very different effects in one population group compared with another. For example, promoting healthy eating in schools works better for young women than young men,21 and school-based cognitive behavioural interventions may be less effective for young people from families with low SES.22
In the case of equity-focused reviews there are several other considerations. One is that the plausibility of a subgroup effect may depend on how consistent it is with existing theory about inequities, in particular existing knowledge about the causal pathways between the intervention and the outcome, and about the mechanisms by which inequities are created. This is consistent with Oxman and Guyatt's ‘safety rules’ (updated in 2010)19 for subgroup analyses, which among other things suggest that a subgroup difference is more likely to be real if there is indirect evidence that supports the hypothesised difference.23 In equity-focused reviews, prespecifying the subgroup analyses (consistent with the existing guidance referred to above) also requires outlining in advance the pathways between intervention and outcome (the logic model), the likely mechanisms and the likely effects on different subgroups of interest. This may be based on evidence about processes and mechanisms. For example, in a systematic review of school feeding, the mechanisms by which such programmes may work included short-term hunger relief, reduced absenteeism and the knock-on effects of an improved school diet on the home diet.24
A final point relates to the inferences that may be drawn from subgroup analyses. Guidelines on subgroup analyses tend to emphasise their exploratory, hypothesis-generating nature rather than a hypothesis-testing role, even when they derive from RCTs (eg, box 1). However, in many cases such analyses are the best evidence we have of the potential effects (and adverse effects) of interventions, and there is a strong case for using existing data more efficiently by using subgroup analyses to make appropriately cautious inferences about the effects of interventions on health equity. As noted above, inferences are strengthened when the analyses were specified a priori. In the case of equity, they will be rendered more plausible if existing theories on the creation of health inequities can be used in advance to suggest the direction and nature of possible subgroup effects.
Box 1 Guidance on interpreting subgroup analyses in systematic reviews
Subgroup analyses should as far as possible be restricted to those proposed before data collection, and any chosen after this should be clearly identified.
Trials should ideally be powered with subgroups in mind, although for modestly sized interactions this may not be feasible.
Subgroup-specific analyses are particularly unreliable and are affected by many factors. Subgroup analyses should always be based on formal tests of interaction, although even these should be interpreted with caution.
The results of subgroup analyses should not be over-interpreted. Unless there is strong supporting evidence, they are best viewed as a hypothesis-generating exercise. In particular, one should be wary of evidence suggesting that the intervention is effective in one subgroup only.
Any apparent lack of differential effect should be regarded with caution unless the study was specifically powered with interactions in mind.
Source: Brookes et al.17
We suggest that when interpreting subgroup analyses from an equity perspective the following five issues may be of particular relevance in addition to the general criteria relating to such analyses:
Analyses should be informed by theoretical considerations relating to the social determinants of health and health inequalities, and how these operate; while this is true of any review of an intervention, it is particularly important in the case of subgroup analysis, because linking the analyses to pre-existing theory provides some protection against atheoretical post-hoc data dredging;
Analyses should be prespecified based on hypotheses about the direction and nature of the effects, informed by a logic model;25
The PROGRESS-Plus framework may be useful in selecting relevant subgroups (ie, relevant to the review question), but it should not be used as a ‘shopping list’ to dredge for subgroup differences; a compelling a priori rationale is needed;
Formal tests of interaction should be used wherever possible when comparing subgroups; where they are absent, as in many narrative reviews, observed differences between groups need to be interpreted with even more caution; and
Existing guidance on good practice in subgroup analysis should be followed, as this is likely to yield more robust, policy-relevant evidence.19
Overall, assessing the effects of interventions on health equity requires an interaction between theory and methods. Further methodological work in this area is required and could, for example, follow up the suggestion of Sun et al19 to develop a visual analogue scale indicating the plausibility of subgroup effects. Such a scale could be developed and applied to equity effects uncovered in systematic reviews, and could increase the credibility and usability of such findings. More generally, the issue of subgroup analyses could be dealt with more often ‘at source’ by ensuring that new intervention studies and new systematic reviews are located within appropriate theoretical frameworks, which can then be used to specify the necessary analyses in advance.
What is already known on this subject
Policymakers and practitioners need evidence on the effects of interventions in subpopulations.
Much of this evidence will be based on subgroup analyses, which are often seen as misleading.
However, rejecting all such analyses may risk throwing the baby out with the bathwater; the application of best practice in subgroup analysis to questions about equity may help inform public health policy and practice.
What this study adds
Five points to consider in conducting subgroup analyses from an equity perspective are provided.
Thanks to Iain Chalmers for comments on an earlier version of this paper.
Funding PT is supported by a Canada Research Chair on Health Equity.
Competing interests None stated.
Provenance and peer review Not commissioned; externally peer reviewed.