Reviewing evidence on complex social interventions: appraising implementation in systematic reviews of the health effects of organisational-level workplace interventions
M Egan1, C Bambra2, M Petticrew3, M Whitehead4

1 Medical Research Council Social and Public Health Sciences Unit, University of Glasgow, UK
2 Department of Geography, Wolfson Research Institute, Durham University, UK
3 Public and Environmental Health Research Unit, London School of Hygiene and Tropical Medicine, UK
4 Division of Public Health, University of Liverpool, UK

Correspondence to: Dr M Egan, Medical Research Council Social and Public Health Sciences Unit, University of Glasgow, 4 Lilybank Gardens, Glasgow G12 8RZ, UK; M.Egan{at} and Matt.Egan{at}


Background: The reporting of intervention implementation in studies included in systematic reviews of organisational-level workplace interventions was appraised. Implementation is taken to include such factors as intervention setting, resources, planning, collaborations, delivery and macro-level socioeconomic contexts. Understanding how implementation affects intervention outcomes may help prevent erroneous conclusions and misleading assumptions about generalisability, but implementation must be adequately reported if it is to be taken into account.

Methods: Data on implementation were obtained from four systematic reviews of complex interventions in workplace settings. Implementation was appraised using a specially developed checklist and by means of an unstructured reading of the text.

Results: 103 studies were identified and appraised, evaluating four types of organisational-level workplace intervention (employee participation, changing job tasks, shift changes and compressed working weeks). Many studies referred to implementation, but reporting was generally poor and anecdotal in form. This poor quality of reporting did not vary greatly by type or date of publication. A minority of studies described how implementation may have influenced outcomes. These descriptions were more usefully explored through an unstructured reading of the text, rather than by means of the checklist.

Conclusions: Evaluations of complex interventions should include more detailed reporting of implementation and consider how to measure quality of implementation. The checklist helped us explore the poor reporting of implementation in a more systematic fashion. In terms of interpreting study findings and their transferability, however, the more qualitative appraisals appeared to offer greater potential for exploring how implementation may influence the findings of specific evaluations. Implementation appraisal techniques for systematic reviews of complex interventions require further development and testing.

The case has been made for providing policy-makers with synthesised, detailed and robust accounts of the implementation of effective interventions in order to make better progress in tackling population morbidities and inequalities.1 Advocates of a staged approach to the development and evaluation of complex interventions have also stressed the importance of accurately defining interventions and promoting effective implementation.2 Implementation refers to the design and delivery of interventions.3–6 The way an intervention is implemented may influence its outcomes, and evaluations that do not take this into account risk (for example) misinterpreting negative outcomes that result from poor implementation as evidence that interventions are inherently ineffective.7 8 We developed a tool to appraise the quality of reporting of implementation and applied this tool to four systematic reviews of complex intervention evaluations affecting the workplace.

Implementation and complex interventions

Researchers and policy-makers have called for evidence from systematic reviews of social interventions affecting so-called “upstream” health determinants such as employment, housing, transport, etc.9 10 Such interventions are often complex and difficult to evaluate.11 12 They may involve multiple, context-specific interventions and an unstandardised approach to implementation.13

In our experience of conducting systematic reviews of “upstream” interventions, it is often difficult from the reporting of a complex intervention evaluation to determine: (1) what exactly the intervention entailed; (2) whether the intervention was implemented fully or adhered to good practice guidelines; and (3) whether there were confounding factors in the wider social context that would affect the outcome of the intervention.14–21 This contrasts with reports of less complex interventions in which (1) the intervention is clear (eg a specific drug); (2) intervention delivery was prescribed through a detailed protocol; and (3) at least some attempt was made from the planning stage onwards to identify and reduce bias associated with key confounders.

Implementation appraisal

Implementation appraisal is not a new concern.22–29 Some systematic reviews have considered whether interventions were delivered as prescribed by the study protocol (“treatment integrity” or “programme adherence”).30 However, appraisal tools used by systematic reviewers usually focus on the methodological characteristics of primary studies rather than implementation issues.30 31 Such tools often take the form of checklists, although the practice of using checklist scores to appraise studies is problematic, leading some to advocate alternative approaches.31 32

Systematic reviews that attempt to rigorously appraise the implementation of complex interventions are the exception rather than the rule. A recent review of community-based injury prevention initiatives, which included appraisals of evidence on implementation, found that reporting of implementation was poor.33

We developed and incorporated an appraisal checklist into four systematic reviews of organisational-level workplace interventions, along with a less structured exploration of textual accounts of implementation in the included studies.17–20 The checklist covered reporting of intervention design (including whether or not interventions were specifically designed to affect employee health), target population, delivery, psychosocial factors and the characteristics of population subgroups differentially affected by the interventions. Our primary aim was to appraise the reporting of implementation in primary studies; our study also considered whether or not there was evidence to suggest that higher standards of reporting were an indication of greater methodological rigour.34


The four reviews that incorporated our appraisal tool synthesised evidence on the health effects of (1) workplace interventions to increase employee control and participation in decision-making;17 (2) changes to team structures and work allocation affecting employees’ day-to-day tasks;18 (3) the introduction of compressed working weeks;19 and (4) shift work interventions.20 Table 1 summarises the intervention types in these reviews. Their methods and outcomes have been described elsewhere.17–20

Table 1 Details of the interventions included in the four systematic reviews

Our original checklist contained 28 criteria. These criteria were adapted from a number of sources, particularly Rychetnik et al, whose work had prompted our initial interest in implementation.3 4 27 35–38 Two reviewers (ME and CB) piloted this checklist independently using 12 studies (taken from the participation and task restructuring reviews). On comparing their pilot appraisals, the reviewers agreed that the checklist had been difficult to interpret and apply consistently, and they criticised both its content and its face validity. The reviewers ascribed these problems to the checklist being unclear (often because criteria had been adapted from other contexts). The pilot checklist also coped poorly with ambiguities in reports of implementation (often, the answers to specific checklist criteria were implied rather than explicitly stated in the brief reports of implementation we identified, and it was often difficult to agree on the point at which reviewers could distinguish mere implication from reported fact). We decided that it would be preferable to work with a smaller number of broader criteria, and hence we shortened the checklist.

The final checklist included 10 criteria (response: yes/no). Studies were categorised by an implementation appraisal score (out of 10—one point for the presence of each criterion), distinguishing the “lowest”, “intermediate” and “higher” scoring studies. The checklist is presented in table 2.

Table 2 Thematic checklist for the appraisal of the reporting, planning and implementation of workplace interventions
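
As a minimal illustration of the scoring rule described above (one point for each criterion reported, out of 10), the following Python sketch may be helpful. The function name and the use of numbers 1 to 10 as criterion keys are assumptions made for illustration; the sketch is not the tool the reviewers used.

```python
N_CRITERIA = 10  # the final checklist has 10 yes/no criteria


def implementation_score(responses):
    """Return one point for each checklist criterion answered 'yes'.

    `responses` maps criterion number (1..10) to True (yes) or False (no).
    """
    missing = set(range(1, N_CRITERIA + 1)) - set(responses)
    if missing:
        raise ValueError(f"missing responses for criteria: {sorted(missing)}")
    return sum(1 for c in range(1, N_CRITERIA + 1) if responses[c])


# Hypothetical study that reports only criteria 1 and 8.
example = {c: c in (1, 8) for c in range(1, N_CRITERIA + 1)}
print(implementation_score(example))  # 2
```

Because every criterion carries equal weight, a score of this kind counts how many themes were reported but says nothing about how important each one was.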

Two reviewers (ME and CB, or CB and MP) independently applied the checklist to all the studies included in the four systematic reviews. Differences were resolved through consultation. We then used cross-tabulations to explore relationships between appraisal scores from our checklist and (a) evaluation study designs and (b) psychosocial and health outcomes (previous studies have suggested that more rigorous evaluations may be less likely to report positive outcomes).34 We also explored whether reporting of implementation differed by date of publication (ie whether or not reporting has improved in recent years) and type of publication (ie whether reporting is better or worse in peer-reviewed journals compared with other forms of publication).
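
A cross-tabulation of this kind can be sketched in a few lines of Python. The records below are hypothetical and are not data from the four reviews; the function names are likewise assumptions made for illustration.

```python
from collections import Counter

# Hypothetical records: (implementation score band, study design).
studies = [
    ("higher", "prospective cohort with controls"),
    ("higher", "other"),
    ("intermediate", "prospective cohort with controls"),
    ("lowest", "other"),
    ("lowest", "other"),
]


def cross_tabulate(records):
    """Count the studies falling in each (row, column) cell."""
    return Counter(records)


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    row_totals = Counter()
    for (row, _col), n in table.items():
        row_totals[row] += n
    return {cell: 100 * n / row_totals[cell[0]] for cell, n in table.items()}


table = cross_tabulate(studies)
for cell, pct in sorted(row_percentages(table).items()):
    print(cell, f"{pct:.0f}%")
```

Row percentages are used here because the comparison of interest is the share of each score band that falls into a given design or outcome category.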

Reported text that described implementation processes was also extracted from each study by one reviewer and checked by another to aid a less structured analysis of reporting of implementation for each review. We considered relevant data first on a case-by-case basis and explored the interactions between reported planning and implementation characteristics, contexts and outcomes. We discussed patterns and idiosyncrasies across different studies and synthesised key findings using a narrative approach. From this less structured process, we gained some insights into how a minority of authors explained outcomes in terms of implementation characteristics.


Implementation appraisals were conducted on a total of 103 studies (references can be obtained from the original reviews).17–20 Twenty-one studies were identified in the task restructuring review, 18 studies in the employee participation review, 40 studies in the compressed working week review and 26 studies in the shift work review.17–20 Two studies appeared in two reviews. In table 3, the numerical implementation scores are summarised for all studies and, in table 4, examples of summaries of implementation appraisals are presented for the higher scoring studies from each of the four reviews.

Table 3 Numerical summary of the results of the implementation appraisal checklist
Table 4 Examples of implementation appraisal summaries (higher scoring studies only)

Summary of implementation appraisals

Most studies achieved low scores (see table 3). The median score was 3 out of 10 (range = 0 to 7; lower and upper quartiles = 1 and 4). The median varied slightly between reviews (from 2 to 4). By date of publication, the median score was 3 for studies published between 1996 and 2000, and 2 for studies published in each of the other periods (before 1991, 1991 to 1995, and 2001 to 2006).

As few studies achieved a high implementation score, we have categorised the studies as follows: 14 “higher” scoring studies (scoring ⩾5 in our implementation appraisal), 38 “intermediate” scoring studies (scoring 3 or 4 in our appraisal) and 51 “lowest” scoring studies (scoring <3).
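
The banding described above can be expressed as a simple threshold rule. The sketch below is illustrative only; the function name is an assumption, while the thresholds and the band counts are those reported in the text.

```python
def score_band(score):
    """Map an implementation score (0 to 10) to the band used in the text."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score >= 5:
        return "higher"
    if score >= 3:
        return "intermediate"
    return "lowest"


# Band counts as reported in the text.
reported_counts = {"higher": 14, "intermediate": 38, "lowest": 51}
total = sum(reported_counts.values())
print(total)  # 103, matching the number of appraised studies

for band, n in reported_counts.items():
    print(band, f"{100 * n / total:.1f}%")
```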

The most commonly reported implementation themes were “motivation for intervention” (table 2, criterion 1, appearing in 76% of included studies) and employee support of the intervention (criterion 8, appearing in 54% of the studies). All the other themes were reported in less than a third of the total studies. Criterion 10 (differential effects/population characteristics) was reported in only 8% of the studies, while no study described resourcing, costs or cost–benefits of interventions (criterion 9).

Type of publication

Forty-nine included studies were published in peer-reviewed health journals, 41 in other peer-reviewed journals (mainly social science, occupational and managerial studies journals) and 13 in edited books or theses. Twelve per cent of articles from health journals received higher implementation scores compared with 15% of studies from both other journals and books or theses. Forty-seven per cent of articles from health journals received the lowest implementation scores compared with 51% of studies from other journals and 54% from books or theses.

Implementation and study design

Implementation appraisal scores were not useful predictors of robust study designs. We identified 32 prospective cohort studies with appropriate controls and have classed these as the most robust study designs: 36% of the studies with “higher” implementation scores were “most robust” compared with 45% of studies with “intermediate” scores and 20% of studies with the “lowest” scores.
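
As a rough arithmetic check, the percentages above can be converted back into study counts using the band sizes reported earlier (14, 38 and 51). The counts produced below are inferred by rounding and are not reported in the paper; they are shown only to illustrate that the figures are mutually consistent with the 32 most robust studies.

```python
# Reported in the text: band sizes and the percentage of each band
# classed as "most robust".
band_sizes = {"higher": 14, "intermediate": 38, "lowest": 51}
reported_pct = {"higher": 36, "intermediate": 45, "lowest": 20}

# Inferred (not reported) counts, obtained by rounding to whole studies.
inferred = {
    band: round(band_sizes[band] * reported_pct[band] / 100)
    for band in band_sizes
}
print(inferred)                # {'higher': 5, 'intermediate': 17, 'lowest': 10}
print(sum(inferred.values()))  # 32

# Converting the inferred counts back gives the reported percentages.
for band, n in inferred.items():
    assert round(100 * n / band_sizes[band]) == reported_pct[band]
```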

Implementation and health effects

All 103 studies included in the reviews evaluated at least one health outcome.17–20 We have categorised the studies as follows: (1) those that reported at least one positive health outcome and no negative outcomes (n = 47); (2) those that reported at least one negative health outcome and no positive outcomes (n = 14); and (3) those that reported conflicting health outcomes (positive and negative) or reported little/no change in all the health outcomes measured (n = 42).
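
The three-way categorisation above amounts to a simple decision rule, sketched below. The function name, its arguments and the short category labels are assumptions made for illustration; the group sizes are those reported in the text.

```python
def outcome_category(n_positive, n_negative):
    """Classify a study by the direction of its reported health outcomes.

    n_positive / n_negative are the numbers of positive and negative
    health outcomes the study reported.
    """
    if n_positive > 0 and n_negative == 0:
        return "positive"
    if n_negative > 0 and n_positive == 0:
        return "negative"
    # Either both directions were reported, or neither was
    # (little/no change in all the outcomes measured).
    return "conflicting or little/no change"


# Group sizes as reported in the text.
reported = {
    "positive": 47,
    "negative": 14,
    "conflicting or little/no change": 42,
}
print(sum(reported.values()))  # 103
```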

We found no conclusive evidence that better reporting of implementation might be associated with positive health outcomes. There was a similar range of implementation scores for both the 47 studies with positive outcomes (47% scored <3, 40% scored 3 or 4, and 13% scored ⩾5) and the 42 studies with conflicting/little change in outcomes (45% scored <3, 40% scored 3 or 4, and 14% scored ⩾5). Fourteen studies reported negative outcomes, of which 84% scored <3, 15% scored 3 or 4, and none scored ⩾5 on the implementation checklists.

Unstructured appraisals of implementation

We extracted textual data on implementation from all the included studies for less structured, more qualitative appraisals. Here, however, we focus on the 14 studies with negative health outcomes.

Implementation reporting tended to be brief and anecdotal. It was often unclear how authors had obtained their information about implementation and whether they had taken steps to avoid bias or error. These (important) objections aside, our more qualitative approach to implementation appraisal did appear to uncover potential explanations for how the implementation characteristics of some studies may have contributed to negative outcomes.

In the participation review, we found that the only two studies with negative health outcomes evaluated participatory interventions that had been implemented in workplaces undergoing organisational downsizing.17 We found that in the “task variety” review, negative health outcomes were more likely to result from interventions that were motivated by business reasons (managerial efficiency, productivity, cost, etc) rather than by employee health concerns.18 However, the studies identified for the compressed working week and the shift work reviews provide evidence of positive, negative or “little change” outcomes resulting from interventions regardless of whether they were motivated by business concerns, health concerns or pressure from employees.19 20


Promoting effective implementation is regarded as a key stage in the design and evaluation of complex interventions, and syntheses of evidence from such evaluations should incorporate data on implementation.1 2 We incorporated implementation data into four systematic reviews of workplace interventions, using both a specially developed checklist for measuring reporting of intervention design and implementation and a more qualitative approach to assessing such reports. We found that reporting of implementation was generally poor. Our experience led us to reflect upon whether a checklist is the best tool for appraising implementation, particularly as our qualitative approach was easier to conduct and, we conclude, more useful than the checklist-based approach.

Quality of reporting

In most cases, authors of included studies presented brief and anecdotal reports of implementation. We identified few descriptions of how authors obtained information about implementation, whether any prior code of good practice existed against which the quality of implementation could be measured, and whether any attempts were made to prevent biased reporting of implementation. Roen and colleagues recently published details of their attempts to appraise the implementation of injury prevention interventions, which identified similarly poor standards of reporting.33 However, they found that studies with methodologically stronger designs tended to provide poorer descriptions of implementation. We found no clear evidence of this relationship in our reviews.

Our checklist-based appraisals did find that most included studies provided some information about what motivated the implementers to deliver the intervention, and whether employees supported them. However, data on cost-effectiveness and differential effects on population subgroups were rarely reported, despite the widely stated view that research to inform public health policy and practice should provide evidence on these issues.11 12 We also found that reporting of implementation varied little by year or type of publication.

We also took a less structured (and less score-focused) approach to identifying reported data on implementation appraisal. This did identify some potential explanations for how implementation may have affected psychosocial and health outcomes, eg organisational downsizing, lack of management support and the aim of increasing individual productivity without regard to employee well-being were all offered as explanations for negative results. These issues were usually described anecdotally within the studies, yet they often provided the most plausible explanations for negative outcomes available to reviewers.

We note that other systematic reviewers have employed more qualitative approaches to implementation appraisal.29 Our own experience now leads us to advocate variations on this approach, perhaps as an adjunct to the use of implementation checklists.


More methodological work is required to develop our approach (and alternative approaches)33 to implementation appraisal: in particular to test inter-rater reliability and validity (the lack of such tests is a limitation to this study). We would focus our efforts on developing and testing qualitative implementation appraisal methods as we believe these may potentially provide greater insights than a checklist-based approach.

We do not rule out the possibility that a systematic review checklist could be developed to assist with implementation appraisals but, in our experience, this approach was problematic. The checklist we developed assessed reporting of implementation—this is not the same as appraising the quality of implementation, but good reporting is one prerequisite for such an appraisal.1 Our checklist therefore helped to demonstrate the urgent need for improved reporting, but did not help us to understand how implementation affected outcomes.

It should also be remembered that this paper only examines reviews of employment interventions. The generalisability of these findings depends on the degree to which employment researchers tend to report implementation differently from or similarly to researchers working in other fields.

We also note that our implementation checklist analysis explored psychosocial and health outcomes. While it is legitimate for public health researchers to be particularly interested in outcomes relevant to their field, we recognise that complex interventions such as those included in our reviews often have other outcomes (eg financial, managerial) of equal or greater importance to the implementers than health outcomes.

Quality of implementation

As stated above, our checklist was not designed to appraise quality of implementation. Even if we could have directly appraised the quality of implementation, the checklist scores would still have been problematic. Summary appraisal scores reveal the number and variety, but not the importance, of reported implementation characteristics. It may only take one flaw in the implementation to cause an intervention to fail, so a high implementation score is no guarantee of effectiveness.32

Furthermore, the development of a detailed checklist for measuring quality of implementation requires an a priori knowledge of the criteria that will distinguish well-implemented interventions from poorly implemented interventions.1 This may be feasible in some areas of research, when there is a strong consensus regarding standards of best practice, but that consensus does not always exist for every type of intervention. We attempted to develop such a list but quickly realised that the included interventions were too varied and there was often no clear way of prescribing in detail what constituted good or bad practice. We suspect that this uncertainty over best practice may increase with the complexity of an intervention, particularly if the intervention is flexible in design and context specific.

For example, what resources are sufficient to adequately resource an intervention? Is collaboration with employees always desirable, or can interventions achieve similar or better results if they are imposed by managers taking a “strong leader” approach? It may be desirable for people managing implementation processes to have appropriate experience, but “appropriate” needs to be defined: must managers have prior experience of delivering specific interventions, or is their general role in management to be regarded as appropriate enough?

The answers to all these questions seem to us to depend on the intervention and specific circumstances.

What is already known on this subject

  • Systematic reviews have been advocated as a means of identifying and appraising evidence on the health effects of complex social interventions.

  • Implementation should be an important feature of these types of systematic review.

  • However, to date, reviewers have often placed more emphasis on appraising the methodological characteristics of evaluations rather than the intervention itself and how it is implemented.

  • Implementation appraisal tools have therefore remained relatively underdeveloped in the systematic review literature, especially as regards more complex social interventions.

What this study adds

  • Aside from highlighting the lack of reporting of implementation issues in primary studies, this study also reveals which aspects of implementation were most commonly reported.

  • We do not recommend our checklist as a means of appraising how implementation influences the outcomes of interventions. Implementation appraisal may be best achieved through less structured and more qualitative approaches.

  • Future evaluations of implementation need to incorporate more contextual, qualitative information.

Policy implications

Primary studies and systematic reviews that are intended to influence policy and practice risk making erroneous recommendations if the quality of intervention implementation is not more robustly appraised.


Guidance on improving the reporting of implementation has been published elsewhere along with the recommendation that “adding simple criteria to reporting standards will significantly improve the quality and usefulness of published evidence and increase its impact on public health program planning”.1 Such guidance may need to be adapted to suit specific interventions, and our own checklist includes criteria that may be useful when reporting workplace interventions. However, we would advise caution against assuming that appraising the implementation of complex interventions is a simple matter. The appraisal tool we developed—like other appraisal tools—could only assess how well the implementation process was reported, rather than the quality of the process itself and, in most cases, reporting was poor. We also lacked detailed criteria on what constitute well-implemented workplace interventions that could safeguard or improve employee health.

Nonetheless, information on implementation and context is crucial for a nuanced assessment of the impact of complex interventions. Improvements in the reporting and appraisal of such information are overdue.


This work was partly funded by ESRC grant no. H141251011 (as part of the ESRC Centre for Evidence-based Public Health Policy) and was part of the Department of Health Policy Research Programme’s Public Health Research Consortium. Mark Petticrew was funded by the Chief Scientist Office of the Scottish Executive Department of Health while most of this work was carried out. The views expressed are those of the authors, not the funders.

ME planned the study, collected and analysed data, is lead author and guarantor. CB, MP, MW and HT assisted in all aspects of the study including writing up.



  • Funding: Department of Health Policy Research Programme (Public Health Research Consortium), Economic and Social Research Council and the Chief Scientist Office of the Scottish Executive Health Department.

  • Competing interests: None.

  • This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.