Table 1

Characteristics of QATs and key study quality domains addressed

| QAT domain | EPHPPT | CASP | NOS | Liverpool | GATE | ROB |
| --- | --- | --- | --- | --- | --- | --- |
| Applicability | RCT, non-randomised trial, cohort, case-control, cross-sectional | RCT, cohort, case-control, diagnostic tests, economic evaluations, qualitative research, systematic reviews | Cohort, case-control | RCT, non-randomised trial, cohort, case-control, cross-sectional | RCT, non-randomised trial, cohort, case-control, cross-sectional | RCT, non-randomised trial, cohort, case-control, cross-sectional |
| Classification | Checklist | Checklist | Scale | Scale | Checklist | Checklist |
| Summary score | Qualitative | No | Quantitative | Quantitative | Qualitative | No |
| Number of components (questions) | 8 (22) (only six components included in summary score) | 3 (10–12 depending on study design) | 9 (9) | 8–9 (8–9 depending on study design) | 5 (25) | 9 (9) |
| Methods for selecting study population | Yes | Yes | Yes | Yes | Yes | Partial (only for RCTs) |
| Methods for measuring exposure and outcome variables | Yes | Partial (RCT: outcome only; cohort: both; case-control: exposure only) | Yes | Yes | Partial (outcome only) | Partial (outcome only) |
| Design-specific sources of bias (excluding confounding) | Partial (only for RCTs, non-randomised trials) | Yes | Yes | Yes | Yes (only for RCTs, non-randomised trials) | Partial (only for RCTs, non-randomised trials) |
| Methods to control confounding | Yes | Yes | Yes | Yes | Yes | Yes |
| Statistical methods (excluding control of confounding) | Partial (not included in summary score) | Partial (no decision made about quality) | No | No | Yes | No |
| Conflict of interest | No | No | No | No | No | No |

Major strengths and weaknesses (in addition to features above)
  • Use is possible without advanced epidemiological training

  • ‘One size fits all’ tool does not do justice to strengths and weaknesses of different study designs

  • Use is possible without advanced epidemiological training

  • Low inter-rater reliability* due to combination of main questions and subquestions

  • Too few answer categories for several questions

  • High inter-rater reliability* due to very specific answer categories

  • Too few answer categories for several questions

  • Broad applicability of four companion tools, each geared towards specific study design features

  • Adaptation of considerations on exposure and outcome measurement to systematic review question

  • Broad applicability of two companion tools, each geared towards specific study design features

  • High inter-rater reliability* due to very specific questions

  • Combination of in-depth assessment of specific limitations with a two-component summary assessment

  • Use requires substantial time investment

  • Compatibility with the most widely used tool for systematic reviews of RCTs

  • ‘One size fits all’ tool does not do justice to strengths and weaknesses of different study designs

  • Use requires advanced epidemiological training

* Inter-rater reliability was not formally assessed and this statement is based on our subjective experience across the six tools.

CASP, Critical Appraisal Skills Programme; EPHPPT, Effective Public Health Practice Project tool; GATE, Graphical Appraisal Tool for Epidemiological Studies; NOS, Newcastle–Ottawa Scale; QAT, quality appraisal tool; RCT, randomised controlled trial; ROB, Cochrane Collaboration Risk of Bias tool.