Table 1

Schemata for appraising quantitative evaluations of intervention effectiveness

Type of schema and evidence to be appraised	How the schema works and the criteria used
Critical appraisal checklists for quantitative studies of intervention effectiveness
Checklists derived from the Evidence Based Medicine Working Group (EBMWG), which appraise articles about clinical therapy or prevention.⁴,⁵,¹⁴	Guide appraisal of the validity and applicability of published evidence. Structure critical appraisal into three sections: Are the results valid? What were the results? Will the results help in caring for patients?
	The validity of evaluative research is judged on the level of evidence (study design and its potential for eliminating bias, e.g. systematic review of RCTs is the highest level of evidence); and the implementation of methods and analysis.
	Clinical importance and applicability of the findings are determined by the magnitude (with confidence intervals) of the estimate of effect and relevance of the outcomes measured.
Checklist from the Oxford-based Public Health Resource Unit for appraising articles about health interventions.¹⁵	Focuses on the validity of research, which is assessed on criteria similar to those above: study design; selection bias; confounding; blinding; data collection and classification of outcomes; follow up, withdrawal and drop out; and analysis.
Critical appraisal checklist for evaluating research on public health interventions, from the Effective Public Health Practice Project, Ontario Ministry of Health.¹⁶	The Ontario checklist uses criteria similar to those above, but also considers the integrity of the intervention being evaluated.
Critical appraisal checklists within a guide for preparing systematic reviews.
Critical appraisal criteria in the Cochrane Collaboration Handbook for Reviewers in the Cochrane Library. The schema is part of a guide for preparing and disseminating systematic reviews of RCTs on intervention effectiveness.¹⁷	The quality of an RCT is assessed on criteria as above: assignment to treatment and control groups and blinding; degree of potential confounding; classification of outcomes and follow up; appropriate analysis, for example, analysis by ‘intention to treat’.
The Campbell Collaboration was established for preparing systematic reviews on social and education interventions.¹⁸	Approaches to evaluating evidence are under consideration; discussion papers are available on its website.
Guides for preparing or evaluating reviews and clinical guidelines
A guide for evaluating reviews, RCTs and non-randomised observation studies: Method for Evaluating Research and Guideline Evidence (MERGE).¹⁹	Critical appraisal checklists for reviews, RCTs and observation studies (non-randomised controlled trial, cohort, case-control, before and after, and interrupted time series). Evaluation criteria are grouped according to: descriptive information about the review or study (for example, type of intervention, implementation, outcomes considered, potential confounders, and characteristics of population and setting); study design, implementation and analysis; overall assessment of the credibility of findings.
A handbook from the Australian National Health and Medical Research Council (NHMRC) on how to assess and apply research evidence. Part of a series on preparing practice guidelines.²⁰	Structures critical appraisal into three questions regarding the evidence: Is there a real effect? (Strength of evidence: level, quality and statistical precision); Is the size of the effect clinically important? (Size of effect); Is the evidence relevant to practice? (Relevance of evidence).
Rules of evidence used to formulate graded recommendations for action
Rules of evidence from the Canadian Task Force on the Periodic Health Examination, which made recommendations on preventive interventions in primary care settings.²,¹¹ Criteria from the US Preventive Services Task Force, which also formed recommendations on clinical preventive interventions.³	Recommendations for action are determined by a systematic review of studies of effectiveness, which includes consideration of: level of evidence; quality of study methods; number of studies; magnitude of effect; consistency of findings; generalisability of findings to the primary care setting.
Rules of evidence from the Task Force on Community Preventive Services, which is forming recommendations about public health interventions.²¹	The strength, class or grade of recommendations is primarily based on the level of evidence (study design): Level I evidence is a systematic review of RCTs. Intervention cost and burden of disease are included if evidence on effectiveness is uncertain. Cost effectiveness may be included in future formulations of recommendations.
The Oxford Centre for EBM has used Levels of Evidence to grade recommendations on therapy, prevention, aetiology and harm.²²	The grade of the recommendations made is linked to the level of evidence, which is determined by the study design. Levels of evidence in descending order are: systematic review of RCTs with homogeneity (Level 1); individual RCT with narrow confidence interval; systematic review of cohort studies, single cohort, or RCT with <80% follow up; systematic review of case-control studies, individual case-control study; case series or poor quality cohort or case-control study; expert opinion without explicit critical appraisal or based on physiology, bench research or ‘first principles’.
Approaches to assessing causal associations and causal inference
The schemata guide the appraisal of epidemiological evidence on causality (causal relations between two variables).²³,³⁴ The criteria can be applied to appraise evaluation research (evidence on the causal link between an intervention and its effects).	The likelihood of a causal relationship is determined by: Strength: magnitude of measured association; Consistency: repeated in multiple observations; Temporality: cause precedes effect; Biological gradient: a dose response relation; Coherence: no conflict with current knowledge; Plausibility: biological or theoretical; Experimental evidence: association examined using manipulation and controls (now considered a ‘gold standard’ demonstration of causality).
The International Agency for Research on Cancer (WHO) uses standards of evidence to appraise human and animal studies of carcinogenicity.²⁵	Criteria for causation are used to appraise evidence when conducting an assessment of cancer risk. The weight of supporting evidence (causality) is determined by: study designs (epidemiological studies); quality of studies; other studies (for example, animal studies); plausible inferences about mechanism of action; other causal criteria (strong association, replication in multiple studies and consistency of findings).
Quality assessment criteria for programme evaluations
Guides the appraisal of health promotion programme evaluations.²⁶	Examines the limitations of the intervention and the overall quality of the evaluation conducted. Considers: stage of intervention; nature of objectives; target group specification; variables measured; instruments used to assess outcomes; evaluation study design.