Research report
A 4-item measure of depression and anxiety: Validation and standardization of the Patient Health Questionnaire-4 (PHQ-4) in the general population
Introduction
Depression and anxiety are among the most prevalent and disabling conditions in Western societies, and their burden on the individual and society is tremendous (Demyttenaere et al., 2004, Kessler et al., 1994, Leon et al., 1995). To improve physicians' average detection rates, which currently fall below 50% (Ansseau et al., 2004, Löwe et al., 2003, Löwe et al., 2004a), while adding only minimal burden, ultra-brief self-report screening instruments for depression and anxiety have been developed and validated. Several treatment guidelines now provide evidence-based recommendations regarding screening adults for depression in clinical practices that have systems in place to assure accurate diagnosis, effective treatment, and follow-up (National Institute for Health and Clinical Excellence, 2004, U.S. Preventive Services Task Force, 2002). In contrast to the availability of ultra-short depression screeners (Mitchell and Coyne, 2007), to our knowledge, only one ultra-brief screening scale for anxiety has been published (Kroenke et al., 2007). Although not yet included in treatment guidelines, screening for anxiety was recently suggested as a necessary first step in improving outcomes in patients with anxiety disorders (Katon and Roy-Byrne, 2007).
Ultra-short screening tools are typically defined as measures with 1–4 items, requiring less than 4 min to complete (Mitchell and Coyne, 2007). Results from two recent meta-analyses and a comparative study suggest that ultra-short two- or three-question tests perform better than single-item screeners in depression screening, identifying approximately 80% of the cases (Corson et al., 2004, Gilbody et al., 2007, Mitchell and Coyne, 2007). The Patient Health Questionnaire-2 (PHQ-2) (Kroenke et al., 2003, Löwe et al., 2005) is the most validated 2-item screener for depression. It is the short version of the 9-item Patient Health Questionnaire (PHQ-9) (Gräfe et al., 2004, Kroenke et al., 2001). The new diagnostic principle of the PHQ-9 was that each of the nine items evaluates the presence of one of the DSM-IV diagnostic criteria of major depressive disorder (Löwe et al., 2004a, Spitzer et al., 1999). The PHQ-2 focuses solely on depressed mood and loss of interest, thereby representing the DSM-IV diagnostic core criteria. Results from a prospective criterion standard study in a sample of 520 medical outpatients suggest that the PHQ-2 has good criterion and convergent validity and is sensitive to change (Löwe et al., 2005). Other studies indicate good criterion validity of the PHQ-2 as a screening tool for major depression in older adults (Li et al., 2007), pregnant and postpartum women (Bennett et al., 2008), patients with coronary artery disease (Thombs et al., 2008), and patients with HIV/AIDS (Monahan et al., 2009). However, while one of the above-mentioned meta-analyses found the PHQ-9 to be as effective as longer clinician-administered instruments, it called for more research to validate the PHQ-2 and to compare its diagnostic abilities with those of the PHQ-9 (Gilbody et al., 2007).
For anxiety, the 2-item Generalized Anxiety Disorder Scale (GAD-2) (Kroenke et al., 2007) was recently published as the short version of the 7-item Generalized Anxiety Disorder Scale (GAD-7) (Löwe et al., 2008a, Spitzer et al., 2006). With areas under the curve of 0.80 to 0.91 for the four most common anxiety disorders diagnosed with a criterion standard interview, a recent validation study of 965 primary care patients indicated good criterion validity of the GAD-2.
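As a reading aid for the area-under-the-curve figures above: an AUC can be understood as the probability that a randomly chosen case scores higher on the screener than a randomly chosen non-case, with ties counted as one half. The following is a minimal sketch of that interpretation, not the procedure used in the cited validation study. The function name `auc` and the toy scores are hypothetical and are not study data.

```python
from itertools import product


def auc(scores_cases, scores_noncases):
    """Share of case/non-case pairs in which the case scores higher.

    Ties contribute 0.5, which makes this equal to the area under the
    ROC curve for the score.
    """
    pairs = list(product(scores_cases, scores_noncases))
    wins = sum(1.0 if c > n else 0.5 if c == n else 0.0 for c, n in pairs)
    return wins / len(pairs)


# Hypothetical GAD-2 sum scores (range 0-6), invented for illustration only.
cases = [4, 5, 3, 6, 2]
noncases = [0, 1, 2, 1, 3, 0]

print(round(auc(cases, noncases), 3))
```

A value of 0.5 would indicate a screener that separates cases from non-cases no better than chance, and 1.0 would indicate perfect separation.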
Despite the promising operating characteristics of the PHQ-2 and the GAD-2, as well as their potential usefulness for medical care and research, neither of the ultra-brief scales has been validated in the general population. Normative data from the general population, which would allow the interpretation of individual PHQ-2 and GAD-2 scores, are also not available.
Our study aims to establish reliability, validity, and normative data for the PHQ-2, the GAD-2, and their composite measure, the 4-item Patient Health Questionnaire-4 (PHQ-4) (Kroenke et al., 2009), in a large and representative sample from the general population. First, we investigated the item characteristics, reliability, and factorial structure, including factorial invariance for different age and gender groups. Second, construct validity of the PHQ-2, GAD-2, and PHQ-4 was assessed in the general population by investigating associations between scale scores, other self-report measures, and well-known demographic risk factors for depression and anxiety. Finally, in order to provide comparative data for the application of these three measures, we generated age- and gender-specific normative data for the PHQ-2, GAD-2, and PHQ-4.
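To make the relation between the three measures concrete, the sketch below sums two depression items and two anxiety items into PHQ-2, GAD-2, and PHQ-4 scores, and computes Cronbach's alpha as one common reliability coefficient. It assumes the usual PHQ response coding of 0 to 3 per item, which is not stated in this excerpt. The function names `score_phq4` and `cronbach_alpha` are hypothetical, and the code is an illustration rather than the authors' analysis.

```python
from statistics import pvariance


def score_phq4(phq2_items, gad2_items):
    """Sum item responses (assumed coded 0-3) into subscale and total scores."""
    if len(phq2_items) != 2 or len(gad2_items) != 2:
        raise ValueError("expected two items per subscale")
    if any(v not in (0, 1, 2, 3) for v in list(phq2_items) + list(gad2_items)):
        raise ValueError("item responses must be 0, 1, 2, or 3")
    phq2 = sum(phq2_items)
    gad2 = sum(gad2_items)
    return {"PHQ-2": phq2, "GAD-2": gad2, "PHQ-4": phq2 + gad2}


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Variance of each item across respondents, and of the sum score.
    item_vars = [pvariance([r[i] for r in rows]) for i in range(k)]
    total_var = pvariance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


print(score_phq4([1, 2], [0, 3]))
```

Under the assumed coding, each subscale ranges from 0 to 6 and the composite from 0 to 12.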
Study design and participants
The validation and standardization of the PHQ-4 in the general population was part of a nationally representative face-to-face household survey conducted in Germany. This survey was also used to provide normative data for the 7-item Generalized Anxiety Disorder Scale (GAD-7) (Löwe et al., 2008a). Within this survey, the study participants were interviewed using a structured self-report questionnaire. The survey was carried out in two waves between May 5 and June 8, 2006 by a total of 231 (first
Sample characteristics
From 8106 valid addresses, 1199 persons (14.8%) were not at home at the time of the three visits of the interviewers, 1806 persons refused to participate (22.3%), and 65 persons (0.8%) were not able to complete the study questionnaire due to severe illness. A total of 5036 persons agreed to participate, provided verbal informed consent, and completed the study questionnaire. Response rate among all subjects met by the interviewers was 72.9% (5036/6907) while participation rate among all
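The counts reported in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the figures given above. The variable names and the `pct` helper are hypothetical.

```python
valid_addresses = 8106
not_at_home = 1199
refused = 1806
too_ill = 65
participated = 5036

# The four outcome groups account for every valid address.
assert not_at_home + refused + too_ill + participated == valid_addresses

# Persons actually met by the interviewers.
met_by_interviewers = valid_addresses - not_at_home


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(met_by_interviewers)                     # denominator of the response rate
print(pct(not_at_home, valid_addresses))       # share not at home
print(pct(refused, valid_addresses))           # share refusing
print(pct(too_ill, valid_addresses))           # share too ill
print(pct(participated, met_by_interviewers))  # response rate
```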
Discussion
The findings from this study, which included more than 5000 subjects, suggest that an ultra-brief 4-item measure can reliably and validly measure depression and anxiety in the general population. While preliminary data on the validity of the PHQ-4 and its two subscales (the PHQ-2 for depression and the GAD-2 for anxiety) in clinical samples were previously available (Kroenke et al., 2009, Löwe et al., 2005), this is the first study to provide evidence for the reliability and validity of the
Role of funding source
The study was funded by the Friedrich-Ebert-Stiftung, Germany. The funding source had no role in designing the study, in the collection, analysis, and interpretation of data, in the writing of the report, or in the decision to submit the paper for publication.
Conflict of interest
The authors have no conflicts of interest in connection with this paper.
Acknowledgements
We thank Stefanie Müller, MA, who assisted with data analyses, and we thank all subjects for participating in our study.
References (58)
- et al. High prevalence of mental disorders in primary care. J. Affect. Disord. (2004)
- et al. The validity of the Hospital Anxiety and Depression Scale. An updated literature review. J. Psychosom. Res. (2002)
- et al. Depression profile in patients with and without chronic heart failure. J. Affect. Disord. (2008)
- et al. The epidemiology of generalized anxiety disorder. Psychiatr. Clin. North Am. (2001)
- et al. An Ultra-Brief Screening Scale for Anxiety and Depression: the PHQ-4. Psychosomatics (2009)
- et al. Detecting panic disorder in medical and psychosomatic outpatients: comparative validation of the Hospital Anxiety and Depression Scale, the Patient Health Questionnaire, a screening question, and physicians' diagnosis. J. Psychosom. Res. (2003)
- et al. Comparative validity of three screening questionnaires for DSM-IV depressive disorders and physicians' diagnoses. J. Affect. Disord. (2004)
- et al. Detecting and monitoring depression with a two-item questionnaire (PHQ-2). J. Psychosom. Res. (2005)
- et al. Depression, anxiety and somatization in primary care: syndrome overlap and functional impairment. Gen. Hosp. Psych. (2008)
- et al. Abnormalities in weight status, eating attitudes, and eating behaviors among urban high school students: correlations with self-esteem and anxiety. J Adolesc. Health (1996)
- Base rates for panic and depression according to the Brief Patient Health Questionnaire: a population-based study. J. Affect. Disord.
- Diagnostic and Statistical Manual of Mental Disorders
- Amos 16.0 User's Guide
- Efficiency of a two-item pre-screen to reduce the burden of depression screening in pregnancy and postpartum: an IMPLICIT network study. J. Am. Board Fam. Med.
- Significance tests and goodness of fit in the analysis of covariance structures. Psychol. Bull.
- Cronbach's alpha. BMJ
- State of the art procedures for translating, validating and using psychoeducational tests in cross-cultural assessment. Sch. Psychol. Int.
- One-year prevalence of subthreshold and threshold DSM-IV generalized anxiety disorder in a nationally representative sample. Depress. Anxiety
- Common and specific dimensions of self-reported anxiety and depression: implications for the cognitive and tripartite models. J. Abnorm. Psychology
- Statistical power analysis for the behavioral sciences
- Screening for depression and suicidality in a VA primary care setting: 2 items are better than 1 item. Am. J. Manag. Care
- Prevalence, severity, and unmet need for treatment of mental disorders in the World Health Organization World Mental Health Surveys. JAMA
- Measurement of self-esteem: findings regarding reliability, validity, and stability of the Rosenberg Scale. Diagnostica
- Screening for depression in medical settings with the Patient Health Questionnaire (PHQ): a diagnostic meta-analysis. J. Gen. Intern. Med.
- Screening for psychiatric disorders with the Patient Health Questionnaire (PHQ). Results from the German validation study. Diagnostica
- Questions on life satisfaction - a short measure for assessing quality of life. Eur. J. Psychol. Assess.
- Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct. Equ. Modeling
- Psychological testing
- Anxiety disorders: efficient screening is the first step in improving outcomes. Ann. Intern. Med.