Research paper
The 52 symptoms of major depression: Lack of content overlap among seven common depression scales
Introduction
“The appearance of yet another rating scale for measuring symptoms of mental disorder may seem unnecessary, since there are so many already in existence and many of them have been extensively used.” (Hamilton, 1960).
Major Depressive Disorder (MDD) is among the most common mental disorders (Kessler et al., 2003) and is studied in disciplines ranging from the social sciences to genetics. Depression severity is studied so pervasively (to enroll study participants or track treatment efficacy, and as a dependent variable, predictor, covariate, or moderator) that the papers introducing 3 rating scales are among the 100 most cited papers in science (van Noorden et al., 2014): the Hamilton Rating Scale for Depression (HRSD; rank 51) (Hamilton, 1960), the Beck Depression Inventory (BDI; rank 53) (Beck et al., 1961), and the Center for Epidemiologic Studies Depression Scale (CES-D; rank 54) (Radloff, 1977).
A great variety of rating scales are used to assess depression severity; Santor et al. (2006) identified 280 different instruments developed in the last century, many of which are still in use. The routine practice is to conduct research based on one particular scale that is chosen for varying reasons: the scale may be available as a tool in the university library, it may be the gold standard in the particular subfield of depression research (such as the HRSD for antidepressant trials), or it may be the local custom of the department or hospital. The rationale for using a specific scale (say, the HRSD instead of the CES-D or BDI) is rarely provided in scientific publications, and conclusions are drawn about depression in general, not about depression as measured by a particular scale.
The tacit and untested assumption underlying this practice is that various depression instruments can be used as interchangeable measurements of depression severity. If this assumption does not hold, results of depression studies may be idiosyncratic to the particular scale used, posing a major challenge to the replicability and generalizability of depression research (Santor et al., 2006; Snaith, 1993). For example, a large clinical trial may establish the efficacy of an antidepressant drug on one particular scale, which could have real implications for patients, although participants may show no clinical improvement on a range of other scales.
Several lines of evidence suggest that rating scales are not interchangeable measures of depression severity. First, studies using multiple depression scales have identified differential scale performance. For instance, common instruments differ markedly in their classification of depressed patients into severity categories (Zimmerman et al., 2012). Second, psychometric analyses have documented that most scales are multidimensional, meaning they assess several constructs (Fried et al., 2016b); these factor structures, however, do not generalize across scales (Shafer, 2006; van Loo et al., 2012). Since scales measure different constructs, using different instruments may lead to different results; the more the symptom content differs across rating scales, the more likely this is to be problematic. Finally, depression is a highly heterogeneous syndrome with many clinical presentations (e.g., Fried and Nesse, 2015a; Olbert et al., 2014) and numerous biological and neuroimaging correlates (e.g., Cassano and Fava, 2002), and individual depression symptoms such as sadness, insomnia, concentration problems, or suicidal ideation differ in important properties such as biological markers, risk factors, and impact on impairment of functioning (for a review, see Fried and Nesse, 2015b). Symptoms also seem to respond differentially to antidepressant treatment (Hieronymus et al., 2015, 2016). Overall, this implies that rating scales may only be interchangeable indicators of depression severity inasmuch as their item content overlaps.
If overlap of symptom content among scales is high, interchangeable use of depression instruments may not pose a severe challenge. If overlap is low, however, the routine practice of using one particular scale in depression research may lead to idiosyncratic results and threaten the validity of a very large and important field of research. Given the pronounced heterogeneity of the depressive syndrome that may well be reflected in clinical instruments, the concern that depression instruments vary widely in symptom content is not far-fetched.
The main goal of the present report is thus to quantify the overlap of items among widely used depression rating scales.
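To make the idea of item-content overlap concrete, the following is a minimal sketch of one way overlap between pairs of scales could be computed. The text above does not state which overlap measure is used, so the Jaccard index here is an assumption, and the scale names and symptom sets are hypothetical toy data rather than the coding used in this report.

```python
from itertools import combinations

# Hypothetical toy data: which symptoms each scale covers.
# These sets are illustrative only and do NOT reproduce the paper's coding.
scales = {
    "scale_A": {"sad mood", "fatigue", "early insomnia", "guilt"},
    "scale_B": {"sad mood", "fatigue", "appetite decrease"},
    "scale_C": {"sad mood", "irritability"},
}


def jaccard(a, b):
    """Number of shared symptoms divided by the number of symptoms in either scale."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Overlap for every pair of scales (assumed measure: Jaccard index).
pairwise = {
    (x, y): jaccard(scales[x], scales[y])
    for x, y in combinations(sorted(scales), 2)
}

# Average overlap across all pairs.
mean_overlap = sum(pairwise.values()) / len(pairwise)

for pair, value in pairwise.items():
    print(pair, round(value, 2))
print("mean overlap:", round(mean_overlap, 2))
```

A value of 0 means two scales share no symptoms and a value of 1 means they cover exactly the same symptoms, so low pairwise values would indicate that scales are poor substitutes for one another.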
Depression rating scales
To estimate the extent to which common rating scales of depression differ in terms of item content, 7 common rating scales for depression were examined: the 21-item BDI-II (Beck et al., 1996; from here on referred to as BDI), the 17-item HRSD, the 20-item CES-D, the 30-item Inventory of Depressive Symptomatology (IDS) (Rush et al., 1996), the 16-item Quick Inventory of Depressive Symptomatology (QIDS) (Rush et al., 2003), the 10-item Montgomery-Åsberg Depression Rating Scale (MADRS) (Montgomery and Asberg,
Results
The content analysis of 125 items across 7 scales resulted in 52 disparate depression symptoms (Fig. 1).
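A tally of how many scales each symptom appears in can be sketched as follows. This is an illustration under stated assumptions: the scale names and symptom sets are hypothetical toy data, not the 7 scales or 52 symptoms analyzed here, and the summary statistics are simply the standard mean, median, and mode of the per-symptom counts.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical toy data: symptoms covered by each scale (illustrative only).
scales = {
    "scale_A": {"sad mood", "fatigue", "early insomnia", "guilt"},
    "scale_B": {"sad mood", "fatigue", "appetite decrease"},
    "scale_C": {"sad mood", "irritability"},
}

# For each distinct symptom, count the number of scales that include it.
counts = Counter(symptom for items in scales.values() for symptom in items)
frequencies = list(counts.values())

n_symptoms = len(counts)
only_one = sum(1 for c in frequencies if c == 1)          # idiosyncratic symptoms
in_all = sum(1 for c in frequencies if c == len(scales))  # universal symptoms

summary = {
    "n_symptoms": n_symptoms,
    "mean": mean(frequencies),
    "median": median(frequencies),
    "mode": mode(frequencies),
    "share_in_one_scale": only_one / n_symptoms,
    "share_in_all_scales": in_all / n_symptoms,
}
print(summary)
```

In this toy example most symptoms appear in a single scale and only one appears in all of them, which is the kind of pattern that signals low content overlap.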
Symptoms appear in a mean of 3 of the 7 rating scales (mode=1, median=2.5). Of the 52 symptoms, 21 (40%) appear only in one single instrument, whereas 6 (12%) feature across all instruments: sad mood, appetite decrease, fatigue, and the 3 insomnia items early, middle, and late insomnia (cannot fall asleep, wakes up during the night, wakes up in the very early morning). Of these, sad mood is
Discussion
The analyses identified a total of 52 specific disparate depression symptoms in 7 common depression scales. The overall overlap of item content among questionnaires was low: 40% of all symptoms appeared in only a single scale, and only 12% appeared across all instruments. These findings imply that the routine practice of using scales as interchangeable measurements of depression severity is problematic and may pose a major threat to the generalizability and replicability of depression research. Given the
Acknowledgements
I would like to extend my sincerest thanks to: Jana Jarecki, for help with Fig. 1; Sophie van der Sluis, for the calculation of sum-score correlations given scale length and inter-item correlation; and Don Robinaugh and Lauren Bylsma, for the very helpful comments on previous versions of this manuscript.
During the preparation of this manuscript, EIF was supported in part by the Research Foundation Flanders (G.0806.13), the Belgian Federal Science Policy within the framework of the
References (42)
- et al., Depression and public health, J. Psychosom. Res. (2002)
- et al., What are “good” depression symptoms? Comparing the centrality of DSM and non-DSM symptoms of depression in a network analysis, J. Affect Disord. (2016)
- et al., Depression is not a consistent syndrome: an investigation of unique symptom patterns in the STAR*D study, J. Affect Disord. (2015)
- et al., Toward a generalizable model of symptoms in major depressive disorder, Biol. Psychiatry (1998)
- et al., A genome-wide association study of depressive symptoms, Biol. Psychiatry (2013)
- et al., Genetic association study of individual symptoms in depression, Psychiatry Res (2012)
- et al., The 16-Item quick inventory of depressive symptomatology (QIDS), clinician rating (QIDS-C), and self-report (QIDS-SR): a psychometric evaluation in patients with chronic major depression, Biol. Psychiatry (2003)
- Diagnostic and Statistical Manual of Mental Disorders (2013)
- et al., Reviews and overviews The Hamilton depression rating scale: has the gold standard become a lead weight?, Am. J. Psyc (2004)
- et al., Cognitive Therapy of Depression (1979)
- Comparison of Beck depression inventories -IA and -II in psychiatric outpatients, J. Pers. Assess.
- An inventory for measuring depression, Arch. Gen. Psychiatry
- Cross-trial prediction of treatment outcome in depression: a machine learning approach, Lancet Psychiatry
- Straightforward Statistics for the Behavioral Sciences
- The impact of individual depressive symptoms on impairment of psychosocial functioning, PLoS One
- Depression sum-scores don’t add up: why analyzing specific depression symptoms is essential, BMC Med.
- A rating scale for depression, J. Neurol. Neurosurg. Psychiatry
- Discovering endophenotypes for major depression, Neuropsychopharmacology
- Consistent superiority of selective serotonin reuptake inhibitors over placebo in reducing depressed mood in patients with major depression, Mol. Psychiatry