Elsevier

Journal of Affective Disorders

Volume 208, 15 January 2017, Pages 191-197

Research paper
The 52 symptoms of major depression: Lack of content overlap among seven common depression scales

https://doi.org/10.1016/j.jad.2016.10.019

Highlights

  • This study aimed to determine symptom overlap of 7 common depression scales.

  • The scales differ considerably in content and encompass 52 depression symptoms.

  • The results stress the heterogeneity of the depressive syndrome.

  • Research results obtained via one scale may not replicate in other scales.

Abstract

Background

Depression severity is assessed in numerous research disciplines, ranging from the social sciences to genetics, and is used as a dependent variable, predictor, or covariate, or to enroll participants. The routine practice is to assess depression severity with one particular depression scale and to draw conclusions about depression in general, relying on the assumption that scales are interchangeable measures of depression. The present paper investigates to what degree 7 common depression scales differ in their item content and generalizability.

Methods

A content analysis is carried out to determine symptom overlap among the 7 scales via the Jaccard index (0=no overlap, 1=full overlap). Per scale, rates of idiosyncratic symptoms, and rates of specific vs. compound symptoms, are computed.
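The overlap measure described above can be illustrated with a minimal sketch. The scale names and symptom sets below are hypothetical toy data, not the content of the 7 instruments, and the helper functions (`jaccard`, `pairwise_overlap`, `mean_overlap_per_scale`, `idiosyncratic_rate`) are illustrative assumptions rather than the procedure used in the paper, which may handle specific vs. compound symptoms differently.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A n B| / |A u B| (0 = no overlap, 1 = full overlap)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty scales are treated as non-overlapping
    return len(a & b) / len(union)


# Hypothetical toy scales: each scale is reduced to its set of symptoms.
scales = {
    "scale_A": {"sad mood", "fatigue", "early insomnia", "guilt"},
    "scale_B": {"sad mood", "fatigue", "appetite decrease"},
    "scale_C": {"sad mood", "early insomnia", "irritability", "fatigue"},
}


def pairwise_overlap(scales):
    """Jaccard index for every unordered pair of scales."""
    return {
        (x, y): jaccard(scales[x], scales[y])
        for x, y in combinations(sorted(scales), 2)
    }


def mean_overlap_per_scale(scales):
    """Mean Jaccard index of each scale with all other scales."""
    pairs = pairwise_overlap(scales)
    means = {}
    for name in scales:
        values = [v for pair, v in pairs.items() if name in pair]
        means[name] = sum(values) / len(values)
    return means


def idiosyncratic_rate(scales, name):
    """Share of a scale's symptoms that appear in no other scale."""
    others = set().union(*(s for n, s in scales.items() if n != name))
    own = scales[name]
    return len(own - others) / len(own)


if __name__ == "__main__":
    for pair, value in pairwise_overlap(scales).items():
        print(pair, round(value, 2))
    print(mean_overlap_per_scale(scales))
    print({name: round(idiosyncratic_rate(scales, name), 2) for name in scales})
```

In this toy example, scale_A and scale_B share 2 of 5 distinct symptoms, giving a Jaccard index of 0.4, and 1 of the 4 symptoms in scale_A (guilt) is idiosyncratic to that scale.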

Results

The 7 instruments encompass 52 disparate symptoms. Mean overlap among all scales is low (0.36); the mean overlap of each scale with all others ranges from 0.27 to 0.40, and overlap between individual pairs of scales ranges from 0.26 to 0.61. Symptoms feature across a mean of 3 scales; 40% of the symptoms appear in only a single scale and 12% across all instruments. Scales differ regarding their rates of idiosyncratic symptoms (0–33%) and compound symptoms (22–90%).

Limitations

Future studies analyzing more and different scales will be required to obtain a better estimate of the number of depression symptoms; the present content analysis was carried out conservatively and likely underestimates heterogeneity across the 7 scales.

Conclusion

The substantial heterogeneity of the depressive syndrome and low overlap among scales may lead to research results idiosyncratic to particular scales used, posing a threat to the replicability and generalizability of depression research. Implications and future research opportunities are discussed.

Introduction

“The appearance of yet another rating scale for measuring symptoms of mental disorder may seem unnecessary, since there are so many already in existence and many of them have been extensively used.” (Hamilton, 1960).

Major Depressive Disorder (MDD) is among the most common mental disorders (Kessler et al., 2003), and is studied in various disciplines ranging from the social sciences to genetics. Depression severity is studied so pervasively – to enroll study participants or track treatment efficacy, as a dependent variable, predictor, covariate, or moderator – that 3 rating scales are among the 100 most cited papers in science (van Noorden et al., 2014): the Hamilton Rating Scale for Depression (HRSD; rank 51) (Hamilton, 1960), the Beck Depression Inventory (BDI; rank 53) (Beck et al., 1961), and the Center for Epidemiologic Studies Depression Scale (CES-D; rank 54) (Radloff, 1977).

Interestingly, a great variety of rating scales are used to assess depression severity; Santor et al. (2006) identified 280 different instruments developed in the last century, many of which are still in use. The routine practice is to conduct research based on one particular scale that is chosen for varying reasons: the scale may be available as a tool in the university library, it may be the gold standard in the particular subfield of depression research (such as the HRSD for antidepressant trials), or it may be the local custom of the department or hospital. The rationale for using specific scales – say, the HRSD instead of the CES-D or BDI – is rarely provided in scientific publications, and conclusions are drawn about depression in general, not about depression measured by a particular scale.

The tacit – and untested – assumption underlying this practice is that various depression instruments can be used as interchangeable measurements of depression severity. If this assumption does not hold, results of depression studies may be idiosyncratic to the particular scale used, posing a major challenge to the replicability and generalizability of depression research (Santor et al., 2006; Snaith, 1993). For example, a large clinical trial may establish the efficacy of an antidepressant drug on a particular scale – which could have real implications for patients – although participants may show no clinical improvement on a range of other scales.

A number of reasons suggest that rating scales may not be interchangeable measures of depression severity. First, studies using multiple depression scales have identified differential scale performance. For instance, common instruments differ markedly in their classification of depressed patients into severity categories (Zimmerman et al., 2012). Second, psychometric analyses have documented that most scales are multidimensional, meaning they assess several constructs (Fried et al., 2016b); these factor structures, however, do not generalize across scales (Shafer, 2006; van Loo et al., 2012). Since scales measure different constructs, using different instruments may lead to different results; this is more likely to be problematic the more heterogeneous the depression symptoms across different rating scales are. Finally, depression is a highly heterogeneous syndrome with many clinical presentations (e.g., Fried and Nesse, 2015a; Olbert et al., 2014) and numerous biological and neuroimaging correlates (e.g., Cassano and Fava, 2002), and individual depression symptoms such as sadness, insomnia, concentration problems or suicidal ideation differ in important properties such as biological markers, risk factors, and impact on impairment of functioning (for a review, see Fried and Nesse, 2015b). Symptoms also seem to respond differentially to antidepressant treatment (Hieronymus et al., 2015; Hieronymus et al., 2016). Overall, this implies that rating scales may only be interchangeable indicators of depression severity inasmuch as their item content overlaps.

If overlap of symptom content among scales is high, interchangeable use of depression instruments may not pose a severe challenge. If overlap is low, however, the routine practice of using one particular scale in depression research may lead to idiosyncratic results and threaten the validity of a very large and important field of research. Given the pronounced heterogeneity of the depressive syndrome that may well be reflected in clinical instruments, the concern that depression instruments vary widely in symptom content is not far-fetched.

The main goal of the present report is thus to quantify the overlap of items among widely used depression rating scales.

Depression rating scales

To estimate the extent to which common rating scales of depression differ in terms of item content, 7 common rating scales for depression were examined: the 21-item BDI-II (Beck et al., 1996; from here on referred to as BDI), the 17-item HRSD, the 20-item CES-D, the 30-item Inventory of Depressive Symptomatology (IDS) (Rush et al., 1996), the 16-item Quick Inventory of Depressive Symptomatology (QIDS) (Rush et al., 2003), the 10-item Montgomery-Åsberg Depression Rating Scale (MADRS) (Montgomery and Asberg,

Results

The content analysis of 125 items across 7 scales resulted in 52 disparate depression symptoms (Fig. 1).

Symptoms appear in a mean of 3 of the 7 rating scales (mode=1, median=2.5). Of the 52 symptoms, 21 (40%) appear only in one single instrument, whereas 6 (12%) feature across all instruments: sad mood, appetite decrease, fatigue, and the 3 insomnia items early, middle, and late insomnia (cannot fall asleep, wakes up during the night, wakes up in the very early morning). Of these, sad mood is

Discussion

The analyses identified a total of 52 specific disparate depression symptoms in 7 common depression scales. The overall overlap of item content among questionnaires was low: 40% of all symptoms appeared only in a single scale, only 12% across all instruments. These findings imply that the routine practice of using scales as interchangeable measurements of depression severity is problematic and may pose a major threat to the generalizability and replicability of depression research. Given the

Acknowledgements

I would like to extend my sincerest thanks to: Jana Jarecki, for help with Fig. 1; Sophie van der Sluis, for the calculation of sum-score correlations given scale length and inter-item correlation; and Don Robinaugh and Lauren Bylsma, for the very helpful comments on previous versions of this manuscript.

During the preparation of this manuscript, EIF was supported in part by the Research Foundation Flanders (G.0806.13), the Belgian Federal Science Policy within the framework of the

References (42)

  • A.T. Beck et al. Comparison of Beck Depression Inventories -IA and -II in psychiatric outpatients. J. Pers. Assess. (1996)
  • A.T. Beck et al. An inventory for measuring depression. Arch. Gen. Psychiatry (1961)
  • A.M. Chekroud et al. Cross-trial prediction of treatment outcome in depression: a machine learning approach. Lancet Psychiatry (2016)
  • J.D. Evans. Straightforward Statistics for the Behavioral Sciences (1996)
  • E.I. Fried et al. The impact of individual depressive symptoms on impairment of psychosocial functioning. PLoS One (2014)
  • E.I. Fried et al. Depression sum-scores don’t add up: why analyzing specific depression symptoms is essential. BMC Med. (2015)
  • Fried, E.I., van Borkulo, C.D., Epskamp, S., Schoevers, R.A., Tuerlinckx, F., Borsboom, D., 2016b. Measuring Depression...
  • Hagen, E.H., 2011. Evolutionary theories of depression: a critical...
  • M. Hamilton. A rating scale for depression. J. Neurol. Neurosurg. Psychiatry (1960)
  • G. Hasler et al. Discovering endophenotypes for major depression. Neuropsychopharmacology (2004)
  • F. Hieronymus et al. Consistent superiority of selective serotonin reuptake inhibitors over placebo in reducing depressed mood in patients with major depression. Mol. Psychiatry (2015)