

Prevalence of emotional and behavioural disorders in German children and adolescents: a meta-analysis
  1. Claus Barkmann,
  2. Michael Schulte-Markwort
  1. Department of Child and Adolescent Psychosomatics, Center of Gynecology, Obstetrics and Paediatrics, University Hospital Hamburg, Hamburg, Germany
  1. Correspondence to Professor Dr Claus Barkmann, Center of Gynecology, Obstetrics and Paediatrics, Department of Child and Adolescent Psychosomatics, University Hospital Hamburg-Eppendorf, Martinistrasse 52, D-20246 Hamburg, Germany; barkmann{at}


Background This meta-analysis aimed to determine the overall prevalence of emotional and behavioural disorders among children and adolescents in Germany, the dependence of prevalence estimates upon the methods employed and potential secular trends.

Methods Primary studies were pooled in a meta-analysis using a random effects model. Primary study prevalences were averaged using precision weights, and the pooled estimate was subsequently subjected to sensitivity analyses using hierarchical regression and (co)variance analyses.

Results The precision-weighted average primary study prevalence for the 33 studies included was M=17.6%. The effect size primarily depended on the case definition employed, with studies applying questionnaire criteria showing, on average, lower primary study effects. Moreover, a negative relationship was found between study validity and primary study effect.

Conclusion Half a century of research indicates that approximately one in six children shows signs of emotional or behavioural disorders; conclusions regarding period effects are not robust.

  • Child health
  • childhood and adolescence
  • emotional and behavioural disorders
  • epidemiology
  • meta-analysis
  • prevalence


Scientific studies on the type, extent and distribution of emotional and behavioural disorders in childhood and adolescence primarily serve to establish prevalence rates, identify trends over time and generate aetiological hypotheses.1 The findings inform the public about the health status of children and adolescents, and help experts develop concepts of psychopathology and research methods. They also serve as a basis for the planning, description and evaluation of mental healthcare facilities.2

Determining the frequency of emotional and behavioural disorders during childhood and adolescence is inextricably linked with the problem of defining disorders. According to Remschmidt,3 these refer to ‘… a state of arbitrarily disturbed life functions, which exhibits a temporal dimension through its onset, course, and as the case may be also its termination and which drastically hinders a child or adolescent from actively taking part in and coping with aspects of life which are typical for his/her age group’ (p 146). This general conceptual definition is operationalised empirically using two different approaches, namely clinical and statistical taxonomies. A major characteristic of clinical taxonomies is the categorical conception of mental disorders: based on lists of symptoms as well as criteria relating to duration and intensity, a decision is made as to whether or not a disorder is present. Among current classification systems, chapter F of the International Classification of Diseases, 10th revision (ICD-10; WHO) and the American Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV; American Psychiatric Association) are of primary significance.4–6 Statistical classification systems are based on the psychometric approach in psychology; they adopt a dimensional conception of characteristics and use questionnaires to assess the number or intensity of symptoms, syndromes or overall problems (eg, the child behaviour checklist by Achenbach and Rescorla).7 With the aid of factor analyses, so-called ‘scales’ are derived that can be transformed into a categorical case definition by means of cut-off scores. These cut-offs in turn are usually validated in clinical samples diagnosed according to the DSM or ICD taxonomies.

In the study presented here, emotional and behavioural disorders are classified in accordance with conventional use as clinical psychiatric syndromes according to axis I of the ICD-based multiaxial classification scheme for mental disorders in childhood and adolescence.8 These include, for example, attention-deficit hyperactivity disorders, conduct disorders, anxiety, depression, obsessive-compulsive disorders, somatoform disorders and eating disorders, but not disorders of psychological development or mental retardation.

Populations utilising child and adolescent psychiatric services do not provide reliable prevalence data because they are systematically biased: a large proportion of children and adolescents in need of treatment do not make use of the corresponding services.9 10 Existing routine data, such as population censuses, school medical checkups or health insurance fund data, also fail to provide valid estimates, because they are collected for a different purpose and because identifying emotional and behavioural disorders requires a certain level of expertise. Measuring the ‘true’ prevalence of emotional and behavioural disorders in children and adolescents requires investigations of representative field samples. This, however, gives rise to a series of problems, including issues concerning case definition, systematic sample loss and interrater differences.1 11–14 Worldwide, well over 100 studies have examined the overall prevalence of emotional and behavioural disorders in children and adolescents. Investigations that have received the greatest international attention include the Isle of Wight studies, UK,15 the Kauai study, USA,16 the Dunedin study, New Zealand,17 the Puerto Rico study, USA,18 the Methodology for Epidemiology of Mental Disorders in Children and Adolescents (MECA) study, USA,19 the Great Smoky Mountain Study of Youth, USA20 and the British Mental Health Survey, UK.21 Meanwhile, a large number of reviews have been published.1 9 13 14 22–27

In the most extensive review to date, Roberts et al28 evaluated all 52 research studies (dating back to 1950) identified in their literature search, pinpointed trends and outlined strategies for future studies. The review comprises studies from more than 20 countries in Africa, Asia, Europe, and North and South America. The largest numbers of studies were conducted in the USA (13) and England (six). Sample sizes ranged from 58 to 8462 cases (Mean=1201, median 831). A one-stage design was employed in 33 studies and a two-stage design in 19 studies. One-stage designs yielded an average prevalence of 15.0% and two-stage designs of 17.5%. Overall, the average prevalence amounted to 15.8% (median 13.7%, mode 12.0%), with individual rates ranging from 1% to 51%. Age group-specific analyses showed that overall prevalence rates increased with age (from a mean of 10.2% for preschool children up to 16.5% for adolescents). Prevalence rates further depended on the measurement instruments and case criteria employed: Rutter's instruments29 yielded prevalences of approximately 12%, the schedule for affective disorders and schizophrenia for school-age children (K-SADS) approximately 14%, and the diagnostic interview schedule for children (DISC) between 20% and 25%. After controlling for further differences, mean prevalence rates did not differ substantially across the decades. Reviews published later1 24 26 have largely confirmed the results of Roberts et al.28

No international review has included more than three studies of German children and adolescents.30–32 Some German-language publications have included slightly more, collating up to eight studies in a rather unsystematic manner and evaluating them qualitatively.33–36 However, a systematic compilation of primary studies undertaken in Germany has shown that this does not exhaustively reflect the current state of research in German-speaking countries.37 According to that review, more than 20 studies had been conducted by the turn of the millennium. Descriptive analyses of those studies revealed an average overall prevalence of Mean=17.8% (SD 5.54, minimum 10.3%, maximum 29.9%). Particularly noteworthy primary studies have been conducted by Remschmidt and Walter,35 who introduced the child behaviour checklist (CBCL) and the youth self-report (YSR) into the German-speaking world; by Esser et al,38 who performed the first longitudinal study on the prevalence and incidence of emotional and behavioural disorders in children and adolescents; and by Doepfner et al,39 who were responsible for the first nationwide representative study.

The international reviews published so far have not been able to determine whether prevalence varies between countries with different cultural and political foundations. If such internationally comparative studies fail to identify and include a substantial proportion of primary studies because of international differences in publication practices, this question either cannot be analysed at all or can be analysed only against the backdrop of publication bias. Germany is a particularly interesting example because, as a member of the European Union, it is a highly industrialised country with an internationally renowned health system, which is based on the principle of national insurance and is among the most expensive in the world.40 Whereas the mental health of children and adolescents was previously of subordinate significance compared with that of adults in Germany, a change in thinking is beginning to take place because the persistent decline in birth rates increasingly threatens the national insurance system.41 While national surveys have been conducted in the USA, the UK and other countries since the 1980s, the first national survey on child and adolescent health including a module on mental health has only recently been conducted in Germany (the German Health Interview and Examination Survey for Children and Adolescents; KiGGS).42 The present paper therefore examines the current state of empirical research with respect to the following questions:

  1. How high is the overall prevalence of emotional and behavioural disorders among children and adolescents in Germany?

  2. How sensitive is this prevalence to variation in primary study effects as well as methodological and validity characteristics of the primary studies?

  3. Is a secular trend observable?

Materials and methods

The research questions were addressed using a meta-analytical approach. The selected procedures are primarily based on the guidelines of the Cochrane Collaboration,43 the handbook by Cooper and Hedges,44 and the recommendations of Stroup et al45 and Thompson and Higgins.46 Relevant primary studies were empirical investigations that determined an overall prevalence rate for emotional and behavioural disorders in non-preselected, German-speaking community samples of children and adolescents up to the age of 18 years. The lowest common denominator of all relevant studies is thus that identified cases showed a clinically indicated need for child and adolescent psychiatric diagnostics, counselling and/or treatment. Only studies that determined an overall prevalence using clinically validated case definitions were included; primary studies with a purely statistical case definition were excluded. For example, when questionnaire criteria were used as a case definition, the age- and gender-specific raw test score that optimally differentiated between clinical and non-clinical cases in the corresponding validation study was used. Although such a critical raw score may correspond to a conspicuous percentile boundary such as the 90th percentile in the validation sample of the instrument concerned, it can yield prevalence rates very different from 10% when applied to other samples.

Methodological quality did not serve as a selection criterion because: (1) heterogeneity of irrelevant characteristics across the primary studies ensures the external validity of the meta-analytical results, as large non-systematic differences between primary studies increase the probability that future primary studies will show the same results47; (2) the mutual counterbalancing of potentially biased results increases internal validity, which, in the context of sensitivity analyses, can be estimated from the correlation between primary study effects and validity ratings48; (3) the inclusion of diverse operational definitions of mental disturbance and disorder enhances construct validity; and (4) enlarging the study sample narrows the CIs of parameter estimates and reduces type II errors in statistical tests.49 50

Selected databases included EMBASE, ERIC, FORIS, Medline, PSYNDEX, PsycINFO, SCI and SSCI. The initial search terms (‘epidemiology’, ‘behaviour problems’, ‘mental disorders’, ‘mental health survey’ and ‘child behaviour checklist’, each in combination with ‘children’ and ‘adolescents’) and the search procedures were iteratively adapted to each database. Expert surveys, local library holdings and specialist conferences, as well as state, public and private documentation and research centres, were also drawn upon as information sources.44 Of 194 articles screened by abstract, 66 were potentially relevant and were reviewed in full. Of these, 32 had to be excluded because of preselected samples (k=3; eg, children and adolescents in medical treatment), purely statistical case definitions (k=5; eg, the upper 10%) or no case ascertainment at all (k=24). As a result, a total of k=34 primary studies was identified (see table 1; no studies from Austria and one from Switzerland were excluded). The number of studies increased over time, from two studies in the 1950s to 10 studies since the turn of the millennium. Across these studies, a total of n=72 978 children and adolescents had been investigated using methods that reflected the state of knowledge at the time of investigation (for study characteristics see table 2).

Table 1

Description of the k=34 primary studies

Table 2

Predictive value of binary-coded methodological features for primary study prevalence

The methodological characteristics of the primary studies were operationalised along four dimensions (general aspects, design, sample and case definition) and included important variables such as year of publication, sample size and source of information (table 2). This was intended to ensure that the included primary studies could be described and that correlations between methodological features and effect sizes could be analysed. Polytomous non-metric variables were dummy-coded to make them statistically manageable. For the evaluation of study quality, Cook and Campbell's51 ‘threats to validity’ approach, which comprises the four categories of statistical, construct, internal and external validity, was adapted to prevalence studies: quality criteria that do not apply to prevalence studies (eg, regression to the mean or treatment diffusion) were eliminated, and two criteria of particular significance in this field (case definition and source of information) were added.
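As an illustration of the dummy-coding step, the sketch below turns a polytomous variable into 0/1 indicators. It uses the four case-definition levels named in the discussion (respondent's judgement, investigator's judgement, diagnostic criteria, questionnaire criteria); the choice of the first level as the reference category and the function name `dummy_code` are assumptions for illustration, not the authors' coding scheme.

```python
# Levels of the case-definition variable as named in the text; using the
# first level as reference category is a hypothetical choice.
CASE_DEFINITIONS = (
    "respondent's judgement",
    "investigator's judgement",
    "diagnostic criteria",
    "questionnaire criteria",
)


def dummy_code(value, levels=CASE_DEFINITIONS, reference=None):
    """Return 0/1 indicators for every level except the reference level."""
    reference = levels[0] if reference is None else reference
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {level: int(value == level) for level in levels if level != reference}


# A questionnaire-based study gets a single 1; the reference level gets all zeros.
print(dummy_code("questionnaire criteria"))
print(dummy_code("respondent's judgement"))
```

With k−1 indicators for a k-level variable, each regression coefficient is read as the difference from the reference category.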

All coding was performed by two independent raters with previous experience in conducting prevalence studies. Wherever possible, coding anchors were based on established conventions. In addition to 0–1 coding (not given/given, see table 3), global ratings were made on a seven-point scale (not at all to very good). After training, interrater reliability for the methodological characteristics of the studies was a median intercorrelation coefficient of 1.00 across all quantitatively coded variables and κ=0.94 across all qualitatively coded variables. Across all four rating dimensions, average study quality amounted to Mean=0.74 (0, invalid; 1, valid; SD 0.267). Particularly frequent factors with a negative influence on validity included small sample size, low representativeness of the sample and missing CI calculations (table 3).
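The κ value reported for the qualitatively coded variables is a chance-corrected agreement index. A minimal sketch of Cohen's κ for two raters follows; that the authors used Cohen's variant specifically is an assumption, and the ten 0/1 ratings in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning the same items to categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(rater_a)
    # Observed agreement: share of items both raters coded identically.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_exp == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical 0/1 codes (not given/given) for ten study features.
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
kappa = cohens_kappa(rater_1, rater_2)  # 90% raw agreement, 54% expected by chance
```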

Table 3

Predictive value of validity criteria for primary study prevalence

Although the data structures were rather heterogeneous, exploratory factor analyses and hierarchical cluster analyses made it possible to reduce the methodological features roughly to four scales (general framework, design, samples and case definition) and to two- and six-cluster solutions (not presented here). For the validity ratings, however, only a two-cluster solution was identifiable (low vs high validity). The effect description and homogeneity test were performed using the random effects model (excluding the lifetime prevalence study of Essau et al).30 The significant variance component τ², in line with Hedges and Olkin,52 was estimated using the simplified procedure developed by Shadish and Haddock:53 subtracting the mean variance of the individual effect sizes from the total variance of the unweighted effect sizes gave τ² = SD² − v̄ = 32.15 − 94.30/33 = 32.15 − 2.86 = 29.29, where SD² is the variance of the unweighted effect sizes and v̄ the mean variance of the primary study effects. Instead of the simple primary study variances, all analyses were based on adjusted effect variances obtained by adding this τ² component to each primary study variance.54 Integration was performed using study-specific precision weights, computed as the reciprocal of the squared SE.55 In addition, the homogeneity statistic Q and the coefficient I² were computed.56 57
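The simplified τ² estimate and the adjusted precision weights described above can be sketched as follows. This is an illustrative reconstruction from the description in the text, assuming that the "variance of the unweighted effect sizes" is the ordinary sample variance (n−1 denominator) and that effects and variances are expressed in percentage points and squared percentage points; the function names are hypothetical.

```python
import math
from statistics import mean, variance


def simple_tau2(effects, variances):
    """Between-study variance: variance of the unweighted effects minus the
    mean within-study variance, truncated at zero."""
    return max(0.0, variance(effects) - mean(variances))


def random_effects_pool(effects, variances, tau2):
    """Precision-weighted mean with weights 1 / (v_i + tau2), its SE and Q."""
    weights = [1.0 / (v + tau2) for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Homogeneity statistic Q, computed with the same adjusted weights.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return pooled, se, q


# The arithmetic reported in the text: SD^2 = 32.15, summed variances 94.30, k = 33.
print(round(32.15 - 94.30 / 33, 2))  # 29.29
```

Adding τ² to every variance makes the weights more nearly equal, so small studies receive relatively more weight than they would under a fixed effect model.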

The sensitivity of the prevalence estimate to the methodological characteristics and validity restrictions of the primary studies was analysed (in line with the general linear model) using precision-weighted hierarchical linear regression and (co)variance analyses. For significant predictors, validity-adjusted prevalences were determined. To examine a potential secular trend, the slope of the regression of primary study prevalence on year of investigation was computed, tested and controlled for method and validity dependencies. Statistical tests were local, two-sided and performed at the 5% significance level (except for the homogeneity test, which used the 10% level). CIs for effect sizes were computed using the adjusted Wald method.57 Effect sizes were interpreted according to the conventions established by Cohen.58 Analyses were conducted using SPSS 16.0, Excel 2003 and D-Stat 1.1 software.
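For a single primary study, the adjusted Wald interval mentioned above is commonly taken to be the Agresti–Coull interval: add z²/2 cases and z² observations, then apply the usual Wald formula. The sketch below assumes that variant; the study with 30 cases among 100 children is hypothetical.

```python
import math


def adjusted_wald_ci(cases, n, z=1.96):
    """Adjusted Wald (Agresti-Coull) CI for a single prevalence cases/n."""
    n_adj = n + z**2
    p_adj = (cases + z**2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clip to the admissible range of a proportion.
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical study: 30 cases among 100 children.
low, high = adjusted_wald_ci(30, 100)
print(f"{low:.3f} to {high:.3f}")
```

Unlike the plain Wald interval, this version does not collapse to zero width when a study observes no cases.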

Results

Integration of effects

Across the k=33 studies included (n=72 978 children and adolescents), the precision-weighted average primary study prevalence amounted to M=17.58% (SE 0.983 percentage points (PP), 95% CI 15.66 to 19.51). The test of homogeneity (χ²=32.143, df=32, p=0.460) indicated that the prevalences could be pooled into a single mean. At the 10% level of significance, the post hoc power of the χ² test amounted to 1−β=99.9%. According to the formula by Higgins et al,56 the proportion of total prevalence variation attributable to heterogeneity of the individual population effects was I²=0.4%. Figure 1 depicts the distribution of mean primary study effects around the precision-weighted average effect size (Kolmogorov–Smirnov test of normality with Lilliefors correction: Z=0.142, df=33, p=0.092).
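The reported I² and pooled CI can be checked directly from the quoted Q, df, M and SE. I² follows Higgins' formula (Q − df)/Q, truncated at zero; the CI is the ordinary Wald interval M ± 1.96·SE. The recomputed lower bound comes out at 15.65 rather than 15.66, presumably because M and SE are rounded in the text.

```python
def i_squared(q, df):
    """Higgins' I^2 in percent: share of total variation due to heterogeneity."""
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0


def wald_ci(estimate, se, z=1.96):
    """Symmetric Wald interval estimate +/- z * se."""
    return estimate - z * se, estimate + z * se


print(round(i_squared(32.143, 32), 1))                 # 0.4
print([round(b, 2) for b in wald_ci(17.58, 0.983)])   # [15.65, 19.51]
```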

Figure 1

Pooled primary study effects (including 95% CI and regression line, k=33).

Sensitivity analyses

Sensitivity to variation in primary study effects

Normalising the distribution by a logit transformation resulted in a precision-weighted average of M=16.94% (SE 0.983PP, 95% CI 15.02 to 18.87). As almost half of the included primary studies (48.5%, k=16) had determined more than one overall prevalence, the distributions of minimal and maximal prevalences could be analysed separately. The precision-weighted pooled effect size amounted to M=14.78% (SE 0.979PP, 95% CI 12.87 to 16.70) for the smallest reported prevalences and to M=20.13% (SE 0.985PP, 95% CI 18.20 to 22.06) for the largest. To detect unusual effects, an outlier analysis was performed based on the corrected individual unconditional effect variances.52 Two of the k=33 primary studies showed standardised deviations that differed significantly from zero (z≤−1.96 or z≥+1.96): Kraenz et al59 (prevalence 29.9%, z=2.06, p≤0.020) and Kuschel et al60 (prevalence 31.5%, z=2.42, p≤0.008). Excluding these studies resulted in M=16.77% (SE 1.013PP, 95% CI 14.79 to 18.76).
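The logit sensitivity analysis and the outlier screen can be sketched as follows. The within-study variance of a logit proportion, 1/(n·p·(1−p)), is the usual delta-method approximation; treating the "corrected unconditional variance" as v_i + τ² is an assumption, as are all function names. The tail-probability helper shows that the reported p values (≤0.020 and ≤0.008) match one-sided normal tail areas for z=2.06 and z=2.42.

```python
import math


def logit_pool(prevalences, sizes, tau2_logit=0.0):
    """Pool proportions on the logit scale and back-transform.
    Within-study variance of logit(p) approximated as 1 / (n p (1 - p))."""
    ys, ws = [], []
    for p, n in zip(prevalences, sizes):
        ys.append(math.log(p / (1 - p)))
        v = 1.0 / (n * p * (1 - p))
        ws.append(1.0 / (v + tau2_logit))
    pooled_logit = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    return 1.0 / (1.0 + math.exp(-pooled_logit))


def outlier_z(effect, pooled, variance, tau2):
    """Standardised deviation of one study from the pooled effect,
    assuming the corrected variance is v_i + tau^2."""
    return (effect - pooled) / math.sqrt(variance + tau2)


def one_sided_p(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(round(one_sided_p(2.06), 3), round(one_sided_p(2.42), 3))  # 0.02 0.008
```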

Sensitivity to methodological characteristics of the primary studies

First, bivariate associations between methodological characteristics and prevalences were explored using weighted linear regression analysis (table 2). Only ‘northern Germany’ showed a significant effect, whereas ‘questionnaire criteria as case definition’ was marginally significant. In a second step with ‘northern Germany’ retained as a predictor, only ‘questionnaire criteria as case definition’ reached statistical significance as an additional predictor: prevalence = 18.56 + 5.07 × northern Germany − 4.51 × questionnaire criteria as case definition (northern Germany: SE 2.04, 95% CI 1.07 to 9.06, z=2.487, p=0.013; questionnaire criteria: SE 2.11, 95% CI −8.65 to −0.37, z=−2.135, p=0.033). The previously computed method scales and clusters were also entered, separately and jointly, as predictors of prevalence in the weighted linear regression and used as factors in covariance analyses, but without reaching significance. Finally, category-specific prevalence distribution values were determined:

  • The weighted mean prevalence for the k=13 studies from northern Germany was M=20.18% (SE 1.555PP, 95% CI 17.13 to 23.23), compared with M=15.86% (SE 1.269PP, 95% CI 13.37 to 18.35) for the k=20 studies from southern Germany. More detailed analyses revealed that this effect was primarily due to the two north German studies conducted by Kraenz et al59 (Braunschweig) and Kuschel et al60 (Rostock). After adjusting for these, the predictive value of ‘northern Germany’ was no longer significant.

  • The weighted mean prevalence for the k=22 studies with questionnaire criteria as case definition was M=16.37% (SE 1.206PP, 95% CI 14.01 to 18.73), compared with M=19.98% (SE 1.697PP, 95% CI 16.65 to 23.31) for the k=11 studies without questionnaire criteria. More detailed analyses revealed that this effect was not simply due to the two outlier studies: after adjusting for these, the predictive value remained significant and the mean difference even increased slightly.
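The weighted regressions in this section can be sketched as a standard meta-regression: weighted least squares with weights 1/(v_i + τ²), SEs taken from (X′WX)⁻¹ and a z test on each coefficient. This is an assumption about the exact implementation (the analyses were run in SPSS, Excel and D-Stat); the function names and toy data are hypothetical. With one binary dummy, the intercept equals the weighted mean of the reference group and the slope equals the difference between the two weighted group means, which links the regression to the category-specific means listed above.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def meta_regression(predictors, effects, variances, tau2):
    """WLS with weights 1 / (v_i + tau2); an intercept column is added.
    SEs come from (X'WX)^-1 without rescaling by the residual variance."""
    X = [[1.0] + list(row) for row in predictors]
    w = [1.0 / (v + tau2) for v in variances]
    p = len(X[0])
    xtwx = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(wi * xi[a] * y for wi, xi, y in zip(w, X, effects)) for a in range(p)]
    cov = invert(xtwx)
    coef = [sum(cov[a][b] * xtwy[b] for b in range(p)) for a in range(p)]
    se = [math.sqrt(cov[a][a]) for a in range(p)]
    z = [c / s for c, s in zip(coef, se)]
    return coef, se, z


# Hypothetical data: two reference-group studies and two dummy-coded studies.
coef, se, z = meta_regression([[0], [0], [1], [1]], [10, 12, 20, 22], [1, 1, 1, 1], tau2=0.0)
```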

Sensitivity to validity restrictions of the primary studies

Associations between binary validity criteria and prevalences were explored by weighted linear regression analysis (table 3). The aspects ‘standardisation’, ‘sample size’, ‘multidimensionality’ and ‘drop-out’ reached significance. In a second step, ‘standardisation’ was retained as a predictor, and all 15 remaining methodological features were entered successively into the regression equation. ‘Sample size’ as an additional predictor explained the greatest share of variance: prevalence = 26.00 − 7.40 × standardisation − 4.39 × sample size (standardisation: SE 3.38, 95% CI −14.03 to −0.78, z=−2.192, p=0.028; sample size: SE 2.02, 95% CI −8.35 to −0.44, z=−2.176, p=0.030). The seven-point validity ratings and the two validity clusters were analysed in an analogous manner. All seven-point validity ratings showed highly significant effects, whereas the cluster solution was non-significant (not presented here). Hierarchically modelled predictor combinations did not explain incremental variance; this also applied to covariance analytical combinations of binary and metric validity criteria.

Again, prevalence distribution values were computed for specific categories of validity criteria. Without exception, mean values reported in higher-quality primary studies were lower than those found in less valid studies: M=20.56 (95% CI 17.88 to 23.24) for low and M=14.39 (95% CI 11.62 to 17.16) for high statistical validity; M=22.80 (95% CI 19.60 to 26.00) for low and M=14.61 (95% CI 12.20 to 17.02) for high construct validity; M=19.99 (95% CI 17.39 to 22.59) for low and M=14.63 (95% CI 11.75 to 17.51) for high content validity; M=17.96 (SE 1.031, 95% CI 15.94 to 19.98) for low and M=13.88 (95% CI 7.51 to 20.25) for high external validity; and M=19.35 (95% CI 17.04 to 21.66) for low and M=13.59 (95% CI 10.11 to 17.07) for high total validity. Finally, validity-weighted mean prevalences were computed for the five seven-point validity ratings: M=16.16 (95% CI 14.24 to 18.09) for statistical validity; M=16.65 (95% CI 14.72 to 18.57) for construct validity; M=16.76 (95% CI 14.84 to 18.69) for content validity; and M=16.37 (95% CI 14.84 to 18.69) for external validity. The average prevalence weighted by overall validity was 16.22% (SE 0.983, 95% CI 14.31 to 18.16).
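The text does not report how the seven-point ratings entered the weights. One plausible sketch, stated here purely as an assumption, multiplies each study's precision weight 1/(v_i + τ²) by its rating rescaled to (0, 1]; the SE then follows the general formula for a weighted mean of independent estimates with non-inverse-variance weights. Function name and data are hypothetical.

```python
import math


def validity_weighted_mean(effects, variances, tau2, ratings, scale_max=7):
    """Hypothetical validity weighting: precision weight 1 / (v_i + tau2)
    multiplied by the seven-point rating rescaled to (0, 1]."""
    weights = [(r / scale_max) / (v + tau2) for r, v in zip(ratings, variances)]
    total = sum(weights)
    mean_effect = sum(w * y for w, y in zip(weights, effects)) / total
    # SE of a weighted mean of independent estimates with arbitrary weights.
    se = math.sqrt(sum(w * w * (v + tau2) for w, v in zip(weights, variances))) / total
    return mean_effect, se


# Hypothetical example: the higher-rated study (6 of 7) outweighs the lower-rated one (2 of 7).
m, s = validity_weighted_mean([10.0, 20.0], [1.0, 1.0], 0.0, [6, 2])
```

Under this scheme, studies with high prevalence and low validity ratings are down-weighted, which pulls the pooled mean towards the estimates of the more valid studies.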

Publication bias and fail-safe N

The funnel plot in figure 2 illustrates the spread of primary study effects according to sample size (r=−0.06). With three exceptions (56, n=4363; 57, n=9704; 58, n=27 054), the points are gathered relatively symmetrically around the mean prevalence. However, the distribution deviates from symmetry in the form of a slight skew to the right. The rank correlation between the observed primary study effects and their variances was r=0.21 (Kendall's τ; p=0.094); thus, publication bias cannot be ruled out.61
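The rank correlation check corresponds to the Begg–Mazumdar approach: effects are standardised against the fixed effect mean and then correlated with their variances using Kendall's τ. The sketch uses Kendall's τ-b and the no-ties normal approximation for its variance; the paper does not state which variant was used, so this need not reproduce p=0.094 exactly. The test data are hypothetical.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long sequences."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both: excluded from both denominator factors
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0


def begg_test(effects, variances):
    """Rank correlation between standardised effects and their variances."""
    w = [1.0 / v for v in variances]
    pooled = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    v_pooled = 1.0 / sum(w)
    # v_i - v_pooled is positive for k >= 2, so the square root is defined.
    std_effects = [(y - pooled) / math.sqrt(v - v_pooled)
                   for y, v in zip(effects, variances)]
    tau = kendall_tau_b(std_effects, variances)
    k = len(effects)
    z = tau / math.sqrt(2 * (2 * k + 5) / (9 * k * (k - 1)))
    return tau, math.erfc(abs(z) / math.sqrt(2))  # two-sided p
```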

Figure 2

Funnel plot of primary study prevalences (exponential scaling of y axis, k=33).

To estimate the extent of potential publication bias, the fail-safe number was calculated in accordance with Orwin.62 For the k=33 included studies with a mean prevalence of 17.6%, four models were computed, based on the rounded minimum and maximum values of the existing prevalence distribution (10% and 32%) and on assumed reduced and elevated means of 15% and 20%, respectively (not presented). If, for example, an average primary study prevalence of 10% is assumed, more than 100 unpublished primary studies with a prevalence of 7.5% would have to be added; for an average prevalence of 20%, more than 31 unpublished studies with a prevalence of 22.5% would be required.
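Orwin's fail-safe N is k(M − M_c)/(M_c − M_fs), where M is the observed mean, M_c the criterion mean and M_fs the assumed mean of the unpublished studies. Plugging in the values from the text reproduces the two figures quoted above; the function name is hypothetical.

```python
def orwin_fail_safe_n(k, mean_observed, mean_criterion, mean_missing):
    """Number of additional studies averaging `mean_missing` needed to move
    the mean of k studies from `mean_observed` to `mean_criterion`."""
    return k * (mean_observed - mean_criterion) / (mean_criterion - mean_missing)


print(round(orwin_fail_safe_n(33, 17.6, 10.0, 7.5), 1))   # 100.3
print(round(orwin_fail_safe_n(33, 17.6, 20.0, 22.5), 1))  # 31.7
```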

Secular trend

Figure 3 illustrates the historical course of reported epidemiological study results since the establishment of the Federal Republic of Germany in 1949, together with the respective 95% CIs. The weighted linear regression analysis across all k=33 studies resulted in a slope of b=−0.06 percentage points per year (SE 0.07, 95% CI −0.19 to 0.08; z=−0.836, p=0.403). Exploration of different models showed that more complex functions did not explain any additional variance (not presented here). However, the trend graphic revealed two extreme values that disproportionately influenced the trend statistic.59 60 Excluding these resulted in a marginally significant model with a better fit (b=−0.12, SE 0.07, 95% CI −0.26 to 0.02; z=−1.648, p=0.099). Excluding the first two studies from the 1950s (as temporal outliers) resulted in a slope of b=0.02 (SE 0.09, 95% CI −0.15 to 0.20; z=0.276, p=0.783).
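For a single continuous predictor such as year of investigation, the precision-weighted regression reduces to a closed form: the slope is Σw(x − x̄)(y − ȳ)/Σw(x − x̄)² with weighted means x̄ and ȳ, and its SE is 1/√(Σw(x − x̄)²). The sketch assumes weights 1/(v_i + τ²) as in the main analysis; the function name and the three toy studies are hypothetical.

```python
import math


def weighted_trend(years, effects, variances, tau2, z_crit=1.96):
    """Slope (percentage points per year) of a precision-weighted regression
    of prevalence on year, with SE, Wald CI, z and two-sided p."""
    w = [1.0 / (v + tau2) for v in variances]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, years)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, years))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, years, effects))
    slope = sxy / sxx
    se = 1.0 / math.sqrt(sxx)
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return slope, se, (slope - z_crit * se, slope + z_crit * se), z, p


# Hypothetical studies from 1950, 1960 and 1970 with equal variances.
slope, se, ci, z, p = weighted_trend([1950, 1960, 1970], [20, 19, 18], [1, 1, 1], 0.0)
```

As a check on the reported statistics, a two-sided normal p for z=−0.836 is about 0.40, in line with the p=0.403 given above.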

Figure 3

Time course of prevalences obtained in the k=33 primary studies (including 95% CI and regression line).

To analyse the potential influence of methodological features on this relationship, exploratory hierarchical regression analyses were performed (not presented here), drawing upon the method and validity scales and clusters established. The results revealed differential changes in the methodological characteristics of the primary studies over the decades: the general framework improved substantially and sample quality slightly, whereas the quality of design and case definition decreased. In contrast, more recent studies showed improvement in all global aspects of validity compared with older studies.

Discussion

The pooled overall prevalence of 17.6% for emotional and behavioural disorders in children and adolescents in Germany lies within the range reported in international reviews (see introduction) and is close to the result reported by Roberts et al.28 Accordingly, it is not one in five children, as frequently claimed, but one in six who is affected. This means that of the total of 13.43 million 3–18-year-olds in Germany, 2.36 million show a need for clinically indicated diagnostics, counselling and/or treatment.63 The SE of approximately one percentage point (a 95% CI half-width of about 1.9 percentage points) corresponds to uncertainty regarding approximately 258 000 children and adolescents. This does not, however, mean that every clinically relevant case has a disorder requiring treatment. It is generally assumed that half require treatment, whereas the other half is adequately provided for by extensive diagnostics and counselling.9 64
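The headcount figures above follow from simple arithmetic. The sketch assumes that the figure of about 258 000 refers to the half-width of the 95% CI (1.96 × SE) rather than to one SE, which is what the numbers imply.

```python
population = 13.43e6   # 3- to 18-year-olds in Germany (figure from the text)
prevalence = 0.1758    # pooled prevalence
se = 0.00983           # SE of the pooled prevalence (0.983 percentage points)

affected = population * prevalence          # about 2.36 million
one_se_count = se * population              # about 132 000 children per SE
ci_half_width = 1.96 * se * population      # about 259 000 children

print(f"{affected / 1e6:.2f} million affected, 95% CI +/- {ci_half_width:,.0f}")
```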

As the mean primary study prevalences can be considered homogeneous, a general integration model with calculation of a single mean effect is appropriate. Studies with varying case definitions and, above all, with dimensional and categorical diagnostic approaches can thus be integrated. This finding contrasts with the presuppositions of the authors of a number of reviews (see introduction), who restrict integration to primary studies with a classificatory approach. When evaluating such reviews, it is necessary to consider that a number of the disorders in DSM and ICD (as well as the interviews used for assessment) have either not been evaluated or have been evaluated only inadequately with respect to their validity in child and adolescent age groups (‘The ICD-10 and DSM-VI diagnostic classifications for children and adolescents are woefully inadequate and of limited applicability in global epidemiological studies.’ p 226).11 Ultimately, the figure of 17.6% thus represents the average proportion of German children and adolescents aged 1.5–18 years who were classified as showing clinical symptoms in more than 30 independently conducted scientific studies. These studies used the various psychiatric and clinical-psychological methods that were generally established and considered optimal at the respective time of their application.

The use of transformed mean primary study prevalences and the exclusion of outlying studies before integration do not result in significant deviations from the mean estimated integrated effect size. It is only when minimum and maximum primary study prevalences are separately integrated that significant differences of M=14.8% versus M=20.1% arise. The only methodological feature identified as a significant influence was the type of case definition, with questionnaire studies resulting in a lower mean effect. The reason for this becomes apparent when examining the original four-level variable (respondent's judgement, investigator's judgement, diagnostic criteria, questionnaire criteria). Across all included primary studies, the studies with questionnaire criteria showed a considerably greater dispersion and lower central tendency of the effect distribution compared with the remaining groups.

The present sample includes only a small number of primary studies that used the DSM and ICD taxonomies as case criteria. This is understandable, as carrying out the corresponding parent and child interviews is very labour- and time-intensive, and, particularly in the first decades of the study period, none of the required structured interview guides were available in German. However, the relevant primary studies showed no significantly different prevalence rates from studies with other case definitions (see also table 3, line 32). Nor did a direct comparison of the weighted mean prevalence estimates of studies with DSM/ICD criteria and those with questionnaire criteria as case definition reveal any significant difference (M=17.65%, 95% CI 11.91 to 23.38, vs M=16.37%, 95% CI 14.01 to 18.73). This may be primarily because most of the questionnaire criteria used were themselves developed and validated using DSM and ICD criteria (eg, in the CBCL).7 The dependency of study results on the methods employed observed within Germany is also confirmed at the international level.1 Provided that large-scale, representative samples are drawn upon, prevalence is predominantly determined by the case criteria and the corresponding methods of assessment. Prevalence decreases when severity or the degree of impairment caused by the symptoms is applied as an additional criterion.

Compared with methodological features, validity ratings showed more numerous, clearer and more consistent associations with primary study effects. Validity limitations thus tend to inflate rather than reduce prevalence estimates (see table 3). The two most relevant individual aspects were sample size and study standardisation. Because average study validity was suboptimal, the validity-weighted average primary study prevalence was approximately 1.4 percentage points lower than the unweighted prevalence. The meta-analytical literature generally assumes that validity weighting does not bias effect estimates.49 65 However, the effect observed here is plausible if one assumes that small samples produce selection effects and that a low degree of standardisation produces results that tend to confirm the investigators' hypotheses.
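The contrast between validity-weighted and unweighted average prevalence can be illustrated with a short sketch. It assumes that each study has a single validity score in [0, 1], with higher values meaning higher validity; the rating scheme actually used is not described in this section. The function names and all numbers are hypothetical, chosen so that the less valid studies report higher prevalences, the pattern described above.

```python
from typing import Sequence


def mean_prevalence(prev: Sequence[float]) -> float:
    """Unweighted arithmetic mean of study prevalences."""
    return sum(prev) / len(prev)


def validity_weighted_prevalence(prev: Sequence[float], validity: Sequence[float]) -> float:
    """Mean prevalence with each study weighted by its validity score (assumed scale 0 to 1)."""
    total_weight = sum(validity)
    return sum(p * v for p, v in zip(prev, validity)) / total_weight


# Hypothetical data: less valid studies report higher prevalences
prevalences = [0.25, 0.20, 0.15, 0.14]
validity_scores = [0.3, 0.5, 0.9, 1.0]

unweighted = mean_prevalence(prevalences)
weighted = validity_weighted_prevalence(prevalences, validity_scores)
print(f"unweighted {unweighted:.3f}, validity-weighted {weighted:.3f}, "
      f"difference {100 * (unweighted - weighted):.1f} percentage points")
```

In practice, validity scores are often multiplied into inverse-variance weights rather than used alone. The simpler form here only shows the direction of the shift.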

Fuelled by sensational media reports of adolescent spree killers, kidnappers and suicide victims, the assumption that mental health disorders among children and adolescents in Germany are increasing is widespread.66 67 On current evidence, however, no such conclusive statement is possible, because the comparable investigations required have not yet been conducted. As the analyses show, prevalence depends on study characteristics, which in turn reflect the state of epidemiological methods at the time of investigation. Differing results are therefore due not only to actual differences in prevalence but also to design effects. At the international level, too, an empirically based statement about period effects is not possible because of the heterogeneity of the methods employed.28 68 The additional, as yet unresolved problem of the intercultural comparability of study conditions also plays a role here.

The following methodological limitations warrant particular emphasis:

  • The three classic criticisms of meta-analyses (garbage in, garbage out; comparing apples and oranges; subjectivity) were addressed in particular by adopting an established approach to study coding and by conducting extensive sensitivity analyses.43

  • The inclusive and exploratory approach combines the aim of point and interval estimation with that of explaining variance. While such a broad research question reduces the precision of the results, it also increases their potential generalisability.69

  • In the context of exploratory sensitivity analyses, a high number of post hoc analyses is admissible.46 Nonetheless, the limited number of primary studies and, in some cases, small cell counts allow only conclusions of limited robustness. In particular, it was not possible to integrate age, sex and social class effects, because these were rarely reported.

  • All identified primary studies were successfully retrieved and included. However, the funnel plot indicated that publication bias, in the form of a greater likelihood of publication for studies reporting higher prevalences, cannot be completely ruled out.
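Funnel-plot asymmetry of the kind mentioned in the last limitation is often checked formally with Egger's regression test. This section does not state that the authors used it. The sketch regresses each study's standardised effect (effect divided by its standard error) on its precision (1 divided by the standard error) by ordinary least squares; an intercept clearly different from zero suggests that small studies report systematically different effects. Taking prevalences as effects with standard errors of √(p(1 − p)/n) is an assumption, as are the function name `egger_test` and the example numbers.

```python
import math
from typing import Sequence, Tuple


def egger_test(effects: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, int]:
    """Egger's regression test: OLS of effect/se on 1/se.

    Returns (intercept, t statistic of the intercept, degrees of freedom).
    Assumes at least three studies whose points do not lie exactly on a line.
    """
    n = len(effects)
    if n < 3:
        raise ValueError("need at least three studies")
    x = [1 / s for s in ses]                   # precision
    z = [e / s for e, s in zip(effects, ses)]  # standardised effect
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    # Residual variance and standard error of the OLS intercept
    rss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    df = n - 2
    se_intercept = math.sqrt(rss / df * (1 / n + x_bar ** 2 / sxx))
    return intercept, intercept / se_intercept, df


# Hypothetical data in which the less precise (smaller) studies report higher prevalences
a, t, df = egger_test([0.15, 0.16, 0.18, 0.22, 0.26], [0.01, 0.02, 0.03, 0.05, 0.06])
print(f"intercept {a:.2f}, t = {t:.2f} on {df} df")
```

The t statistic would then be compared with a t distribution on `df` degrees of freedom. With few studies, as here, the test has low power, which matches the cautious wording of the limitation.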

In summary, half a century of German research efforts leads to the conclusion that approximately every sixth child shows signs of emotional or behavioural disorders, and that conclusions regarding period effects are not robust. To guarantee inter- and intranational comparability of results, future research must adhere to international epidemiological standards with respect to case criteria and instruments as well as study design and implementation.1 22 70 Guidelines for standardising studies that estimate the total prevalence of mental problems in children and adolescents should be developed. Among other things, this means implementing at least two assessment levels: the first should comprise a comprehensive screening covering various rater perspectives, and the second a comprehensive diagnostic module for diagnoses according to current ICD and DSM criteria, together with a severity rating. To capture shifts in prevalence, incidence and other course parameters, regular follow-up investigations should be scheduled (surveillance). Internationally widespread screening instruments and tools for standardised diagnosis should be developed further. While questionnaires are already relatively well harmonised internationally, standardised clinical interviews for the assessment of ICD or DSM diagnoses clearly lag behind. Furthermore, the various case definition strategies must be systematically collated and compared on a single data set, taking into account different criterion levels (counselling, diagnostics, treatment), age and sex groups, and rater perspectives.
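How prevalence can be estimated from the recommended two-level design (screening followed by diagnostic assessment) can be sketched as a standard two-phase estimator. This sketch assumes that only a subsample of each screening stratum receives the diagnostic interview, a common feature of two-phase surveys that the text does not specify. Each stratum's case rate is then weighted by that stratum's share of the screened sample. The function name `two_phase_prevalence`, the stratum labels and all counts are hypothetical.

```python
from typing import Mapping


def two_phase_prevalence(n_screened: Mapping[str, int],
                         n_interviewed: Mapping[str, int],
                         n_cases: Mapping[str, int]) -> float:
    """Two-phase prevalence estimate.

    n_screened[s]:    number screened who fall into stratum s (e.g. 'positive', 'negative')
    n_interviewed[s]: subsample of stratum s that received the diagnostic interview
    n_cases[s]:       diagnosed cases among those interviewed in stratum s
    """
    total = sum(n_screened.values())
    prevalence = 0.0
    for stratum, n_s in n_screened.items():
        case_rate = n_cases[stratum] / n_interviewed[stratum]  # diagnosed rate within stratum
        prevalence += (n_s / total) * case_rate                 # weight by stratum share
    return prevalence


# Hypothetical counts for illustration only
screened = {"positive": 200, "negative": 800}
interviewed = {"positive": 100, "negative": 80}
cases = {"positive": 60, "negative": 4}

weighted = two_phase_prevalence(screened, interviewed, cases)
naive = sum(cases.values()) / sum(interviewed.values())  # ignores the sampling design
print(f"two-phase estimate {weighted:.3f} versus naive {naive:.3f}")
```

Because screen-positive children are deliberately oversampled for the interview, the naive case rate among interviewees overstates prevalence. Weighting by stratum share corrects for this.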

Given the high demand for care, the development and optimisation of care facilities, the qualification and training of experts, the promotion of individual, family-based and social resources, and a general change in how society deals with those affected will be the most central tasks facing future healthcare policymakers. Results of the WHO Global Burden of Disease Study underscore the pivotal relevance of emotional and behavioural disorders for social and health systems.71 This applies also, and particularly, to childhood and adolescence.

What is already known on this subject

  • According to the WHO, mental, behavioural and developmental disorders with childhood onset are a major public health concern.70

  • The USA, UK and other countries began extensive national efforts to evaluate the overall prevalence of emotional and behavioural disorders in children and adolescents in the early 1980s.

  • So far, there has been no comprehensive and systematic endeavour of this kind in German-speaking countries.

What this study adds

  • The pooled overall prevalence of 17.6% for emotional and behavioural disorders in children and adolescents in Germany lies within the range reported in international reviews.

  • The only methodological feature identified as a significant influence was the type of case definition, with questionnaire studies resulting in a lower mean effect.

  • Given the high demand for care, the development and optimisation of care facilities will be the most central tasks facing future healthcare policymakers.



  • Competing interests None.

  • Provenance and peer review Not commissioned; externally peer reviewed.
