Reliability and validity of the short form of the child health questionnaire for parents (CHQ-PF28) in large random school based and general population samples
- Hein Raat1,
- Anita M Botterweck2,
- Jeanne M Landgraf3,
- W Christina Hoogeveen4,
- Marie-Louise Essink-Bot1
- 1Department of Public Health, Erasmus MC, University Medical Centre Rotterdam, Netherlands
- 2Statistics Netherlands, Voorburg/Heerlen, Netherlands
- 3HealthAct, Boston, USA
- 4Department of Youth, GGD, Municipal Health Service Rotterdam, Netherlands
- Correspondence to: Dr H Raat, Department of Public Health, Erasmus MC, University Medical Centre Rotterdam, PO Box 1738, 3000 DR Rotterdam, Netherlands
- Accepted 13 May 2004
Study objectives: This study assessed the feasibility, reliability, and validity of the 28 item short child health questionnaire parent form (CHQ-PF28) containing the same 13 scales, but only a subset of the items in the widely used 50 item CHQ-PF50.
Design: Questionnaires were sent to a random regional sample of 2040 parents of schoolchildren (4–13 years); in a random subgroup test-retest reliability was assessed (n = 234). Additionally, the study assessed CHQ-PF28 score distributions and internal consistencies in a nationwide general population sample of (parents of) children aged 4–11 (n = 2474) from Statistics Netherlands.
Main results: Response was 70%. In the school and general population samples seven scales showed ceiling effects. Both CHQ summary measures and one multi-item scale showed adequate internal consistency in both samples (Cronbach’s α>0.70). One summary measure and one scale showed excellent test-retest reliability (intraclass correlation coefficient >0.70); seven scales showed moderate test-retest reliability (intraclass correlation coefficient 0.50–0.70). The CHQ could discriminate between a subgroup with no parent reported chronic conditions (n = 954) and subgroups with asthma (n = 134), frequent headaches (n = 42), and with problems with hearing (n = 38) (Cohen’s effect sizes 0.12–0.92; p<0.05 for 39 of 42 comparisons).
Conclusions: This study showed that the CHQ-PF28 resulted in score distributions and discriminative validity that are comparable to those of its longer counterpart, but that the internal consistency of most individual scales was low. In community health applications, the CHQ-PF28 may be an acceptable alternative to the longer CHQ-PF50 if the summary measures suffice and reliable estimates of each separate CHQ scale are not required.
- CHQ-PF28, child health questionnaire parent form 28 items
- CHQ-CF87, child health questionnaire child form 87 items
- VAS, visual analogue scale
Reliable and validated generic health status measures are available to describe health and health related quality of life of children in evaluation studies of community health and clinical interventions,1–4 burden of disease studies,5,6 or in community health and clinical practice.7,8 These measures are applied in addition to condition specific health related quality of life measures and clinical measures, and therefore should be as short as possible without losing precision and reliability.9 This study is the first to evaluate the reliability (including test-retest reliability) and validity of the shortest version of the widely used child health questionnaire (CHQ), the 28 item parent form (CHQ-PF28).10–13
The CHQ was developed in the USA, and has since been cross culturally validated into 21 languages (32 countries).10–20 The CHQ uses the same structure and methodological approach as the SF-36,21 but was developed specifically for children and therefore includes scales that consider the effects of the child’s health on family functioning, as well as specific scales, such as behaviour and self esteem (table 1).10–12 For adolescents, a self report CHQ child form is available (CHQ-CF87).11,15,20,22 A CHQ form for pre-school children is in the process of development.23 The 50 item parent form (CHQ-PF50) for school age children of about 4 or 5 years and older, is the most frequently applied version of the CHQ.12,14 As young children up to around the age of 10 are considered unable to rate their own health consistently,1–3 parents generally are used as surrogate responders. As reported in the CHQ user’s manual,11,23 in accordance with general guidelines,9 regression techniques and item scaling analysis have been applied to derive a shortened CHQ-PF28 from the CHQ-PF50, which contains the same scales, but only a subset of the items that make up the CHQ-PF50. The objectives of this study were to assess, in a large random school based population:
- the feasibility of the child health questionnaire (CHQ-PF28) as a proxy measurement of child health and health related quality of life (indicators: response rate, missing/non-unique answers, presence of floor and ceiling effects);
- the reliability of the CHQ-PF28 scales (internal consistency and test-retest reliability);
- the validity of the CHQ-PF28 as judged by comparisons with 0–100 visual analogue scale (VAS) ratings of the child’s health (concurrent validity), and the ability to discriminate between groups with and without specific self reported chronic conditions (discriminative validity).
Additionally, the presence of floor and ceiling effects and the internal consistency of scales were evaluated in a dataset from a nationwide general population sample from Statistics Netherlands (see Methods).24 We compared the current results for the CHQ-PF28 with earlier findings on CHQ evaluations.11–20
Study population and data collection
In 2001, parents of a random sample of 2040 children (aged 4–13 years) attending any one of 28 elementary schools in two middle sized Dutch cities (Krimpen and Ridderkerk, chosen for practical reasons) were mailed a questionnaire. The 2040 children were selected from a computerised database with all clients (school children) of the municipal health service by applying computer generated random numbers; their parents were invited to participate by the medical director of the municipal health service. The parents themselves decided if either the father or the mother should complete the questionnaire. Up to two reminders were sent; no incentives were offered. After three weeks, a random subgroup of 329 parents of these 2040 children who had returned the first questionnaire were mailed the same questionnaire again to assess test-retest reliability.
The CHQ adaptation into Dutch that included three independent forward and two backward translations was made following international guidelines.12,14–16,25,26 CHQ-PF28 items have four, five, or six response options, divided over eight multi-item scales and five single item concepts (table 1). Per scale, the items are summed up (some recoded/recalibrated) and transformed into a 0 (worst possible score) to 100 (best possible score) scale.11 “Physical” and “psychosocial” CHQ summary scores, which are based on a factor analytical model of a US child population sample, were calculated in a manner analogous to the construction of the summary scores in the SF-36; summary scores of 50 represent the mean in the US reference population children; 10 points above/below 50 reflect one standard deviation difference in either direction.11,26
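The 0–100 transformation and the 50/10 metric of the summary scores can be illustrated with a minimal sketch. It assumes a plain linear rescaling of the summed item scores and a simple standardisation to mean 50 and SD 10; the item recoding/recalibration and the factor weights behind the actual summary scores are defined in the CHQ manual and are not reproduced here. The function names and example values are hypothetical.

```python
def transform_0_100(item_scores, min_per_item, max_per_item):
    """Rescale a summed raw scale score to 0 (worst) .. 100 (best).

    Assumes every item shares the same response range and is already
    coded so that a higher value means better health.
    """
    raw = sum(item_scores)
    lowest = min_per_item * len(item_scores)
    highest = max_per_item * len(item_scores)
    return 100.0 * (raw - lowest) / (highest - lowest)


def norm_based_score(z):
    """Express a standardised score z on a mean 50, SD 10 metric."""
    return 50.0 + 10.0 * z


# Hypothetical three item scale with response options 1..5.
print(transform_0_100([4, 5, 3], 1, 5))   # 75.0
# One SD below the reference population mean.
print(norm_based_score(-1.0))             # 40.0
```

On this metric a summary score of 40 lies one standard deviation below the reference mean, which is how the "10 points above/below 50" statement can be read.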
In addition, the questionnaires consisted of items on standard sociodemographic variables and the presence of chronic conditions. Respondents were asked to indicate on a 0–100 VAS, labelled from worst to best imaginable health state, how good or how bad they felt their child’s current health state to be.27,28
We assessed response and missing/non-unique answers for the regional sample of schoolchildren as well as score distributions to evaluate floor and ceiling effects (>25% of the respondents having the lowest/highest score).15 Floor and ceiling effects were also evaluated in subgroups with a chronic condition (see below).
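The floor/ceiling criterion described above (more than 25% of respondents at the lowest/highest score) can be sketched as follows. This is an illustration only; the function names and the example scores are hypothetical, and the 0 and 100 extremes assume scale scores that have already been transformed to the 0–100 range.

```python
def extreme_share(scores, extreme):
    """Fraction of respondents whose scale score equals the given extreme."""
    return sum(1 for s in scores if s == extreme) / len(scores)


def has_ceiling_effect(scores, maximum=100.0, threshold=0.25):
    """True when more than `threshold` of respondents have the maximum score."""
    return extreme_share(scores, maximum) > threshold


def has_floor_effect(scores, minimum=0.0, threshold=0.25):
    """True when more than `threshold` of respondents have the minimum score."""
    return extreme_share(scores, minimum) > threshold


# Hypothetical 0-100 scale scores for eight respondents.
scores = [100, 100, 100, 75, 100, 50, 100, 87.5]
print(extreme_share(scores, 100))                   # 0.625
print(has_ceiling_effect(scores))                   # True
print(has_ceiling_effect(scores, threshold=0.50))   # True
```

Passing `threshold=0.50` applies the stricter ">50% at the extreme" cut-off.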
Additionally, the CHQ-PF28 score distributions (and the internal consistency of multi-item scales; see below) were assessed in a dataset from the 2000 and 2001 random nationwide general population samples from Statistics Netherlands.24 These data were collected as a health measure for children aged 4–11 years, a subgroup in the Statistics Netherlands continuous survey of living conditions, which included a total of 10 000 persons of all ages each year. (Please see the addendum, available online at http://www.jech.com/supplemental, for a table showing the comparison of mean scores and standard deviations of the CHQ-PF28 in a subgroup of children aged 4–11 years of the regional sample of schoolchildren and the whole nationwide general population sample of Statistics Netherlands.) The randomisation of the general population sample from Statistics Netherlands was conducted in two steps: firstly, one or more municipalities were randomly selected from each of 40 so called COROP areas (strata) in the Netherlands; secondly, persons were randomly selected from the selected municipalities; the response rate was 57%.24 The data were gathered by parent interviews at home by trained assistants with the use of portable computers for direct data entry: n = 2474; age of the children 4–11 years, mean age 7.5 years (SD 2.3); 49.2% were girls.24
Cronbach’s α was used to evaluate the internal consistency of the CHQ scales in both samples; a Cronbach’s α of a multi-item scale of 0.70 or higher was considered to indicate sufficient internal consistency.29 We assessed whether (on average) correlation coefficients (Pearson r correlation coefficients) between the items and their own scale score (without the item under consideration) were higher than the correlation coefficients between these items and any other scale, to evaluate whether the CHQ items represent separate domains.
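For readers unfamiliar with the statistic, Cronbach’s α for a scale with k items is k/(k−1) × (1 − sum of item variances / variance of the summed score). The sketch below applies this standard formula to made-up item scores; it is not the SPSS procedure used in the study, and the function name and data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    `items` holds one list of scores per item; position i in every list
    belongs to the same respondent. Sample variances (n - 1) are used.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical three item scale answered by five respondents.
item_a = [1, 2, 3, 4, 5]
item_b = [2, 2, 3, 5, 5]
item_c = [1, 3, 3, 4, 4]
alpha = cronbach_alpha([item_a, item_b, item_c])
print(round(alpha, 2), alpha >= 0.70)   # 0.95 True
```

With only two or three items per scale, as in a short form, α tends to be lower than for longer scales that measure the same concept, which is relevant when interpreting the 0.70 criterion.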
Test-retest reliability of the CHQ-PF28 scales was, at the individual level, assessed by test-retest intraclass correlation coefficients (ICCs)30; ICCs of 0.70 or higher were considered to show excellent test-retest reliability and ICCs of 0.50–0.70 to indicate moderate test-retest reliability.6 Additionally, at the group level, test-retest reliability was assessed by two sided Wilcoxon’s signed ranks tests, and by effect sizes: d = [mean(test)−mean(retest)]/SD(test); 0.20⩽d<0.50 indicates a small effect size, 0.50⩽d<0.80 a moderate effect size, and d⩾0.80 a large effect size.31
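The group level effect size and an individual level ICC can be sketched as follows. The text does not state which ICC form was used, so the one-way random effects, single measure ICC shown here is an assumption chosen for illustration; applying the effect size thresholds to the absolute value of d is likewise an assumption. All names and data are hypothetical.

```python
from statistics import mean, stdev


def retest_effect_size(test, retest):
    """d = [mean(test) - mean(retest)] / SD(test)."""
    return (mean(test) - mean(retest)) / stdev(test)


def effect_size_label(d):
    """Classify |d| with the thresholds quoted in the text."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "minor"


def icc_one_way(test, retest):
    """One-way random effects, single measure ICC for two occasions.

    ASSUMPTION: the study may have used a different ICC form.
    """
    n, k = len(test), 2
    subject_means = [(a + b) / k for a, b in zip(test, retest)]
    grand_mean = mean(subject_means)
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(test, retest, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scale scores of three children at test and retest.
test, retest = [50, 60, 70], [55, 65, 75]
d = retest_effect_size(test, retest)
print(d, effect_size_label(d))            # -0.5 moderate
print(round(icc_one_way(test, retest), 2))  # 0.88
```

A negative d here means the retest mean is higher than the test mean.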
Concurrent validity was evaluated by assessing Spearman rank order correlation coefficients between CHQ scale/summary scores and the VAS rating of the child’s current health. Concurrent validity is hypothesised to be expressed in positive correlation coefficients. We also hypothesised that the scale general health perceptions has the highest correlations with the VAS rating by the parents, as the content of the other CHQ scales involves a subset of the overall health state as expressed in the VAS rating.
The ability of the CHQ to discriminate between a subgroup without parent reported conditions and subgroups with asthma, frequent headaches, and problems with hearing was assessed by Mann Whitney U tests, and effect sizes defined as d = [mean(no conditions)−mean(with condition)]/SD in the condition subgroup; as SDs were generally higher in the subgroups with a reported condition than in the subgroup without conditions this choice resulted in comparatively conservative effect size estimates.31 Based on the content of the scales and summary measures11 and the nature of the three parent reported chronic conditions, we hypothesised that the effect sizes (d) will be higher for the physical summary measure than for the psychosocial summary measure in the subgroups with asthma and headaches (in comparison with the subgroup with no parent reported chronic conditions) and the other way around for the subgroup with problems with hearing (as such problems are not directly reflected in the CHQ “physical” scales); furthermore that the scale general health perceptions will show large effect sizes in each subgroup with a reported condition; and that the scale bodily pain will be the most affected scale in the subgroup of children with parent reported frequent headaches.
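As an illustration of the discriminative analysis, the sketch below computes the effect size with the SD of the condition subgroup in the denominator, together with the Mann Whitney U statistic for the subgroup without conditions. It stops at the U statistic and does not derive a p value (a statistics package would normally do that); the function names and scores are hypothetical.

```python
from statistics import mean, stdev


def discriminative_effect_size(no_condition, with_condition):
    """d = [mean(no conditions) - mean(with condition)] / SD(with condition)."""
    return (mean(no_condition) - mean(with_condition)) / stdev(with_condition)


def mann_whitney_u(a, b):
    """U statistic for sample `a`: each pair (x from a, y from b) counts
    1 when x > y, 0.5 when tied, and 0 otherwise."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)


# Hypothetical 0-100 scale scores.
no_condition = [90, 95, 100, 85]
asthma = [60, 80, 100]

print(discriminative_effect_size(no_condition, asthma))  # 0.625
print(mann_whitney_u(no_condition, asthma))              # 8.5
```

Dividing by the larger SD of the condition subgroup gives a smaller d than dividing by the SD of the subgroup without conditions, which is why the text calls the estimate comparatively conservative.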
All analyses were done in SPSS, version 10.0. The medical ethical review board of Erasmus MC, University Medical Centre Rotterdam approved this study.
With regard to the 2040 schoolchildren in the regional sample, 1435 parents (70%) responded; a retest questionnaire was sent to parents of 329 children (a random subgroup), which resulted in a response by 234 parents (71%). Mean respondent age was 37.7 years (range 23–60; SD 5.2); 86% were mothers; 8% were of non-Dutch ethnic origin; 18% had completed higher vocational education/university, 4% had only elementary or no education; most were employed (54%) or homemakers (35%). The schoolchildren ranged from 4–13 years of age (mean 8.1; SD 2.4); 51% were girls; 7% belonged to a single parent family. Twenty four of 28 CHQ items had less than 1.5% missing answers; the highest percentage found was 1.7% (behaviour item “lied or cheated?”). Twenty seven of 28 CHQ items had less than 0.5% non-unique answers; the single item scale family cohesion had the highest percentage (0.8%).
In both our regional school sample and the nationwide general population sample, seven CHQ scales showed ceiling effects (>25% of the respondents had the maximum score); five scales even showed a profound ceiling effect in both samples (>50% at the extreme) (table 2). (Please see the addendum, available online at http://www.jech.com/supplemental, for a table showing the comparison of mean scores and standard deviations of the CHQ-PF28 in a subgroup of children aged 4–11 years of the regional sample of schoolchildren and the whole nationwide general population sample of Statistics Netherlands.) In three subgroups of the sample of schoolchildren with a specific condition (asthma, frequent headaches, and problems with hearing; see table 6) fewer ceiling effects were present: bodily pain did not show a ceiling effect in any subgroup; parental-emotional did not show a ceiling effect in the subgroup with problems with hearing; family activities showed less ceiling effect (<50% at the extreme) in subgroups with headaches and problems with hearing.
Only one multi-item scale, that of physical functioning, and both CHQ summary measures showed adequate internal consistency in both samples (Cronbach’s α>0.70). All multi-item scales, except for parental-emotional and parental-time, showed higher (corrected) item-own scale correlation coefficients than (on average) item-other scale correlation coefficients (see table 3). Items of scales with a “physical” content showed comparatively high correlation coefficients with the physical summary score, and items of scales with a “psychosocial” content with the psychosocial summary score (table 3).
Seven CHQ scales and the psychosocial summary measure had significantly higher mean retest scores (p<0.05), although with only minor effect sizes (<0.20). Test/retest ICCs ranged from 0.14 to 0.78 (average 0.50; p<0.01) (table 4); general behaviour and the psychosocial summary measure showed excellent test-retest reliability (ICC 0.70 or higher); seven other CHQ scales showed moderate test-retest reliability (ICC 0.50–0.70).6
The child health questionnaire parent form 28 items (CHQ-PF28), a short generic measure of health status and health related quality of life with eight multi-item and five single item scales is a feasible measurement instrument for large scale applications in community medicine by paper and pencil (mailed) questionnaires as well as oral interviews.
This study provides reference (norm) scores for the CHQ-PF28 for age groups 4–11/4–13 years, derived from large random school based and general population samples.
Score distributions and concurrent and discriminative validity of the “short” CHQ-PF28 items are comparable to the results of its longer counterpart, the CHQ-PF50 items.
The physical and psychosocial summary measures of the “short” CHQ-PF28 show adequate internal consistency and its psychosocial summary measure has excellent test-retest reliability, but individual CHQ-PF28 scales may have a low internal consistency and test-retest reliability.
All correlation coefficients between CHQ scales/summary measures and the VAS rating of the child’s health were positive and significant (p<0.01) as was hypothesised. General health perceptions (specifically the item GHGLOBAL—see table 3—of this scale), correlated best with the VAS rating, as hypothesised (Spearman correlation coefficients 0.39/0.50) (table 5).
The prevalences of parent reported chronic conditions were: asthma (9%); frequent headaches (3%); and problems with hearing (3%); 66% had no parent reported condition at all. All CHQ scale and summary scores were lower in the subgroups with a reported condition compared with the subgroup with no reported condition; all differences but three (regarding the subgroup with asthma compared with no conditions) were significant (p<0.05) (table 6). In the subgroups with reported asthma and frequent headaches, the impact on the physical summary measure as expressed by the effect sizes was higher than on the psychosocial summary measure, while the reverse was true in the subgroup with problems with hearing, as hypothesised (table 6). The scale general health perceptions was significantly affected in each of the three subgroups with a condition, as hypothesised, but only resulted in large effect sizes in the subgroups with asthma and problems with hearing; in the subgroup with frequent headaches the scale bodily pain, as hypothesised, showed the largest effect size (table 6).
Feasible, reliable, and validated measures are needed to describe health and health related quality of life of children in evaluation studies of community health and clinical interventions, burden of disease studies, or in community health and clinical practice; these measures may be applied in addition to condition specific health related quality of life measures and in addition to various clinical measures.
The child health questionnaire parent forms (CHQ-PF) provide a comparatively well evaluated option for measurement of generic health status and health related quality of life in child populations.
The “short” CHQ-PF28 offers an acceptable alternative to the longer CHQ-PF50 if the physical and psychosocial summary measures are of primary interest and reliable estimates of each separate CHQ scale are not required for the purpose of health measurement.
This first evaluation of the 28 item short form of the CHQ for parents in an independent large random sample of school children re-established the feasibility of the CHQ-PF as a paper and pencil health status questionnaire12,14,19; large scale oral CHQ-PF28 interviewing by trained interviewers in a nationwide random general population sample also proved to be feasible.24 This study supports the concurrent and discriminative validity of the CHQ-PF28 and provides reference (norm) scores derived from school based and general population samples, but gives rise to some concerns about ceiling effects, the internal consistency of (short) multi-item scales, and test-retest reliability, requiring further investigation.
Limitations of our study include the choice of the sample(s) and study design issues. Our primary study group consisted of a random sample of predominantly healthy school children (66% had no parent reported chronic conditions). So, the results of this study are of primary interest for community health applications such as burden of disease studies,5 evaluations of preventive interventions in the general population, or future applications in the daily practice of community medicine, for example, applications by school based nurses.7,8 We recommend, however, evaluations in other populations as well.
Evaluation of test-retest reliability in our study did not include an assessment of health transitions that may occur between test and retest; we recommend including such an assessment in future studies. The responsiveness of the CHQ to changes in medical/social conditions was not evaluated and therefore remains to be studied. Comparisons between our sample and the national sample of Statistics Netherlands provide only preliminary insights, as a different mode of data gathering and questionnaire administration was applied in the Statistics Netherlands study.
In this study, only the CHQ-PF28 items were administered, as mingling the extra items of the CHQ-PF50 with the regular CHQ-PF28 items may influence the results. However, additional analyses may be recommended in other, existing, datasets that do include the CHQ-PF50 concerning results regarding the subset of CHQ-PF28 items in comparison with the results regarding all CHQ-PF50 items.
In both samples in this study, more than 25% of respondents had the maximum score on seven CHQ-PF28 scales. This finding (that is, a ceiling effect) is common in paediatric health measurement and health related quality of life studies; it is equally apparent in studies with other CHQ versions and with other measurement instruments.11,12,14,15,17–20,32–34 However, it limits the use of these instruments to detect changes in a generally healthy population, or to describe excellent health beyond the average in comparatively healthy populations. In specific populations with chronic conditions ceiling effects may be less pronounced, as was shown in this study.14
Levels of reliability of health status measures in children may be low relative to instruments designed for adults,1–3 especially in the case of shortened scales. With regard to the CHQ-PF28 we recommend restricting evaluations to the CHQ summary measures, which were shown to have adequate internal consistency.
Evaluation of CHQ-PF28 (corrected) item-own scale correlations and item-other scale correlation coefficients showed that all CHQ scales concerning the health status of the children themselves represented separate entities.
One CHQ-PF28 scale and one summary measure showed excellent test-retest reliability, and seven scales showed moderate test-retest reliability. The CHQ single item scales showed, overall, lower test-retest reliability than multi-item scales/measures, which illustrates that measuring a concept via multiple items may increase reliability.6,29 Seven scales and the psychosocial summary measure had statistically significantly higher retest scores, although effect sizes were minor. This might reflect a somewhat lower prevalence of, for example, viral infections at the retest (later in the spring season).
The statistically significant, positive correlation coefficients between CHQ scales/summary measures and the VAS rating of the child’s health by the parent supported the concurrent validity of the CHQ-PF28. However, the results illustrate that CHQ scales other than general health (for example, those related to role functioning, psychosocial health, and family cohesion) measure concepts that extend beyond the mere measurement of health in general such as was done by the VAS rating by the parents. This study showed the ability of the CHQ-PF to discriminate between absence/presence of three parent reported conditions, with scoring patterns that generally confirm hypotheses that were based on the nature of the three conditions. This is in accordance with earlier reports on the CHQ-PF50 and CHQ-CF87.11–20 We recommend further assessments of the validity of the CHQ-PF28 by comparing scores between clinical groups with reported medical conditions, in addition to this study that included parent reports.1–3
There is a clear need for feasible, reliable, and valid measures to describe generic health status and health related quality of life in child populations; this is equally true for community health and for clinical applications, and in the future possibly for applications in daily medical (preventive) practice.1–8 The CHQ is such a measure, and short forms like the CHQ-PF28 are especially welcome given the overload of items in most questionnaires. This study showed the score distributions and concurrent and discriminative validity of the CHQ-PF28 to be comparable to its longer counterpart, the CHQ-PF50. However, the internal consistency and test-retest reliability of many individual CHQ-PF28 scales were comparatively low. The two CHQ summary score measures however did show adequate internal consistency, while the psychosocial summary measure also showed excellent test-retest reliability. In community health applications, therefore, the CHQ-PF28 offers an acceptable alternative to the longer CHQ-PF50 if evaluation of the summary measures suffices and reliable estimates of each separate CHQ scale are not required for the purpose of health measurement. In addition to our study, we recommend that further assessment of the CHQ-PF28 be made in varied clinical samples, as well as a close evaluation of both the responsiveness to change and test-retest characteristics.
This study was funded by the Netherlands Organisation for Health Research and Development (ZonMw) NWO-Health Care Efficiency Research Program Grant no 945-10-022. The GGD - Municipal Health Service in Rotterdam, Netherlands supported this project and was responsible for the data collection. We are grateful to the school physicians, nurses, doctor’s assistants, researchers and policy advisors of the department of Youth of the Municipal Health Service, especially Rina Labbé-Koopman, MD, head of the department, Hella van den Berg, MA and Joke Belder, for facilitating this project in collaboration with the related schools in and municipalities of Krimpen aan den IJssel and Ridderkerk, Netherlands. We thank Ilse Oonk, MA, Ghazaleh Sehat, Annemieke van Eijsden, MSc, and Gerard Borsboom, MA of the Department of Public Health of Erasmus MC for help with the organisation of this project, data collection, data entry and statistical support. We are grateful to Gouke J Bonsel, MD, PhD, and Reinoud J B J Gemke, MD, PhD for helpful advice regarding the design of this study.
Conflicts of interest: none declared.
The interpretations reported in this article are the author’s (AMB) and do not necessarily correspond with the policy of Statistics Netherlands.