Retest reliability of surveillance questions on health related quality of life
- 1Saint Louis University School of Public Health, Department of Community Health, Salus Center, St Louis, USA
- 2Saint Louis University College of Public Service, Department of Research Methodology, St Louis, USA
- 3Missouri Department of Health, Office of Surveillance, Research and Evaluation, Jefferson City, USA
- Correspondence to: Dr E M Andresen, Saint Louis University School of Public Health, Department of Community Health, Salus Center, 3545 Lafayette Avenue Suite 300, St Louis, MO 63104-1399, USA
- Accepted 8 January 2003
Study objectives: Health related quality of life (HRQoL) is an important surveillance measure for monitoring the health of populations, as proposed in the American public health plan, Healthy People 2010. The authors investigated the retest reliability of four HRQoL questions from the US Behavioral Risk Factor Surveillance System (BRFSS).
Design: Randomly sampled BRFSS respondents from the state of Missouri were re-contacted for a retest of the HRQoL questions. Reliability was estimated by κ statistics for categorical questions and intraclass correlation coefficients for continuous questions.
Setting: Missouri, United States.
Participants: 868 respondents were re-interviewed by telephone about two weeks after the initial interview (mean 13.5 days). Participants represented the adult, non-institutionalised population of Missouri: 59.1% women; mean age 49.5 years; 93.2% white race.
Main results: Retest reliability was excellent (0.75 or higher) for Self-Reported Health and Healthy Days measures, and moderate (0.58 to 0.71) for other measures. Reliability was lower for older adults. Other demographic subgroups (for example, gender) showed no regular pattern of differing reliability and there was very little change in reliability by the time interval between the first and second interview.
Conclusions: Retest reliability of the HRQoL Core is moderate to excellent. Scaling options will require future attention, as will research into appropriate metrics for what constitutes important population group differences and change in HRQoL.
Within the public health arena, emphasis has been placed on measuring outcomes such as “quality of life” and, more specifically, “health related quality of life” (HRQoL). Health related quality of life refers to the “physical, psychological, and social domains of health, seen as distinct areas that are influenced by a person’s experiences, beliefs, expectations, and perceptions”.1 HRQoL can itself be defined as a health issue of importance, and variations by person, place and time can be important to understanding population differences.2–6 In addition, HRQoL measures can be outcomes of other health events such as access to care,7 impact of chronic conditions such as arthritis,8 or the effects of aging.9 Therefore, the surveillance of HRQoL is important to monitoring changes in the health of people and populations.
One of the two overarching goals stated in the American public health plan, Healthy People 2010, is to increase the quality and years of healthy life.10 This goal will be met by increasing life expectancy and improving HRQoL among people and communities. Healthy People 2010 proposes three surveillance tools for measuring HRQoL: Self-Reported Health, Healthy Days, and Years of Healthy Life.10
The Behavioral Risk Factor Surveillance System (BRFSS), supported by the Centers for Disease Control and Prevention, is an ongoing, state based survey that measures health behaviours and risk factors of the US non-institutionalised adult population, through random digit dialled telephone interviews.11,12 Since 1993, the BRFSS has included four questions designed to monitor HRQoL (HRQoL Core). The BRFSS HRQoL Core includes questions used to measure both Self-Reported Health and Healthy Days mentioned above as recommended tools for monitoring HRQoL. Despite the importance of these tools in determining whether individuals and populations within the US achieve the first goal of Healthy People 2010, few studies have examined the psychometric properties of these items.
The objective of this study was to investigate the retest reliability of the BRFSS HRQoL Core, specifically Self-Reported Health and Healthy Days, among a population of 868 adults from the US state of Missouri. The reliability of these questions was measured within the total group of retest respondents, as well as by subgroups based on gender, age, race, income, and education in order to examine the measurement properties among different segments of the population.
The BRFSS HRQoL Core (also called the Healthy Days module) has standardised survey and scoring methods, and established comparative population norms.5,13–16 The HRQoL Core includes a global assessment question that measures Self-Reported Health (that is, “Would you say that in general your health is excellent, very good, good, fair, or poor?”). This single question has a substantial research track record supporting its usefulness in community studies.17 Also included in the HRQoL Core are two questions reporting the number of days during the previous 30 days in which the respondent’s physical or mental health was not good. The sum of these two measures, capped at 30, gives the total number of “unhealthy days” (ranging from 0 to 30 days). This summary measure of HRQoL is usually expressed in the positive, that is Healthy Days, and represents the number of recent days in which both the respondent’s physical and mental health were good. Healthy Days is calculated by subtracting the number of “unhealthy days” from 30. The fourth question of the HRQoL Core measures the number of days in the previous 30 days in which the respondent experienced activity limitation because of poor physical or mental health, and is asked only of those people who reported at least one day of poor physical or mental health.
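The Healthy Days scoring described above can be sketched in a few lines. This is a minimal illustration, not the BRFSS scoring program: the function names are hypothetical, and the cap at 30 is taken from the stated 0 to 30 range of “unhealthy days”.

```python
def unhealthy_days(physical_days: int, mental_days: int) -> int:
    """Sum of poor physical and poor mental health days, capped at 30."""
    for value in (physical_days, mental_days):
        if not 0 <= value <= 30:
            raise ValueError("each 'days' answer must be between 0 and 30")
    # The cap keeps the summary within the 30 day recall period.
    return min(physical_days + mental_days, 30)


def healthy_days(physical_days: int, mental_days: int) -> int:
    """Healthy Days = 30 minus the capped number of unhealthy days."""
    return 30 - unhealthy_days(physical_days, mental_days)


# Hypothetical respondent: 3 poor physical health days, 5 poor mental health days
example_healthy_days = healthy_days(3, 5)
```

A respondent reporting 20 poor days on each question would score 0 Healthy Days rather than a negative number, which is the purpose of the cap.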
In addition, the mental health days have been used as a dichotomous measure, with 14 or more poor mental health days representing “frequent mental distress”.15 For this analysis, we added a similar dichotomous measure for physical health days (that is, “frequent physical distress”). Standard BRFSS questions on gender, age, race/ethnicity, marital status, education, employment, and income provided demographic data for this sample.
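The 14 day dichotomy for frequent mental or physical distress might be coded as below. The function name and the example respondent are hypothetical; only the cut off of 14 or more days comes from the text.

```python
FREQUENT_DISTRESS_CUTOFF = 14  # poor-health days out of the previous 30


def frequent_distress(poor_days: int, cutoff: int = FREQUENT_DISTRESS_CUTOFF) -> bool:
    """True when the respondent reports `cutoff` or more poor-health days."""
    if not 0 <= poor_days <= 30:
        raise ValueError("poor_days must be between 0 and 30")
    return poor_days >= cutoff


# Hypothetical respondent: 13 days at the first interview, 15 at the retest.
test_days, retest_days = 13, 15
changed_category = frequent_distress(test_days) != frequent_distress(retest_days)
```

In this invented case a two day difference on the continuous scale moves the respondent across the cut off, so the dichotomous measure records a disagreement between interviews.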
The 1999 Missouri BRFSS included 4277 subjects (adults 18 years of age or older). Data were collected through computer assisted telephone interviews using random digit dialling and disproportionate random sampling.18–20 During the months of January, February, and March 1999, 1459 potential respondents were identified and 1114 people were interviewed, for a response of 76.4%. Following standard BRFSS procedures, multiple attempts were made to re-interview 1015 respondents; 99 respondents were excluded from attempts to conduct retest interviews primarily because of the telephone laboratory schedule: respondents who completed interviews at the beginning of January and end of March were not scheduled for retest. During February, March, and early April 1999, 889 re-interviews were conducted, for a response of 87.6%. Of these, 21 interviews contained missing, incomplete or incorrect information, leaving a total of 868 test-retest subjects. Retest interviews of the HRQoL questions were made about two weeks after the initial call (range 3–34 days; mean 13.5 days, standard deviation (SD) 4.6 days).
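The sample flow and response percentages reported above can be checked with simple arithmetic. The variable names are illustrative; all counts are those given in the text.

```python
identified = 1459      # potential respondents, January to March 1999
interviewed = 1114     # completed baseline interviews
excluded = 99          # not scheduled for retest
reinterviewed = 889    # completed retest interviews
unusable = 21          # missing, incomplete or incorrect information

attempted_retest = interviewed - excluded   # respondents approached for retest
final_sample = reinterviewed - unusable     # test-retest subjects analysed

baseline_response = round(100 * interviewed / identified, 1)
retest_response = round(100 * reinterviewed / attempted_retest, 1)

print(attempted_retest, final_sample, baseline_response, retest_response)
```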
Primary analyses were conducted using SPSS statistical software.21 Retest sample characteristics are reported as either unweighted means and standard deviations or proportions. We report skewness of the continuous measures (that is, floor and ceiling effects); the distribution was considered skewed if 20% or more of subjects were grouped at either extreme.22 In this study, a ceiling effect refers to good health (that is, 0 poor health days or 30 Healthy Days) and floor effect refers to poor or bad health. We also compared HRQoL prevalence estimates from BRFSS data for Missouri overall, and for the USA overall, with the retest sample, using weighted data from the CDC BRFSS data web site.23
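The 20% rule for floor and ceiling effects could be applied to a poor health “days” item roughly as follows. This is a sketch under assumptions: the function names and the response list are hypothetical, and the code treats 0 days as the ceiling (good health) and 30 days as the floor, as defined above. For Healthy Days the two extremes would be reversed.

```python
def floor_ceiling_shares(poor_days, best=0, worst=30):
    """Share of respondents at each extreme of a poor-health 'days' item."""
    n = len(poor_days)
    if n == 0:
        raise ValueError("need at least one response")
    return {
        "ceiling": sum(1 for d in poor_days if d == best) / n,  # good health
        "floor": sum(1 for d in poor_days if d == worst) / n,   # poor health
    }


def skewed_extremes(poor_days, threshold=0.20):
    """Names of the extremes holding `threshold` or more of the responses."""
    shares = floor_ceiling_shares(poor_days)
    return [name for name, share in shares.items() if share >= threshold]


# Hypothetical responses, invented for illustration only.
responses = [0, 0, 0, 0, 0, 0, 2, 5, 10, 30]
shares = floor_ceiling_shares(responses)
flags = skewed_extremes(responses)
```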
Descriptive reliability results are presented showing subject mean responses for time one and time two, or per cent agreement for categorical items. Reliability of continuous measures (that is, responses to “days” questions) was evaluated by intraclass correlation coefficients (ICC) using a two way random effects model.24 For dichotomous measures (for example, frequent mental distress) we used κ statistics.25 A weighted κ statistic was computed for the Self-Reported Health question to account for partial agreement among ordered categories.26 We classified ICC and κ statistics of 0.75 or higher as excellent agreement and below 0.40 as poor agreement.25 Reliability is reported for the entire sample and for demographic subgroups that are often reported to demonstrate differing HRQoL (age, gender, ethnicity, education, and income). Because of the small number of respondents who reported they were other than white or African American, analyses of potential ethnic differences were restricted to only these two groups. We also report reliability for groups based on the time interval between test and retest.
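For readers who want to reproduce this type of analysis outside SPSS, the two statistics can be sketched in plain Python. Several choices here are assumptions, because the text does not specify them: the ICC is the single measure, absolute agreement form of the two way random effects model (often written ICC(2,1)), and the weighted κ uses linear disagreement weights. Function names and the small data sets are hypothetical.

```python
def icc_two_way_random(scores):
    """ICC(2,1): rows are subjects, columns are occasions (test, retest)."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)                       # between subjects
    ms_cols = ss_cols / (k - 1)                       # between occasions
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def kappa(first, second, n_categories, weighted=False):
    """Cohen's kappa; with weighted=True, linear disagreement weights."""
    n, c = len(first), n_categories
    observed = [[0.0] * c for _ in range(c)]
    for a, b in zip(first, second):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(c)]
    col = [sum(observed[i][j] for i in range(c)) for j in range(c)]

    def penalty(i, j):
        if weighted:
            return abs(i - j) / (c - 1)   # partial credit for near misses
        return 0.0 if i == j else 1.0     # all-or-nothing agreement

    cells = [(i, j) for i in range(c) for j in range(c)]
    obs_dis = sum(penalty(i, j) * observed[i][j] for i, j in cells)
    exp_dis = sum(penalty(i, j) * row[i] * col[j] for i, j in cells)
    return 1 - obs_dis / exp_dis


# Hypothetical paired answers coded 0..2 (ordered categories).
time1 = [0, 1, 2, 2]
time2 = [0, 1, 2, 1]
plain_kappa = kappa(time1, time2, 3)
linear_kappa = kappa(time1, time2, 3, weighted=True)
```

In the toy data the single disagreement is between adjacent categories, so the weighted κ is higher than the unweighted κ, which is the partial agreement idea described above. The κ function does not guard against the degenerate case where every answer falls in one category.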
Participants of the retest study were 59.1% women; the average age was 49.5 years old (range 18 to 93), with 23.5% of the sample aged 65 or older. Most (93.2%) were white. Table 1 shows the demographic characteristics of the sample in more detail. In general, retest subjects were not different in a systematic manner from the entire 1999 BRFSS sample.23 For example, among all Missouri respondents to the BRFSS in 1999, 14.5% reported they had less than a high school education, 52.6% were women, and 19.6% were aged 65 and older. Table 1 also lists the descriptive results for the four BRFSS HRQoL Core questions and three computed measures. For the four “days” measures, the percentage of people who reported their HRQoL to be either at the minimum or maximum health (that is, floor or ceiling effects) is also shown. None of the measures demonstrated floor effects for poor health; however, all four measures showed ceiling effects (for example, 44% reported a maximum of 30 Healthy Days and 80% reported zero limited activity days at time 1). The skewed distributions also are evident in that the three “days” questions from the HRQoL Core result in a median of zero days, with means ranging between 2.2 and 4.5 days. Missing data were not a large problem with the BRFSS questions. Response was 97% or higher for all questions with the exception of the baseline household income question (9.4% missing). When data were paired for the reliability estimates the response remained above 97%, although the Healthy Days measure was missing for 4.6% of subjects because it relies on two questions (table 2).
The Missouri sample was compared with previously published HRQoL data from other US states and the nation.5,16,23 HRQoL was somewhat better in US samples than in the Missouri sample. For example, in 1999, American states reported that 22.7% of BRFSS respondents were in excellent health and 3.3% in poor health, compared with 18.9% and 6.6% for the Missouri retest sample.
Retest reliability of the seven measures ranged from 0.58 (frequent mental distress) to 0.75 (Self-Reported Health and Healthy Days) for the entire sample. Reliability was lower for the computed categorical measures of mental and physical distress compared with the original continuous “days” version of these questions. As shown in table 2, the level of reliability varied somewhat among subgroups. However, the only consistent group difference was lower reliability among older adult respondents, and there was a slight tendency towards lower reliability with longer time intervals between test and retest. There was no consistent pattern of different reliability by gender, ethnic group, education, or income. In a post hoc analysis, we examined whether time intervals differed among demographic subgroups and found no relation between the interval and any variable.
Public health surveillance activities often focus on specific events, like mortality or cancer. However, health related quality of life (HRQoL) is included as an overarching aspect of the American public health plan.
Surveillance questions on HRQoL can be used to examine different outcomes for specific conditions, like arthritis or heart disease, and to detect general health disparities.
Retest reliability of the BRFSS measures for HRQoL seems to be moderate to strong. There are no major systematic demographic differences in reliability except for lower estimates for older adults.
The retest reliability of the four Core HRQoL questions of the BRFSS and three computed measures (for example, Healthy Days) was moderate to excellent. This reinforces the positive earlier results of retest reliability for these questions from a sample of people with disability,27 and for retest reliability of the BRFSS in general.18,28,29 Not surprisingly, reliability decreased somewhat for older adults, and with longer intervals between tests. Longer retest times are naturally associated with more real health changes, and older adults have more chronic conditions and variable health.30 However, there was no evidence that reliability varied systematically by other demographic characteristics of typical BRFSS samples. The Healthy Days summary measure had slightly higher reliability than each of its component measures, physical health and mental health. This suggests that the summary item is a more consistent measure and that both components are important elements of HRQoL.
The κ values for the dichotomous categories of “days” measures (mental health and physical health) were low compared with per cent agreement; however, this is partly due to the low prevalence of respondents categorised with “frequent distress,” which biases κ values.31 In addition, the reliability of the dichotomous measures was lower than that of the corresponding continuous measures, suggesting that information was lost in the categorisation and misclassification may have occurred near the cut off value. Considering the brief surveillance nature of these questions, it is not surprising that the “days” questions result in somewhat skewed distributions. Future research will need to examine the merits of categorical scaling of these questions. Also important to the future use of the questions is the issue of what constitutes meaningful group differences. If an acceptable criterion measure for HRQoL states can be adopted, receiver operating characteristic (ROC) analyses would be useful in developing thresholds for determining, for example, how many poor physical health days constitute an important decrement in HRQoL.
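The effect of low prevalence on κ can be illustrated with two invented 2×2 test-retest tables that share the same per cent agreement. The counts and the function name are hypothetical and are not taken from the study data.

```python
def agreement_and_kappa(both_yes, yes_then_no, no_then_yes, both_no):
    """Per cent agreement (as a proportion) and Cohen's kappa for a 2x2 table."""
    n = both_yes + yes_then_no + no_then_yes + both_no
    p_observed = (both_yes + both_no) / n
    p_yes_test = (both_yes + yes_then_no) / n
    p_yes_retest = (both_yes + no_then_yes) / n
    # Agreement expected by chance from the marginal proportions.
    p_chance = p_yes_test * p_yes_retest + (1 - p_yes_test) * (1 - p_yes_retest)
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


# Hypothetical tables, each with 100 respondents and 90% agreement.
balanced = agreement_and_kappa(45, 5, 5, 45)   # 50% report frequent distress
rare = agreement_and_kappa(5, 5, 5, 85)        # 10% report frequent distress

print(balanced, rare)
```

Both tables have 90 of 100 respondents giving the same answer twice, yet κ falls from 0.80 in the balanced table to about 0.44 when the condition is rare, because chance agreement on the common “no” category is already high.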
The BRFSS HRQoL Core questions have been subjected to other tests of their measurement characteristics within the US. These include generally positive cognitive testing,32 and positive qualitative reports by Native American respondents in New Mexico.6 Findings among adults with disability also have been positive, including the finding that the questions are acceptable and pose very low burden on respondents.27,33 There is some evidence that acceptability of BRFSS surveys, including HRQoL, differs by interviewer type: Gilliland and colleagues found that Native American subject response to Native American interviewers was 77% compared with 69% for other interviewer ethnic groups.6 On the other hand, the response for the 1999 BRFSS baseline study was higher than that for similar but smaller studies previously conducted within the US states of Missouri (76.4% v 66%),18 Massachusetts (65%),29 or New York (59.7%),28 while the retest response was comparable to the earlier Missouri study (87.6% v 88%) and higher than the other two studies (70% and 55.6%, respectively).
This study is based on BRFSS retest respondents from the single state of Missouri; the sample reported somewhat lower HRQoL than the general US population, a finding that has been reported previously.3,5 However, there is no evidence to suggest that slight differences in population levels of health would affect the reliability patterns reported here. The slightly larger older age group of the Missouri retest sample could produce lower reliability overall. For samples with disability or chronic health problems, reliability might vary more than reported here. Importantly, the level of reliability was lower than the 0.75 cut off used to rate excellent agreement for all but two measures, suggesting that there will be some error in population estimates based on the HRQoL questions. Retest reliability of the HRQoL Core should be assessed in other US states and population subgroups, as well as internationally. If results are consistent with our findings, then we can be assured that the BRFSS HRQoL surveillance core is strong.
We are grateful to Missouri BRFSS Project staff (Dr. Theophile Murayi, Steve Kilfoil, Jane Heath, Jacqui Turner and Pat Nelson) and to telephone interviewers from Kelly Services. We extend our special thanks for the support and advice of our CDC colleagues, Dave Moriarty and Matt Zack.
Funding: this work was supported, in part, with funding from the Health Care and Aging Studies Branch, Division of Adult and Community Health, National Center for Chronic Disease Prevention and Health Promotion (CDC; R13/CCR717041-01), and the Behavioral Surveillance Branch (CDC; U58/CCU700950-14). Support also was provided by the CDC through the Saint Louis University Prevention Research Center (CDC; U48/CCU710806).
Conflicts of interest: none.