We read with interest McCaffery and Barratt’s editorial on assessing psychosocial and quality of life outcomes in screening.1 We agree that greater concern is needed about the reliability and validity of the outcome measures used, but would argue that it is equally important to be clear about which outcomes are being assessed and how outcome data should be analysed and interpreted.
In their study Marteau and colleagues selected five items from a generic measure of health status (the SF-36).2 No explanation was given of why these particular items were selected, why they should be added together, what the label “self assessed health” actually means, or why it was relevant to the study. The five items clearly assess more than one construct. The item asking for a rating of health is a measure of impairment while the other four assess health beliefs that may have no relation to health status. Thus it could well be concluded that the study shows that people who believe they have poorer health are more likely to have a health problem. Health beliefs may well change in the presence of ill health but it would be expected that such changes would occur over a prolonged period of time. No assessment of quality of life or true health status (often referred to as health related quality of life) was made in the study. In this respect the outcome measure used lacked construct validity.
When adding scores of items together an assumption of unidimensionality is made. There is no justification for adding together scores of items that measure different constructs. A major problem with the scaling of the SF-36 scales presented in the manual is that it relies on correlations. However, dimensionality, additivity, and item ordering should be established using item response theory (IRT).3 IRT evidence shows that the SF-36 scales are not unidimensional and that items in the subscales cannot validly be summed in this way.4
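The point that correlation-based evidence cannot by itself justify summing items can be illustrated with a small sketch. It assumes a hypothetical six item scale in which three items track one construct and three track a second, statistically unrelated construct, and it uses Cronbach's alpha as one common correlation-based index of internal reliability. The choice of statistic, the data, and all names are assumptions for illustration; none of them come from the study or the SF-36 manual.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(items)
    # Summed scale score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def covariance(x, y):
    """Sample covariance of two equally long score lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# Hypothetical scores for four respondents on two unrelated constructs.
construct_a = [0, 0, 1, 1]
construct_b = [0, 1, 0, 1]

# Three items measure construct A and three measure construct B.
items = [construct_a] * 3 + [construct_b] * 3

alpha = cronbach_alpha(items)
print(round(alpha, 3))                       # reliability of the summed scale
print(covariance(construct_a, construct_b))  # the two constructs are uncorrelated
```

In this constructed case alpha is about 0.8, a value often read as good internal reliability, even though the summed score mixes two constructs whose covariance is zero. The sketch does not reproduce an item response theory analysis; it only shows why a high reliability coefficient is not evidence of unidimensionality.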
The outcome measure used by Marteau and colleagues failed to detect any impact of the screening process. This could have been predicted, given that the selected items are not clearly related to the screening process. Where the interest is in the process of screening itself, it is essential to use outcome measures that are specific to that situation. As mentioned in the editorial, only a few measures specifically developed to assess the consequences of screening are available. Even among these, the psychological consequences questionnaire,5 for example, lacks content validity when measuring the consequences of false positive screening mammography.6,7 Further research is therefore needed not only to develop instruments that adequately capture the psychosocial consequences of screening but also to ensure that these instruments meet modern psychometric criteria.
In our recent paper in the journal we present data showing that self assessed health predicts the presence of an abdominal aortic aneurysm independently of known risk factors. In an accompanying editorial, McCaffery and Barratt note this interesting finding and reinforce our conclusion regarding the importance of including measures of psychological outcomes at baseline to avoid erroneous inferences regarding the psychological consequences of screening.1
Brodersen et al criticise our choice of outcome measure, namely self assessed health, on two counts. The first concerns the psychometric properties of the scale selected; the second concerns the validity of assessing generic as compared with context specific outcomes.
We measured self assessed health using the five item general health scale from the SF-36, for which there is good evidence of internal reliability.2 Validating a measure of self assessed health is more difficult. A measure of “true health status”, as suggested by these correspondents, would not suffice. Indeed, the literature on self assessed health is of enduring fascination precisely because self assessed health is not synonymous with “true health status”.
Regarding the use of generic as compared with screening-specific outcome measures, Brodersen et al take a firm view, arguing that “it is essential to use outcome measures that are specific to that situation.” There is good evidence that participation in screening programmes has psychological consequences that are detectable using both generic and specific measures.3 The choice of outcome measure should, of course, depend critically upon the research question. If one wants to know, for example, whether screening for risk of heart disease causes depression, then a generic measure of depression is needed. By contrast, if one wants to know whether screening for risk of heart disease causes increased worry about heart disease, a more specific measure is needed.