Background Localised information on health-related behaviours and outcomes can seldom be obtained directly from surveys, as they do not normally sample in all neighbourhoods. Furthermore, in those localities which are sampled, the sample sizes are rarely large enough to produce reliable estimates. However, the growing demand for localised health statistics has led to synthetic estimates being generated for alcohol consumption, smoking behaviour, psychiatric ill-health and other conditions. Generally, these synthetic estimates have been based on models of binary outcomes. Self-assessed health measures tend to be recorded on an ordinal scale, and dichotomising them loses potentially useful information. We set out to assess whether the multilevel synthetic estimation methodology could be extended to outcomes with more than two categories.
Methods We developed multinomial multilevel models (in MLwiN) of limiting long-term illness (LLTI) using data from the 2010/11 sweep of the Crime Survey (n=46,754). Synthetic estimates were calculated for the proportion of adults who stated that (i) they were severely limited by a long-term illness and (ii) their illness limited their activities, but not severely. Estimates were generated for every Middle Layer Super Output Area in England and Wales.
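To illustrate the general idea of turning a fitted multinomial model into area-level proportions, the following is a minimal sketch rather than the models used in this study. It assumes a multinomial logit with "no LLTI" as the reference category, uses only the fixed part of the model (area-level random effects are ignored), and all coefficient values, covariates and area names are hypothetical.

```python
import math

def synthetic_proportions(coefs, area_covariates):
    """Predict category proportions for one area from multinomial logit coefficients.

    coefs: maps each non-reference category to its coefficients, intercept first.
           The reference category ("none") has a linear predictor of zero.
    area_covariates: covariate values for the area, without the intercept.
    """
    x = [1.0] + list(area_covariates)
    # Linear predictor for each non-reference category.
    linear = {cat: sum(b * v for b, v in zip(beta, x)) for cat, beta in coefs.items()}
    # Multinomial logit link: exponentiate and normalise against the reference.
    denom = 1.0 + sum(math.exp(eta) for eta in linear.values())
    props = {cat: math.exp(eta) / denom for cat, eta in linear.items()}
    props["none"] = 1.0 / denom
    return props

# Hypothetical coefficients: intercept and one area-level covariate
# (for example, a deprivation score). These are not estimates from the study.
coefs = {"severe": [-2.5, 0.03], "not_severe": [-2.0, 0.02]}

# Hypothetical covariate values for two areas.
areas = {"MSOA_A": [10.0], "MSOA_B": [40.0]}

estimates = {name: synthetic_proportions(coefs, x) for name, x in areas.items()}
for name, props in estimates.items():
    print(name, {cat: round(p, 3) for cat, p in props.items()})
```

Because the three proportions are derived jointly from one model, they sum to one in every area, which separate binary models for each category would not guarantee.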
Results To internally validate the synthetic estimates, they were aggregated to a geography for which robust direct estimates from the source survey were available. There was a strong positive relationship between the synthetic and direct estimates (Spearman’s rank correlations were 0.90 and 0.84 for severe and not severe LLTI respectively, both statistically significant at the 0.01 level). An alternative to internal validation is external validation, in which the synthetic estimates are compared against an independent data source. In this instance the estimates were evaluated against the results from the 2011 Census, with Spearman’s rank correlations of 0.92 and 0.86 respectively (both significant at the 0.01 level).
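The validation step described here rests on Spearman’s rank correlation between two sets of area-level estimates. A minimal sketch of that calculation is given below, assuming tied values receive their average rank; the function names and the example figures are hypothetical and are not data from the study.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical aggregated estimates for five areas (proportions with severe LLTI).
synthetic = [0.08, 0.12, 0.10, 0.15, 0.09]
direct = [0.07, 0.11, 0.12, 0.14, 0.10]

rho = spearman(synthetic, direct)
print(round(rho, 2))
```

A rank-based measure suits this comparison because it asks whether the two sources order areas in the same way, without requiring the estimates themselves to agree in level.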
Conclusion This study has demonstrated that it is possible to extend the synthetic estimation methodology to produce small-area estimates of health status based on multinomial multilevel models. Traditionally, national censuses have been one of the main sources of small-area health information, with numerous academic publications attesting to their importance as a data source for studies of health inequalities. However, the end of the traditional UK census raises potential challenges for the continued provision of very localised health data. Synthetic estimation, including the methodological extensions emanating from this study, could potentially help fill any future gaps in such health data.
- multilevel modelling