Abstract
Background High-quality ethnicity data linked to health records are crucial for monitoring and redressing ethnic inequalities in health. We assessed the quality of ethnicity coding in Scottish health datasets and its implications for assessing ethnic inequalities in severe COVID-19.
Methods We compared ethnicity coding within the Public Health Scotland Ethnicity Look-up (PHS-EL) dataset, and other NHS datasets, with the 2011 Scottish Census as the ‘gold standard’. Measures of quality included the level of missingness (ethnicity missing compared to the Census) and misclassification (ethnicity miscoded compared to the Census). We examined the implications of misclassification, using age- and sex-adjusted Cox proportional hazards models to estimate the risk of severe COVID-19 (hospitalisation or death) by ethnicity using PHS-EL compared with Census coding.
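As an illustration of the two quality measures defined above, the following minimal sketch computes missingness and misclassification per Census ethnic group from linked record pairs. It is not the authors' code: the function name `coding_quality`, the record layout, the group labels and the toy data are all hypothetical, and the choice of denominator for misclassification (only records with an ethnicity code in the health dataset) is an assumption.

```python
from collections import defaultdict

def coding_quality(records):
    """Summarise coding quality per Census ethnic group.

    records: iterable of (census_ethnicity, health_ethnicity) pairs, where
    health_ethnicity is None when the health dataset holds no code.
    The Census value is treated as the reference ('gold standard').
    """
    totals = defaultdict(int)    # linked records per Census group
    missing = defaultdict(int)   # no ethnicity code in the health dataset
    miscoded = defaultdict(int)  # coded, but different from the Census

    for census, health in records:
        totals[census] += 1
        if health is None:
            missing[census] += 1
        elif health != census:
            miscoded[census] += 1

    result = {}
    for group, n in totals.items():
        coded = n - missing[group]
        result[group] = {
            "missingness": missing[group] / n,
            # Assumed denominator: records that have a health-dataset code.
            "misclassification": miscoded[group] / coded if coded else None,
        }
    return result

# Hypothetical toy data, not taken from the study.
toy_records = [
    ("Group A", "Group A"),
    ("Group A", "Group A"),
    ("Group A", None),
    ("Group A", "Group B"),
    ("Group B", "Group A"),
    ("Group B", "Group B"),
]

quality = coding_quality(toy_records)
for group, measures in sorted(quality.items()):
    print(group, measures)
```

Grouping by the Census value rather than the health-dataset value matches the framing in which the Census is the reference against which miscoding is judged.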
Results Misclassification within PHS-EL was higher for all minority ethnic groups [12.5 to 69.1%] than for the White Scottish majority [5.1%] and was highest in the White Gypsy/Traveller group [69.1%]. Missingness in PHS-EL was high overall [30%] but was not higher among minority ethnic groups. PHS-EL data often underestimated severe COVID-19 risk compared with Census data. For example, in the White Gypsy/Traveller group the hazard ratio (HR) relative to the White Scottish majority was 1.68 [95% confidence interval (CI): 1.03, 2.74] using Census ethnicity data and 0.73 [95% CI: 0.10, 5.15] using PHS-EL data; for the Bangladeshi group, the HR was 2.03 [95% CI: 1.20, 3.44] using Census data versus 1.45 [95% CI: 0.75, 2.78] using PHS-EL data.
Conclusion The quality of ethnicity coding in Scottish health datasets is poorer among minority ethnic groups, and this can bias estimates, thereby threatening the monitoring and understanding of ethnic inequalities in health.