OP60 Quality of ethnicity data within Scottish health records and implications of misclassification for ethnic inequalities in severe COVID-19: A national linked data study
  1. Ronan McCabe1,
  2. Sarah Amele1,
  3. Eliud Kibuchi1,
  4. Anna Pearce1,
  5. Kirsten Hainey1,
  6. Evangelia Demou1,
  7. Srinivasa Vittal Katikireddi1,
  8. Patricia Irizar2,
  9. Dharmi Kapadia2,
  10. James Nazroo2
  1. 1MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK
  2. 2Department of Sociology, The University of Manchester, Manchester, UK
  3. 3Global Health and Social Medicine, King’s College London, London, UK
  4. 4The Usher Institute, University of Edinburgh, Edinburgh, UK
  5. 5Research Data Scotland, Edinburgh, UK
  6. 6Public Health Scotland, NHS Scotland, Glasgow, UK
  7. 7Institute of Applied Health Sciences, University of Aberdeen, Aberdeen, UK
  8. 8Department of Mathematics and Statistics, University of Strathclyde, Glasgow, UK
  9. 9Scottish Centre for Administrative Data Research, University of Glasgow, Glasgow, UK


Background Having high-quality ethnicity data alongside health records is crucial to monitor and redress ethnic inequalities in health. We assessed the quality of ethnicity coding in Scottish health datasets and its implications for assessing ethnic inequalities in severe COVID-19.

Methods We compared ethnicity coding within the Public Health Scotland Ethnicity Look-up (PHS-EL) dataset, and other NHS datasets, with the 2011 Scottish Census as the ‘gold standard’. Measures of quality included the level of missingness (ethnicity missing compared to the Census) and misclassification (ethnicity miscoded compared to the Census). We examined the implications of misclassification, using age- and sex-adjusted Cox proportional hazards models to estimate the risk of severe COVID-19 (hospitalisation or death) by ethnicity using PHS-EL compared with Census coding.
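The two quality measures described above can be illustrated with a minimal sketch. It assumes each linked person contributes a pair of values: their Census ethnicity and their health-record ethnicity, with `None` standing for a missing health-record code. The function name `coding_quality`, the group labels, and the choice to compute misclassification only among people who have a health-record code are illustrative assumptions, not details reported in the abstract.

```python
from collections import defaultdict


def coding_quality(records):
    """Summarise missingness and misclassification per Census ethnic group.

    records: iterable of (census_ethnicity, recorded_ethnicity) pairs, where
    recorded_ethnicity is None when the health record has no ethnicity code.
    """
    totals = defaultdict(int)
    missing = defaultdict(int)
    miscoded = defaultdict(int)

    for census, recorded in records:
        totals[census] += 1
        if recorded is None:
            missing[census] += 1
        elif recorded != census:
            miscoded[census] += 1

    summary = {}
    for group, n in totals.items():
        coded = n - missing[group]
        summary[group] = {
            # Share of the Census group with no health-record ethnicity.
            "missingness": missing[group] / n,
            # Assumption: denominator is people with a health-record code.
            "misclassification": miscoded[group] / coded if coded else None,
        }
    return summary


# Hypothetical toy data, not taken from the study.
example = [
    ("group_a", "group_a"),
    ("group_a", None),
    ("group_a", "group_b"),
    ("group_b", "group_b"),
]
quality = coding_quality(example)
```

Treating the Census value as the reference means every disagreement is counted against the health record, which mirrors the 'gold standard' framing in the Methods.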

Results Misclassification within PHS-EL was higher for all minority ethnic groups [12.5 to 69.1%] than for the White Scottish majority [5.1%], and was highest in the White Gypsy/Traveller group [69.1%]. Missingness in PHS-EL was high overall [30%] but was not higher among minority ethnic groups. PHS-EL data often underestimated severe COVID-19 risk compared to Census data. For example, in the White Gypsy/Traveller group the hazard ratio (HR) relative to the White Scottish majority was 1.68 [95% confidence interval (CI): 1.03, 2.74] using Census ethnicity data and 0.73 [95% CI: 0.10, 5.15] using PHS-EL data; for the Bangladeshi group the HR was 2.03 [95% CI: 1.20, 3.44] using Census data versus 1.45 [95% CI: 0.75, 2.78] using PHS-EL data.

Conclusion The quality of ethnicity coding in Scottish health datasets is poorer among minority ethnic groups, and this can bias estimates, thereby threatening the monitoring and understanding of ethnic inequalities in health.

  • Ethnicity data
  • linked health records
  • ethnic inequalities in health
