Background In routine health datasets, such as hospital episode statistics (HES), ethnicity is not always recorded and, where it is, the quality of the data may be unreliable. This may have implications when assessing outcomes by ethnicity. Name analysis algorithms offer an alternative method of assigning individuals to ethnic groups based on their surname and forename. We used Onomap, a name analysis algorithm, to investigate whether the association between ethnicity and cancer incidence varied according to how ethnicity was assigned.
Methods Cancer registrations between 1998 and 2009 in children and young people (0–29 years) were extracted from the Yorkshire Specialist Register of Cancer in Children and Young People (n = 3992). Patients were linked to inpatient HES data (1997–2011) to obtain information on ethnicity, and their surname and forename were matched to an Onomap ethnicity. Each source of ethnicity was categorised as non-South Asian (NSA) or South Asian (SA). A further ethnicity indicator (“Combined”) was defined based on the combined results of the HES and Onomap ethnicities. Directly age-standardised incidence rates (ASRs) were calculated, and incidence rates were compared between ethnic groups using Poisson regression.
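For readers unfamiliar with direct age standardisation, the calculation can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `direct_asr`, the three age bands, the case counts, the person-years and the standard population weights are all hypothetical, chosen only to show the arithmetic.

```python
def direct_asr(cases, person_years, std_weights, per=1_000_000):
    """Directly age-standardised rate per `per` population.

    cases, person_years and std_weights are parallel sequences,
    one entry per age band. std_weights need not sum to 1.
    """
    if not (len(cases) == len(person_years) == len(std_weights)):
        raise ValueError("all inputs must have one entry per age band")

    total_weight = sum(std_weights)
    # Weight each age-specific rate (cases / person-years) by the
    # share of the standard population in that age band.
    weighted_rate = sum(
        w * c / py for c, py, w in zip(cases, person_years, std_weights)
    )
    return per * weighted_rate / total_weight


# Hypothetical inputs for three age bands (not study data).
example_cases = [10, 20, 30]
example_person_years = [100_000, 100_000, 100_000]
example_weights = [0.5, 0.3, 0.2]  # hypothetical standard population shares

asr = direct_asr(example_cases, example_person_years, example_weights)
print(f"ASR per 1,000,000: {asr:.1f}")
```

Because every group is weighted to the same standard population, ASRs for groups with different age structures (for example NSA and SA) can be compared directly.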
Results HES ethnicity was missing in 528 (13.2%) patients. The proportion of patients identified as SA was slightly lower for Onomap (7%) than for HES (8%) and the “Combined” indicator (9%). NSA incidence rates were lower based on HES than on Onomap or the “Combined” indicator; the ASR for HES was 150 per 1,000,000 population compared to 174 for Onomap and 171 for “Combined”. For SAs, HES and Onomap produced similar results (ASRs 162 and 163 respectively), which were lower than the ASR based on the “Combined” indicator (ASR = 201). For all cancers combined, a statistically significant difference between ethnic groups was evident only with the “Combined” indicator; cancer incidence was 18% higher in SAs (incidence rate ratio (IRR) = 1.18, 95% CI 1.05–1.31). Differences in incidence by diagnostic group varied depending on the source of the ethnicity indicator used; lymphoma incidence rates were significantly higher in SAs, but the magnitude of this difference varied from 27% (Onomap; 95% CI 1.00–1.62) to 60% (“Combined”; 95% CI 1.29–2.00).
Conclusion Using different methods of assigning ethnicity can result in different estimates of ethnic variation in cancer incidence. Combining different methods of ethnicity assignment in a single indicator results in a more reliable estimate of ethnicity than the use of a single source. Further validation of these methods in another large health dataset of children (Paediatric Intensive Care Network Audit) is planned.