Methods in ethnicity research
Investigating the association between ethnicity and survival from breast cancer using routinely collected health data: challenges and potential solutions
  A. Downing (1), D. Forman (1, 2), J. D. Thomas (1), R. M. West (1), G. Lawrence (3), M. S. Gilthorpe (1)

  1. Centre for Epidemiology & Biostatistics, University of Leeds, Leeds, UK
  2. Northern & Yorkshire Cancer Registry & Information Service, St James’s Institute of Oncology, Leeds, UK
  3. West Midlands Cancer Intelligence Unit, Public Health Building, University of Birmingham, Birmingham, UK

    Background

    Previous studies have reported differences in survival from breast cancer by ethnic group. Some of these studies took information on ethnicity from routinely collected data, such as Hospital Episode Statistics (HES). Using ethnicity from HES data raises several problems: a patient with multiple hospital visits may have more than one ethnicity recorded, and ethnicity is often missing. This study investigates methods to overcome these problems in order to assess the relationship between ethnicity and survival from breast cancer.

    Data and Methods

    A total of 48 234 breast cancer patients diagnosed between 1997 and 2003 were identified from a linked cancer registry-HES dataset for two regions of the UK. Where multiple ethnicities were recorded for a patient, a single ethnicity was allocated in two ways: the last recorded code and the most frequently recorded (“most popular”) code. The data were also expanded to include all available hospital episodes (and all ethnicity information) for each patient (452 061 “episode-level” records). Ethnicity was missing in 16% of the patient-level records and 26% of the episode-level records. Multiple imputation (10 iterations) of missing ethnicity, using age, stage, socioeconomic background and census area ethnic make-up as predictors, was undertaken for the “last recorded”, “most popular” and “episode-level” data. Survival analysis (with follow-up to the end of 2006) was carried out using the imputed datasets.
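    The two patient-level allocation rules can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the function name `allocate_ethnicity`, the example data and the tie-breaking rule (most recent code among equally frequent codes) are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical episode records: (patient_id, episode_date, ethnicity or None).
# The layout and values are illustrative only, not taken from HES.
episodes = [
    ("p1", "1998-03-01", "White"),
    ("p1", "1999-07-15", "White"),
    ("p1", "2001-01-20", "South Asian"),
    ("p2", "2000-05-02", None),
    ("p2", "2000-09-30", "Black"),
    ("p3", "2002-11-11", None),
]


def allocate_ethnicity(episodes):
    """Return {patient_id: {"last_recorded": ..., "most_popular": ...}}."""
    by_patient = {}
    for patient_id, date, ethnicity in episodes:
        by_patient.setdefault(patient_id, []).append((date, ethnicity))

    allocated = {}
    for patient_id, records in by_patient.items():
        records.sort(key=lambda r: r[0])  # ISO dates sort chronologically as strings
        known = [eth for _, eth in records if eth is not None]
        if not known:
            # Ethnicity missing on every episode: left for imputation.
            allocated[patient_id] = {"last_recorded": None, "most_popular": None}
            continue
        counts = Counter(known)
        top = max(counts.values())
        # Assumed tie-break: the most recent code among the most frequent ones.
        most_popular = next(e for e in reversed(known) if counts[e] == top)
        allocated[patient_id] = {
            "last_recorded": known[-1],
            "most_popular": most_popular,
        }
    return allocated


result = allocate_ethnicity(episodes)
```

    In this toy data the two rules disagree for patient `p1` (last recorded: South Asian; most popular: White), which is the kind of discrepancy the abstract describes.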


    Results

    Across the two regions, 97.2% of the patients with a known ethnicity were White, 1.6% were South Asian and 0.8% were Black. White women were slightly older at diagnosis than the other groups, whilst Asian women had a higher proportion of early stage tumours, but these differences were not statistically significant. Using “last recorded” ethnicity, unadjusted survival was higher in the Asian group than in the White group (HR 0.77, 95% CI 0.66 to 0.92). After adjustment for age and stage, this survival difference was no longer statistically significant (HR 0.98, 95% CI 0.82 to 1.16). The results were similar using “most popular” ethnicity. Using the “episode-level” data to assign probabilities for each patient, unadjusted survival was again higher in the Asian group than in the White group (HR 0.72, 95% CI 0.62 to 0.89), but after adjustment survival was similar in the two groups. There was also some evidence of worse survival in the Black group than in the White group (adjusted HR 1.17, 95% CI 0.98 to 1.39).
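    The abstract does not say how the episode-level probabilities were derived. One plausible reading, sketched below purely as an assumption, is that each patient's probability of belonging to an ethnic group is the share of their episodes with a recorded ethnicity that carry that code. The function name `ethnicity_probabilities` and the example data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical episode-level records: (patient_id, ethnicity or None).
episode_ethnicity = [
    ("p1", "White"),
    ("p1", "White"),
    ("p1", "South Asian"),
    ("p1", "White"),
    ("p1", None),
    ("p2", None),
]


def ethnicity_probabilities(records):
    """Per-patient share of each ethnicity code among non-missing episodes."""
    counts = defaultdict(Counter)
    patients = set()
    for patient_id, ethnicity in records:
        patients.add(patient_id)
        if ethnicity is not None:
            counts[patient_id][ethnicity] += 1

    probabilities = {}
    for patient_id in patients:
        total = sum(counts[patient_id].values())
        # An empty dict marks a patient with no recorded ethnicity at all.
        probabilities[patient_id] = (
            {eth: n / total for eth, n in counts[patient_id].items()}
            if total
            else {}
        )
    return probabilities


probs = ethnicity_probabilities(episode_ethnicity)
```

    Here `p1` is assigned White with probability 0.75 and South Asian with probability 0.25, while `p2` has no recorded ethnicity and would rely entirely on imputation.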


    Conclusion

    Assessing the association between breast cancer survival and ethnicity presents many challenges. Previous research in this area may have reported biased results because of missing data and the failure to use all available information.