Background This study examines discrepancies between census and death registry information in the reporting of the ethnicity of the deceased in Lithuania and shows how these reporting differences influence estimates of mortality inequality by ethnicity.
Methods This study uses a census-linked dataset provided by Statistics Lithuania. The data include all deaths and population exposures between 1 July 2001 and 31 December 2004. Information on the ethnicity of the deceased was available from both the census and the death records. Poisson regression was applied (1) to measure the effects of socio-demographic variables on the misreporting of ethnicity on death records and (2) to estimate mortality rate ratios by ethnicity based on census-linked and unlinked data.
Results The death-record-based information on ethnicity under-reports the deaths of people of Russian, Polish and other ethnicities and over-reports the deaths of people of Lithuanian ethnicity. This leads both to the underestimation of mortality in the three ethnic minority groups and to biased mortality rate ratios. The misreporting is higher in death records for women, persons younger than 80 years, divorced persons, urban residents and those dying from ill-defined causes.
Conclusion Studies based on unlinked data may provide biased estimates of ethnic mortality differences.
A number of studies of post-Soviet countries have reported that mortality inequalities by education and ethnicity exist in these countries and that inequality is rising.1–3 However, the evidence for these mortality inequalities in the region mainly relies on cross-sectional unlinked data. The major limitation of such data is the numerator–denominator bias.4–8 The bias originates from differences in the measurement of socio-demographic status in the census and death records: the socio-demographic status in the census records is based on self-reported information, whereas the corresponding data in the death records are provided by proxy informants.
Studies examining the magnitude of the numerator–denominator bias are relatively rare and often provide contradictory results. A few studies have suggested that the misreporting of socioeconomic status in death records is insignificant and thus may not have any substantial effect on the group-specific mortality estimates.9 10 Other matching studies have, however, found significant differences (up to 30%–40%) between self-reported data on education from surveys and the follow-up death registry data.11 12 An earlier study on Lithuania also found substantial discrepancies between the census-based and death-record-based information on education.13 It has been shown that numerator–denominator bias in unlinked data may take different directions in different countries. For example, the bias tended to result in an underestimation of the mortality differential by occupation in France, whereas it was responsible for an overestimation of the same differential in England and Wales.7
Several studies have examined the validity of ethnicity-specific mortality estimates based on the unlinked cross-sectional data. Two studies examining the mortality advantage of the Hispanic population in the USA concluded that this is an artefact due to biases in reporting ethnicity on death records.14 15 By contrast, a more recent study found that death record information on Hispanic ethnicity is reasonably good and that the observed health advantage cannot be explained by the bias.16
This study uses a unique census-linked dataset covering the entire population of Lithuania. The study examines discrepancies between the census and the death registry information in reporting the ethnicity of the deceased and shows how these reporting differences affect estimates of mortality disparity by ethnicity.
Data and methods
This study uses a census-linked dataset provided by Statistics Lithuania. The links were made by employees of Statistics Lithuania, who have permission to work with individual-level data. The dataset is based on all records from the 2001 Population and Housing Census, and all death and emigration records for the period between 1 July 2001 and 31 December 2004. The data cover individuals aged 30 years and older and include 3.2 and 4.1 million person-years of population exposure, and 72.5 and 65.9 thousand deaths, for men and women, respectively. The data were provided in an aggregated multidimensional frequency table format that combines deaths and population exposures, split by socio-demographic variables including age, sex, education, marital status, ethnicity and urban–rural residence. Detailed descriptions of the dataset can be found in prior publications.13 17
The ethnicity variable (available from both the census and the death records) includes five categories. The first three categories account for 96% of the total person-years of population exposure (Lithuanian: 81.7%, Russian: 7.3%, Polish: 7.1%), whereas ‘other’ (dominated by Byelorussians and Ukrainians) and unknown ethnicity groups account only for 3.5% and 0.4% of the total person-years, respectively.
To estimate the impact of various factors on the discrepancies between the census and the death-record-based information on ethnicity, we applied Poisson regression with the misreporting rate as the dependent variable. Separate age- and sex-adjusted models and models adjusting for all variables were estimated to measure the effects of age, sex, causes of death, education, marital status and the place of residence (urban–rural). Poisson regression was also used to estimate mortality rate ratios by ethnic group based on census-linked and unlinked data (in the unlinked data, the ethnicity of the deceased was taken from the death records).
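The rate-ratio estimation described above can be illustrated with a minimal sketch. It fits a Poisson log-linear model with a person-years offset, log(mu) = log(exposure) + b0 + b1*x, to two groups by Newton-Raphson. The counts, the function name `fit_poisson_two_group` and the two-group layout are assumptions made for illustration only; the models in this study additionally adjust for age, sex and other covariates, which the sketch omits.

```python
import math

def fit_poisson_two_group(deaths, exposure, x, n_iter=25):
    """Newton-Raphson fit of log(mu) = log(exposure) + b0 + b1 * x.

    deaths, exposure: observed deaths and person-years per cell.
    x: 0/1 indicator of the comparison group (0 = reference group).
    Returns (b0, b1); exp(b1) is the mortality rate ratio.
    """
    # Start from the pooled log rate so the first step is well behaved.
    b0 = math.log(sum(deaths) / sum(exposure))
    b1 = 0.0
    for _ in range(n_iter):
        mu = [e * math.exp(b0 + b1 * xi) for e, xi in zip(exposure, x)]
        # Score vector (gradient of the Poisson log-likelihood).
        g0 = sum(d - m for d, m in zip(deaths, mu))
        g1 = sum((d - m) * xi for d, m, xi in zip(deaths, mu, x))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton update: beta <- beta + H^{-1} g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical cells: reference group, then comparison group.
deaths = [800, 120]
exposure = [100_000, 12_000]   # person-years
group = [0, 1]

b0, b1 = fit_poisson_two_group(deaths, exposure, group)
rate_ratio = math.exp(b1)
print(f"reference rate per 1000: {1000 * math.exp(b0):.2f}")
print(f"mortality rate ratio: {rate_ratio:.3f}")
```

With a single binary covariate the fitted rate ratio coincides with the ratio of the two crude rates, which makes the sketch easy to check by hand; with further covariates the same likelihood yields adjusted rate ratios.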
Discrepancies between the census and the death record information
The lowest agreement rate between the information about ethnicity given in census and death records was found for ‘other’ ethnicity. About 13% of the deceased who were reported as being of ‘other’ ethnicity in the death records had reported themselves as Poles or Russians in the census. The agreement rates for the Russian, Polish and Lithuanian ethnic groups were higher (90%–94%). In absolute numbers, the excess number of deaths artificially gained due to the misreporting of ethnicity in death records was most notable for the Lithuanian ethnic group (about 7000). About 5400 deaths (4.6%) classified as Lithuanian in the death records were in fact deaths of people who had reported themselves as Russian or Polish in the census. Only about 700 deaths recorded as Russian, Polish or ‘other’ had to be reclassified into the Lithuanian group based on the census information.
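The comparison above can be sketched as a cross-tabulation of deaths by census ethnicity and death-record ethnicity. The counts below are hypothetical and are not the study data. The definition of the agreement rate used here (the share of deaths in a death-record category whose census ethnicity matches) is an assumption, chosen to be consistent with how the percentages above are described.

```python
# Hypothetical cross-tabulation of deaths (illustrative counts only):
# outer key = ethnicity in the census, inner key = ethnicity in the death record.
GROUPS = ["Lithuanian", "Russian", "Polish", "Other"]
crosstab = {
    "Lithuanian": {"Lithuanian": 9900, "Russian": 40, "Polish": 40, "Other": 20},
    "Russian": {"Lithuanian": 300, "Russian": 900, "Polish": 10, "Other": 30},
    "Polish": {"Lithuanian": 250, "Russian": 10, "Polish": 850, "Other": 20},
    "Other": {"Lithuanian": 150, "Russian": 50, "Polish": 20, "Other": 330},
}

def death_record_totals(table):
    """Deaths per ethnic group as classified in the death records."""
    return {g: sum(table[c][g] for c in GROUPS) for g in GROUPS}

def census_totals(table):
    """Deaths per ethnic group as classified in the census."""
    return {c: sum(table[c].values()) for c in GROUPS}

def agreement_rates(table):
    """Share of deaths in each death-record group with matching census ethnicity."""
    totals = death_record_totals(table)
    return {g: table[g][g] / totals[g] for g in GROUPS}

def net_gain(table):
    """Deaths gained (+) or lost (-) by each group when death records are used."""
    by_record = death_record_totals(table)
    by_census = census_totals(table)
    return {g: by_record[g] - by_census[g] for g in GROUPS}

agreement = agreement_rates(crosstab)
gain = net_gain(crosstab)
for g in GROUPS:
    print(f"{g:<11} agreement {agreement[g]:.1%}  net gain {gain[g]:+d}")
```

Because every death appears in exactly one cell, the net gains sum to zero: deaths artificially gained by one group are lost by the others.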
Table 1 provides data about the relationships between socio-demographic variables, the causes of death and the misreporting of ethnicity. The reporting bias was smaller among the oldest old (older than 80 years), whereas the age group 50–59 years showed a small peak (model 2 only). The effects of education on the misreporting of ethnicity were inconsistent. Having secondary education was associated with a higher risk that ethnicity would be misreported, whereas being in the lowest educational category was associated with an even lower probability of misreporting than being in the higher education category. The latter relationship became statistically insignificant after all variables had been controlled for. Higher misreporting risks were identified for women, urban residents and divorced people. Among the causes of death, the lowest risk of misreporting was observed among those dying from neoplasms, whereas the remaining causes of death showed 10%–20% higher risks. The exception was ill-defined causes, for which the risk of misreporting was more than twice that for neoplasms. Importantly, the difference remained statistically significant after all socio-demographic variables had been controlled for.
Ethnic mortality differentials as reflected by the census-linked and unlinked data
Table 2 shows that mortality rate ratios calculated from the census-linked and unlinked data provide contradictory evidence about the ethnic differentials. Mortality rate ratios calculated according to the unlinked data show that mortality among the Russian, Polish and ‘other’ ethnic groups is significantly lower than among the Lithuanian ethnic group. The only exception was Polish men, for whom no significant difference from Lithuanian men was found. The use of census-based information leads to contrary results. The census-linked data indicate that members of Russian, Polish and ‘other’ (women only) ethnic groups have mortality risks that are 1.1–1.3 times higher than those of the Lithuanian ethnic group. No statistically significant differences were observed between the ‘other’ and Lithuanian men.
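A small numerical sketch shows how such a reversal can arise. In unlinked data the numerator (deaths) is classified by death-record ethnicity, while the denominator (person-years) is classified by census ethnicity. All counts below are hypothetical, and the rate ratios are crude rather than age-adjusted as in Table 2.

```python
def rate_ratio(deaths_group, exposure_group, deaths_ref, exposure_ref):
    """Crude mortality rate ratio of a group relative to a reference group."""
    return (deaths_group / exposure_group) / (deaths_ref / exposure_ref)

# Person-years of exposure, classified by census (self-reported) ethnicity.
exposure = {"Lithuanian": 1_000_000, "Russian": 100_000}

# Linked data: deaths classified by the ethnicity reported in the census.
deaths_linked = {"Lithuanian": 10_000, "Russian": 1_200}

# Unlinked data: the same 11,200 deaths classified by death-record ethnicity,
# assuming 300 deaths of census Russians were recorded as Lithuanian.
deaths_unlinked = {"Lithuanian": 10_300, "Russian": 900}

rr_linked = rate_ratio(
    deaths_linked["Russian"], exposure["Russian"],
    deaths_linked["Lithuanian"], exposure["Lithuanian"],
)
rr_unlinked = rate_ratio(
    deaths_unlinked["Russian"], exposure["Russian"],
    deaths_unlinked["Lithuanian"], exposure["Lithuanian"],
)

print(f"census-linked rate ratio: {rr_linked:.2f}")
print(f"unlinked rate ratio:      {rr_unlinked:.2f}")
```

In this example, moving 300 deaths from the minority to the majority numerator, while leaving the denominators unchanged, is enough to turn a rate ratio above 1 into one below 1.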
This study found a notable degree of misreporting of ethnicity in death records in Lithuania. The death-record-based unlinked data under-report the deaths of people of Russian, Polish and other ethnicities and over-report the deaths of people of Lithuanian ethnicity. This leads both to the underestimation of mortality in the three ethnic minority groups and to biased mortality rate ratios. These findings indicate that, in certain cases, the unlinked studies may send a misleading message about the direction and magnitude of inequalities to policy-makers. Although the importance of the reporting bias may differ markedly from country to country, the evidence about ethnic mortality differentials based on unlinked data should be treated with caution.
We have shown that the misreporting is more frequent in the death records of women, divorced persons and urban residents. A substantially higher misreporting probability was found among those who died of ill-defined causes. It is possible that these causes of death are more common in very specific (marginal) population groups with poorer information quality provided in both death and census records.17 Ill-defined causes of death are also common among very old people, especially those living alone. In these cases, proper informants to report the socio-demographic status in the death records are often not available. Interestingly, the reporting bias was lower among the oldest old.
It remains unclear what factors might be responsible for the misreporting of ethnicity in death records. The lack of a proper informant at the moment the death record was issued could explain a large share of the misreporting cases. It is possible that the misreporting of the ethnicity of the deceased occurs more frequently in ethnically mixed families, which are relatively common in Lithuania. Finally, some proxy informants may confuse information on ethnicity and citizenship (according to Statistics Lithuania, more than 93% of Russians and 99% of Poles are Lithuanian citizens).
The estimates based on census-linked data suggest that there is a moderate health disadvantage among ethnic minorities in Lithuania. A previous study based on the same census-linked data showed that the ethnic mortality differentials remain significant even after compositional differences by education, marital status and urban–rural residence have been controlled for.18 It is possible that factors unobserved in routine statistical data play an important role. These may be health behaviours, though the contextual characteristics of regions with higher shares of ethnic minorities may also contribute to the differentials. While the mortality differences by ethnic group are considerably smaller than educational or marital status differentials, the existence of such a health divide is a matter of serious concern and should be addressed by appropriate policies.
The results of this study, along with the fact that mortality differentials are central to public health, call for better data for the assessment of differential mortality in Eastern Europe.
What is already known on this subject
There is inconsistent evidence about the importance of reporting bias in death-record-based data on socio-demographic status.
What this paper adds
The reporting bias found in the death records leads to substantial underestimation of mortality of ethnic minorities and incorrect evidence about the direction of mortality disparity by ethnicity in Lithuania.
We are grateful to Statistics Lithuania for providing us with high-quality census-linked data.
Funding Vlada Stankuniene and Dalia Ambrozaitiene are supported by a grant from the Research Council of Lithuania (SIN-14/2010).
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.