Choosing area based socioeconomic measures to monitor social inequalities in low birth weight and childhood lead poisoning: The Public Health Disparities Geocoding Project (US)
- Correspondence to: Professor N Krieger, Department of Health and Social Behavior, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA
- Accepted 14 June 2002
Study objectives: To determine which area based socioeconomic measures can meaningfully be used, at which level of geography, to monitor socioeconomic inequalities in childhood health in the US.
Design: Cross sectional analysis of birth certificate and childhood lead poisoning registry data, geocoded and linked to diverse area based socioeconomic measures that were generated at three geographical levels: census tract, block group, and ZIP code.
Setting: Two US states: Massachusetts (1990 population=6 016 425) and Rhode Island (1990 population=1 003 464).
Participants: All births to mothers aged 15 to 55 years who were residents of either Massachusetts (1989–1991; n=267 311) or Rhode Island (1987–1993; n=96 138), and all children aged 1 to 5 years residing in Rhode Island who were screened for lead levels between 1994 and 1996 (n=62 514 children, restricted to first test during the study period).
Main results: Analyses of both the birth weight and lead data indicated that: (a) block group and tract socioeconomic measures performed similarly within and across both states, while ZIP code level measures tended to detect smaller effects; (b) measures pertaining to economic poverty detected stronger gradients than measures of education, occupation, and wealth; (c) results were similar for categories generated by quintiles and by a priori categorical cut off points; and (d) the area based socioeconomic measures yielded estimates of effect equal to or augmenting those detected, respectively, by individual level educational data for birth outcomes and by the area based housing measure recommended by the US government for monitoring childhood lead poisoning.
Conclusions: Census tract or block group area based socioeconomic measures of economic deprivation could be meaningfully used in conjunction with US public health surveillance systems to enable or enhance monitoring of social inequalities in health in the United States.
- geocoding and geographic information science (GIS)
- socioeconomic inequalities
- public health surveillance
Despite the well known existence of socioeconomic inequalities in childhood health within the United States,1 efforts to monitor trends in these disparities are hampered by two problems. The first is the scant data available on socioeconomic position in US public health surveillance systems.2 US birth certificates, for example, contain only data on parents’ educational level2, 3 and no socioeconomic data are included in public health data systems to monitor childhood lead poisoning.4, 5 Substantial research indicates, however, that not only is material deprivation an important determinant of both low birth weight and increased blood lead concentrations in children,3–6 but also that different economic indicators may vary in their associations with these and other health outcomes.7–11 The implication is that, without adequate socioeconomic data, US public health monitoring systems are compromised in their ability to track disparities and trends in childhood health.
One solution to the paucity or absence of socioeconomic data in US public health surveillance systems gaining increasing recognition involves geocoding and use of area based socioeconomic measures.7, 12, 13 Of note, these area based measures can be conceptualised as meaningful indicators of socioeconomic context in their own right and not merely “proxies” for individual level data, providing information on not only the area’s residents (its composition) but also area level characteristics not reducible to the individual level (for example, concentration of poverty, absence of a nearby clinic, adjacency to a toxic waste site, respectively understood as contextual, public goods, and environmental attributes).7, 12–14 Especially relevant to children’s health, these measures can likewise be equally applied, unlike individual level educational or occupational measures, to all persons, regardless of age, gender, and employment status.7, 12
This potential solution, however, raises a second problem: a lack of consensus as to which kinds of area based socioeconomic measures, obtained at which level of geography, are meaningful for monitoring socioeconomic inequalities in health.7, 8, 13, 14 Recent US studies on low birth weight and related birth outcomes, for example, have used an eclectic array of single indicator area based measures (for example, poverty rate, per capita income, percentage unemployed, percentage of adults with less than a high school education, crowding, and ratio of home owners to renters15–20) plus assorted indices (for example, summed z scores combining data on median family income and education16), all variously measured at the level of the census block group,18 census tract,15, 17 ZIP code,20 and larger “community areas” defined by local health departments.16, 19 Similar inconsistencies regarding type and geographical level of area based socioeconomic measures are evident in studies on population screening of children’s blood lead concentrations.5, 11, 21, 22 Although a plurality of measures may be useful for aetiological research, in the case of monitoring, such heterogeneity impedes comparing results across studies and across outcomes, let alone tracking changes over time.
This study, concerning low birth weight and childhood lead poisoning, accordingly constitutes one component of our Public Health Disparities Geocoding Project, designed to investigate which area based socioeconomic measures, at which levels of geography, can meaningfully be used for monitoring socioeconomic disparities in US health, across a wide variety of health outcomes.25 A priori criteria for assessing the area based socioeconomic measures under investigation were: (a) external validity (do the measures find gradients reported in the literature?), (b) robustness (do the measures detect expected gradients across a wide range of outcomes?), (c) completeness (are the measures comparatively unaffected by missing data?), and (d) user friendliness (how easy are the measures to understand and explain?). Guided by both an ecosocial framework23 and our previous empirical research,12, 13, 24, 25 we hypothesised that area based measures reflecting economic deprivation would exhibit the most pronounced socioeconomic gradients in health, compared with measures of affluence, with effects more consistent at the block group and census tract, as compared with ZIP code, level.
The study base consisted of the populations and areas in two US states, Massachusetts (MA) and Rhode Island (RI), for the calendar period surrounding the 1990 census26, 27 (table 1). Birth certificate data were provided by both the Massachusetts Department of Public Health (MDPH) and the Rhode Island Department of Health (RIDOH); childhood lead screening data were provided by RIDOH only. Use of these data was approved by all relevant Institutional Review Boards/Human Subjects Committees at the Harvard School of Public Health, MDPH, and RIDOH, with the study conforming to the principles embodied in the Declaration of Helsinki. Geocoding of these databases to the block group, census tract, and ZIP code levels was conducted by a commercial geocoding firm selected for its accuracy (determined to be 95% in a pilot study conducted for this project).28
Birth certificate data were obtained for all births to mothers who were residents of either Massachusetts (1989–1991; n=267 531) or Rhode Island (1987–1993; n=97 383) at the time of birth. Analogous to government reports, this excluded (for MA and RI combined) births to mothers under age 15 or older than age 55, and we also excluded births with missing birth weight (n=212) or weighing <150 g (n=970), multiple births including five or more births (n=324), and births that were not geocoded (n=2001); the final analytical dataset thus included 267 311 MA births and 96 138 RI births. Following standard conventions, low birth weight was defined as births <2500 g.1, 3 Mother’s race/ethnicity and educational level were obtained by self report on the birth certificate, using closed format questions.
Data on blood lead concentrations from the Rhode Island lead surveillance system, mandated in January 1993,21 were obtained for all tests (n=135 567) performed on children aged 1 to 5 years who were residents of Rhode Island and screened between 1994 and 1996. The final analytic dataset included 62 514 cases, after excluding cases geocoded to outside of RI (n=20), cases under 12 or over 60 months old (n=30 372), cases missing lead data (n=1741), and repeat tests (n=40 920). Repeat tests were removed to minimise the likelihood of artificially attenuating relations between socioeconomic position and risk of increased blood lead concentrations due to treatment after initial high values. Blood specimens were obtained in two ways, at the screening physician’s discretion: venous and capillary/fingerstick. Because the second method may be subject to contamination (for example, lead dust on the pricked finger),4, 21 we analysed the two types of samples separately (venous: n=31 076; capillary/fingerstick: n=31 438). The specimens were kept refrigerated until analysed. All blood lead specimens were analysed at the RIDOH Laboratory using the Graphite Furnace Atomic Absorption Spectrophotometry Method,29 which has a minimum detection level of 1 μg/dl. Following guidelines issued by the Centers for Disease Control and Prevention (CDC) in 1997,4 raised blood lead concentrations were defined as ≥10 μg/dl.
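The exclusion counts reported above can be checked with simple arithmetic. The sketch below is only an illustration: it assumes the four exclusion categories were applied to mutually exclusive sets of records, which the text does not state explicitly, and the function and variable names are hypothetical.

```python
# Counts as reported in the text for the Rhode Island lead surveillance data.
TOTAL_TESTS = 135_567
EXCLUSIONS = {
    "geocoded to outside of RI": 20,
    "under 12 or over 60 months old": 30_372,
    "missing lead data": 1_741,
    "repeat tests": 40_920,
}
VENOUS_N = 31_076
CAPILLARY_N = 31_438
RAISED_LEAD_THRESHOLD = 10.0  # micrograms per decilitre (CDC 1997 guideline)


def apply_exclusions(total, exclusions):
    """Subtract each exclusion count in turn; return (reason, n, remaining) steps.

    Assumption: the exclusion categories do not overlap.
    """
    steps = []
    remaining = total
    for reason, n in exclusions.items():
        remaining -= n
        steps.append((reason, n, remaining))
    return steps


def has_raised_lead(concentration_ug_dl):
    """Apply the study's definition of a raised blood lead concentration."""
    return concentration_ug_dl >= RAISED_LEAD_THRESHOLD


steps = apply_exclusions(TOTAL_TESTS, EXCLUSIONS)
final_n = steps[-1][2]

for reason, n, remaining in steps:
    print(f"excluded {n:>6} ({reason}); {remaining} remain")
print("venous + capillary =", VENOUS_N + CAPILLARY_N)
```

Under that assumption, the sequential subtraction reproduces the reported analytic sample, and the venous and capillary counts sum to the same total.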
We obtained 1990 census data for census tracts and block groups from US Bureau of Census Summary Tape File 3A and ZIP code data from Summary Tape File 3B.30 The block group, a subdivision of the census tract, is the smallest geographical census unit for which census socioeconomic data are tabulated and on average contains 1000 people31 (page G-6). A census tract, in turn, on average contains 4000 persons and is defined by the US Bureau of Census to be a “small, relatively permanent statistical subdivision of a county ... designed to be relatively homogeneous with respect to population characteristics, economic status, and living conditions”31 (pages G-10, G-11). ZIP codes, by contrast, on average contain 30 000 persons and are “administrative units established by the United States Postal Service ... for the most efficient delivery of mail, and therefore generally do not respect political or census statistical area boundaries”30 (page A-13).
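Because block groups are subdivisions of census tracts, a record geocoded to a block group can be rolled up to its tract directly from the identifier. The sketch below assumes identifiers follow the common concatenated FIPS convention (2 digit state, 3 digit county, 6 digit tract, 1 digit block group); the source does not say how the geocoding firm formatted its output, and the example identifier is purely illustrative. ZIP codes have no such nesting, which is consistent with the point that they do not respect census boundaries.

```python
from typing import NamedTuple


class BlockGroupId(NamedTuple):
    """Components of a 12 digit block group identifier (assumed FIPS layout)."""
    state: str        # 2 digits
    county: str       # 3 digits
    tract: str        # 6 digits
    block_group: str  # 1 digit

    @property
    def tract_id(self) -> str:
        """Identifier of the census tract that contains this block group."""
        return self.state + self.county + self.tract


def parse_block_group_id(geoid: str) -> BlockGroupId:
    """Split a block group identifier into its nested geographic parts."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"expected 12 digits, got {geoid!r}")
    return BlockGroupId(geoid[0:2], geoid[2:5], geoid[5:11], geoid[11])


# Illustrative identifier only, not taken from the study data.
example = parse_block_group_id("250250001001")
print(example.tract_id, example.block_group)
```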
Three considerations guided our development of area based measures of socioeconomic position (SEP): (a) a priori conceptual definitions of SEP and social class7; (b) US and UK evidence emphasising detrimental effects of material deprivation on health1, 32–36; and (c), the need for measures that can be meaningfully compared over time and space, so as to permit valid monitoring and contrasts in relation to time period and region.7, 13, 25, 37, 38 As shown in table 2, the 11 single variable and eight composite area based socioeconomic measures we generated meeting these criteria, at each level of geography for each state, reflected six domains of SEP 7, 8, 39, 40: occupational class, income, poverty, wealth, education, and crowding, premised on the understanding that social class, as a social relationship, fundamentally drives the distribution of these manifest aspects of SEP.7, 25 Additionally, in accordance with CDC guidelines for targeting childhood lead screening,4 we also created a variable pertaining to the percentage of housing units built prior to 1950.
Among the composite variables, two were US analogues of the UK Townsend38, 41, 42 and Carstairs37, 43 deprivation indices, one used the algorithm for the US Centers for Disease Control and Prevention’s “Index of Local Economic Resources”,44 and five were created exclusively for our study.13, 25 To mirror the skewed population distribution of socioeconomic resources, “SEP1” and “SEP2” simultaneously combined categorical data on poverty, working class, and either wealth or high income. “Factor 1” and “Factor 2” were generated by factor analysis using a maximum likelihood approach44, 45 applied to 15 census variables (see footnote to table 2). The factor analysis used the rank values of the census data, rather than imposing arbitrary transformations to normalise their often considerably skewed distribution, with tied values assigned an average rank. The two factor model was selected as the most appropriate description of the underlying factor structure, with correlations between the factors ranging from 0.420 to 0.564 after oblique rotation. Finally, the “SEP index”, a standardised z score akin to the Townsend index, was generated using inputs identified by the factor analysis. Cut off points for categorical area based socioeconomic measures were based on both their centile distribution (for example, quintiles) and a priori considerations (for example, the federal definition of “poverty areas” as regions where ≥20% of the population is below the US poverty line47, 48).
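The two kinds of construction described above, a summed z score composite and categorical cut off points, can be sketched as follows. This is a minimal illustration and not the study’s algorithm: the input variables are hypothetical, all inputs are assumed to be oriented so that higher values mean more deprivation, and the simple rank based quintile function does not assign average ranks to ties. Only the ≥20% poverty area threshold is taken from the text.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise area level values to mean 0 and SD 1 (population SD)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def summed_z_index(columns):
    """Sum z scores across variables for each area.

    columns: dict of variable name -> list of values, one per area, all
    oriented so that a higher value indicates greater deprivation.
    """
    standardised = [z_scores(col) for col in columns.values()]
    return [sum(per_area) for per_area in zip(*standardised)]


def quintiles(values):
    """Assign each area to a quintile (1 = lowest values, 5 = highest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    result = [0] * n
    for rank, i in enumerate(order):
        result[i] = rank * 5 // n + 1
    return result


def is_poverty_area(pct_below_poverty):
    """A priori cut off point: federal 'poverty area' definition (>= 20%)."""
    return pct_below_poverty >= 20.0


# Hypothetical area level inputs for five areas.
areas = {
    "pct_below_poverty": [3.0, 8.0, 12.0, 21.0, 35.0],
    "pct_unemployed": [2.0, 4.0, 5.0, 9.0, 14.0],
}
index = summed_z_index(areas)
print(quintiles(index))
print([is_poverty_area(p) for p in areas["pct_below_poverty"]])
```

One design consequence noted later in the paper follows directly from this sketch: quintile boundaries depend on the distribution within each set of areas, whereas an a priori threshold such as 20% is identical across geographies and states.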
Our analytic plan involved five steps. In Step (1), we assessed the distribution and missingness of both the health and census data. In Step (2), we calculated the proportion of births that were low birth weight and the proportion of children screened who had high lead concentrations, stratified by the area based socioeconomic measures at each level of geography for each state. In Step (3), we visually inspected and quantified the socioeconomic gradients detected for each outcome at each level of geography using each area based socioeconomic measure, comparing outcomes for infants and children residing in areas with the least and most resources. As measures of effect, we calculated both the odds ratio (OR) and the relative index of inequality (RII). The RII provides a slope estimate of the risk estimate (for example, OR) across the full range of the distribution of the determinant, taking into account the population size of each stratum, thereby permitting meaningful comparison of gradients across different socioeconomic measures.49–51 For the birthweight analyses in Step (3), we analysed data both for all births and for singleton births only, given the growing proportion of multiple births (at increased risk for being low birth weight, regardless of SEP) among older and more affluent women who became pregnant using in vitro fertilisation techniques.52 Because both sets of analyses yielded similar results, we report only data for singleton births (data not shown; available upon request). In Step (4), we restricted analyses to persons geocoded to all three levels of geography; because results were equivalent to those obtained in Step (3), we report only the former, and do not include equivalent variables. For example, given similar results, we present data for only the categorical and not quintile version of the poverty data and omit the variable for low income (data not shown; available upon request).
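The two effect measures named in Step (3) can be illustrated with a short sketch. It rests on several assumptions that are not from the source: the OR confidence interval uses the standard log (Woolf) method, and the RII is computed with a simplified population weighted linear regression of the outcome proportion on each stratum’s midpoint in the cumulative population distribution, then expressed as the ratio of fitted values at the two ends of that distribution. The study’s analyses were run in SAS and may have used a different (for example, logistic) formulation. All counts are hypothetical.

```python
import math


def odds_ratio(cases_exposed, n_exposed, cases_reference, n_reference):
    """Odds ratio with a 95% CI from the log (Woolf) method."""
    a, b = cases_exposed, n_exposed - cases_exposed
    c, d = cases_reference, n_reference - cases_reference
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


def cumulative_midpoints(populations):
    """Midpoint of each stratum in the cumulative population distribution.

    Strata must be ordered from most to least socioeconomic resources.
    """
    total = sum(populations)
    midpoints, cumulative = [], 0
    for p in populations:
        midpoints.append((cumulative + p / 2) / total)
        cumulative += p
    return midpoints


def relative_index_of_inequality(cases, populations):
    """Simplified linear RII: fitted proportion at x=1 over fitted at x=0."""
    x = cumulative_midpoints(populations)
    y = [c / p for c, p in zip(cases, populations)]
    w = populations
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return (intercept + slope) / intercept


# Hypothetical counts: four strata ordered from most to least resources.
cases = [45, 55, 65, 75]
populations = [1000, 1000, 1000, 1000]
print(odds_ratio(cases[-1], populations[-1], cases[0], populations[0]))
print(relative_index_of_inequality(cases, populations))
```

The OR compares only the two extreme strata, whereas the RII uses every stratum and weights by population size, which is why the text treats it as the measure that is comparable across differently distributed socioeconomic variables.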
In Step (5), we then summarised findings across socioeconomic measures within and across levels of geography, in relation to our a priori criteria pertaining to external validity, robustness, completeness, and user friendliness. As an additional check on external validity, we also analysed the birth data in relation to mother’s educational level and restricted these analyses, to be compatible with US government reports, to mothers aged 20 and older.1 Also relevant to standard public health practice, we ascertained patterns of lead poisoning in relation to both the CDC’s recommended area based screening characteristic (≥27% of housing built before 19504) and the poverty measure, separately and combined. All analyses were conducted in SAS.53
Low birth weight (<2500 g) births comprised 5.8% and 6.3% of births, respectively, in Massachusetts and Rhode Island during the study interval (table 1). Increased blood lead concentrations (≥10 μg/dl) were detected among 17% and 22% of Rhode Island children ages 1 to 5 years old who underwent, respectively, venous and capillary/fingerstick tests (table 1).
Geocoding of births was successfully accomplished to the block group level for over 95% of records in Massachusetts and Rhode Island, to the census tract level for 100% of records for both states, and to the ZIP code level for 100% of the Massachusetts and 99.9% of the Rhode Island records (table 1). These results were independent of mother’s age, race/ethnicity, and educational level, and also birth weight of the infant. For lead screening, the overall percentage of children geocoded did not vary by age or gender and equalled 94%–95% for the venous and capillary tests; modest variation, however, occurred by race/ethnicity, type of lead test, and lead concentration (table 1). Among records geocoded to the ZIP code level, 8.1% of Massachusetts births, 2.5% of Rhode Island births, and 1.9% of the childhood lead poisoning records could not be linked to 1990 census data because their ZIP codes were either for non-residential sites or else were created or changed after the 1990 census. Less than 1% of areas were missing data on the specified area based socioeconomic measures; the one exception was for measures with data on wealth (affecting only 2% to 4% of areas; data not shown and available upon request).
As table 3 illustrates, in the case of low birth weight, both choice of area based measure and level of geography mattered. Firstly, considering singleton births in Massachusetts (table 3), the strongest gradients (OR ≥ 2.0; RII ≥ 2.5) were observed for census tract and block group measures of economic deprivation (poverty, median household income, Townsend index, Index of Local Economic Resources, SEP index and SEP1); the weakest gradient occurred for wealth (OR ∼1.3–1.4; RII ∼1.8). Effect estimates detected with ZIP code measures exhibited similar patterns but slightly lesser magnitudes. Equivalent patterns were evident for Rhode Island (table 3). In both states, moreover, these effect estimates were similar to those based on individual level educational data, comparing births to mothers with less than a high school education versus college graduates (MA: OR=1.90, 95% CI 1.80 to 2.01; RI: OR=2.03, 95% CI 1.86 to 2.22).
Similar patterns occurred for the lead data, with census tract and block group measures of economic deprivation detecting socioeconomic gradients either captured to a lesser extent or not at all by the other measures (table 4). Of note, socioeconomic gradients were steeper for the venous compared with capillary specimens. This occurred because a higher proportion of children living in areas with more resources were identified as having high lead concentrations by the capillary compared with the venous test (possibly reflecting greater contamination of the capillary samples), whereas for children in poorer areas, both tests identified a similarly high proportion of children having raised lead concentrations.
Summarising key aspects of these analyses, figures 1A–1C and 2A–2C depict, respectively, socioeconomic gradients for singleton births in Massachusetts and for increased blood lead concentrations (venous) in Rhode Island using the census tract versions of the three measures that most consistently detected socioeconomic gradients in health while differently delimiting the population at risk: poverty (single variable, categorical), Townsend index (composite, quintile), and SEP1 (composite, categorical).
Finally, in light of CDC recommendations that lead screening be targeted to areas where ≥27% of housing was built before 1950,4, 22 table 5 provides data on the odds of children having increased blood lead concentrations in relation to both this housing measure and the poverty measure, for all three levels of geography. Of note, both the poverty and housing measures detected populations at increased risk, and their combined effect was additive or more than additive (OR for poor area with old housing: 11 to 14 for venous, and 5 to 6 for capillary).
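One common way to judge whether two exposures combine additively is to compare the joint OR with the sum of the two single exposure ORs minus 1; a difference of zero corresponds to exactly additive effects, and a positive difference to more than additive effects. The sketch below illustrates that check and the four way classification of areas. The source does not say how additivity was assessed, the poverty threshold shown is an assumption borrowed from the federal “poverty area” definition, and the ORs passed to the function are hypothetical values, not figures from table 5.

```python
def joint_exposure_group(pct_pre1950_housing, pct_below_poverty,
                         housing_cut=27.0, poverty_cut=20.0):
    """Classify an area by old housing and poverty.

    housing_cut follows the CDC screening criterion cited in the text;
    poverty_cut is an assumed threshold used here for illustration.
    """
    old = pct_pre1950_housing >= housing_cut
    poor = pct_below_poverty >= poverty_cut
    if poor and old:
        return "poor, old housing"
    if poor:
        return "poor, newer housing"
    if old:
        return "not poor, old housing"
    return "not poor, newer housing"  # reference group


def excess_over_additivity(or_both, or_poverty_only, or_housing_only):
    """Joint OR minus the value expected if the two effects were additive.

    Expected joint OR under additivity = or_poverty_only + or_housing_only - 1.
    Zero means additive; a positive value means more than additive.
    """
    return or_both - (or_poverty_only + or_housing_only - 1.0)


# Hypothetical odds ratios relative to the reference group.
print(joint_exposure_group(45.0, 25.0))
print(excess_over_additivity(or_both=12.0, or_poverty_only=4.0,
                             or_housing_only=3.0))
```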
This study, a component of the first systematic US investigation evaluating diverse area based socioeconomic measures within and across multiple levels of geography for outcomes spanning from birth to death,25 provides evidence that, in the case of childhood health, both choice of measure and level of geography matter. Specifically, analysing data pertaining to low birth weight and childhood lead poisoning for two New England states in the period around the 1990 census, we found that area based measures of economic deprivation typically detected larger socioeconomic gradients than area based measures of education, occupation, or wealth. Moreover, census tract and block group level area based socioeconomic measures consistently detected equivalent and typically stronger socioeconomic gradients than their ZIP code level counterparts. Lastly, categories based on quintiles and a priori cut off points detected similar socioeconomic gradients; only the latter, however, were identically delimited across levels of geography within and across states.
Of note, diverse sources of error and bias could have affected our findings, albeit in ways unlikely to yield overestimates of socioeconomic gradients in health. If, for example, under-registration or misclassification of cases (with respect to outcomes) were non-differential with respect to poverty, estimates of effect would have been less precise. Alternatively, if these errors were positively associated with poverty or if persons subject to socioeconomic deprivation were less likely to have a geocodable address (for example, a PO box), as may have occurred among the small number of children with very high lead concentrations (≥20 μg/dl) detected by the venous test, the ensuing conservative bias would have led to underestimation of socioeconomic gradients in health. These types of error, however, would have affected analyses at each level of geography and thus would not invalidate comparison of socioeconomic gradients across socioeconomic measures and across levels of geography. Additionally, we minimised geocoding error by using a firm whose accuracy we had previously validated28 and further note that the extremely low proportion of areas without data on the area based socioeconomic measures would have little impact on our analyses.
Two additional concerns are relevant to delineating area based socioeconomic measures. As in the case of individual level measures of socioeconomic position, debate exists over the benefits and drawbacks of using: (a) single variable indicators versus composite area based indicators,7, 8, 38, 41–43 and (b) continuous versus categorical data and, if categorical, what cut off points should be used.7, 8, 41–43 In our study, we tackled these issues empirically, by using diverse single variable and composite socioeconomic measures, with cut off points based on both centile distribution and a priori considerations. Of note, we found that estimates of effects detected using the single variable measure of poverty were similar to those based on composite measures, whether modelled based on a priori categorical cut off points or as quintiles.
Additional issues involving temporal and spatial scale and analytical level of our study also merit consideration. From an aetiological perspective, temporal misclassification of SEP in relation to the birth and lead outcomes under study was likely to have been minimised because these outcomes occurred during the same time period as the census from which we derived our area based socioeconomic measures. Also of temporal significance, all of our study’s area based socioeconomic measures can meaningfully be compared across decennial censuses, a necessary attribute for monitoring socioeconomic trends over time.54 Regarding spatial considerations, “ecologic fallacy”, an often raised concern,7, 8, 14, 55 is not relevant to our study design. A form of aggregation bias, this fallacy arises when solely aggregate data are analysed and confounding is introduced by the grouping process generating the aggregate data. By contrast, in our study, individuals constituted the unit of observation for both the dependent variables (health outcomes) and the independent variables (living in an area with certain sociodemographic characteristics). Thus, in this approach, validity of using area based socioeconomic measures depends on the extent to which areas constitute meaningful geographical units,12, 56 as is more likely to be the case for census tracts and block groups compared with ZIP codes.7, 22, 31
Analyses conducted for this first phase of our project did not, however, investigate spatial correlation (for example, nesting of block groups within tracts), issues of adjacency (for example, effects of living in a poor block group adjacent to chiefly poor versus more affluent block groups), or, in the case of the birth data, the combined impact of individual level education and area based socioeconomic position on risk of low birth weight. Extant evidence, despite some inconsistencies, nevertheless indicates that use of standard techniques (for example, random effects models) to adjust variance estimates for spatial correlation would likely have yielded more conservative estimates of statistical significance while not substantively changing the estimates or patterns of associations themselves.14, 57–59 Had analyses taken into account issues of adjacency, however, different and additional effect estimates might have been obtained.57–59 Moreover, multilevel analyses of birth outcomes using individual level and area based socioeconomic data have shown evidence of both independent and interactive effects,15, 18, 19, 60 implying that analyses only using area based measures are unlikely to capture the full impact of socioeconomic position on health.
Interpretation and implication of findings
Adding plausibility to our findings are the similar results for our Public Health Disparities Geocoding Project’s analyses of mortality rates and cancer incidence, whereby census tract and block group measures of economic deprivation consistently detected the strongest socioeconomic gradients, whereas ZIP code level measures yielded attenuated or sometimes even contrary effect estimates.25 The small number of US epidemiological studies investigating type and level of area based socioeconomic measures, albeit for fewer outcomes and fewer levels of geography, have likewise reported analogous results.12, 18, 24, 61–69
Moreover, in the case of low birth weight and childhood lead poisoning, among eight recent US studies using area based socioeconomic measures, the two presenting data on geocoding success rate5, 18 reported geocoding a comparably high proportion of cases to the census tract and block group level (none of the eight, however, provided information on the accuracy of the geocoding). Additionally, both US and European studies of low birth weight using area based socioeconomic measures have detected socioeconomic gradients of a similar magnitude (about a twofold increase in risk).5, 11, 15–19, 21, 60, 70–73 Effect estimates on the order of a 1.5-fold to twofold increased risk of low birth weight have likewise been reported by studies using individual level or household level data on social class, education, and related socioeconomic measures.3, 9, 10, 60, 73 The magnitude of increased risk detected in our study for raised blood lead concentrations is likewise similar to that reported in previous studies using both area based11, 21 and individual level socioeconomic measures.74
Further bolstering use of area based socioeconomic measures at the tract and block group, compared to ZIP code, level is the recent decision of the US Census Bureau to no longer include ZIP codes in the year 2000 census.75–77 Prompting this decision were difficulties in defining ZIP codes’ physical boundaries plus the temporal instability of these boundaries. Instead, the US Census Bureau opted to create a new geographical entity, the ZIP Code Tabulation Area (ZCTA), with boundaries coterminous with census blocks.75, 76 Importantly, individuals’ postal (mailing) ZIP code may not be the same as their ZCTA, meaning that postal ZIP codes recorded in many public health surveillance systems cannot validly be linked to the ZCTA data.75–77
In summary, then, drawing upon our a priori criteria regarding external validity, robustness, completeness, and user friendliness, and buttressed by our similar findings for mortality and cancer incidence,25 we offer a preliminary recommendation pertaining to geocoding of US public health surveillance systems, pending our analyses of tuberculosis, sexually transmitted infections, and violence. Specifically, we suggest that records should be geocoded to the tract or block group level and be linked to easily understood poverty related measures, demarcated by meaningful a priori categorical cut off points.
We thank Dr Daniel Friedman (Assistant Commissioner, Bureau of Health Statistics, Research and Evaluation, Massachusetts Department of Public Health) and Dr Jay Buechner (Chief, Office of Health Statistics, Rhode Island Department of Health) for facilitating conduct of this study using data from their respective health departments and for their helpful comments on our manuscript. We likewise thank the following health department staff for their contributions to our accessing the birth and lead registry data: (a) Massachusetts Department of Public Health: Alice Mroszczyk, Program Coordinator for 24A/B/111B Review Committee; Registry of Vital Records and Statistics: Elaine Trudeau, Registrar of Vital Records; Charlene Zion, Public Information Office; and (b) Rhode Island Department of Health: Vital Statistics: Roberta Chevoya, State Registrar of Vital Records; Environmental Health Risk Assessment: Susan Feeley, Public Health Epidemiologist (no longer at RIDOH); Childhood Lead Poisoning Prevention: Magaly Angeloni, Program Manager.
Funding: this work was funded by the National Institute of Child Health and Human Development (NICHD), National Institutes of Health (1 R01 HD36865-01); principal investigator Nancy Krieger, PhD, Department of Health and Social Behavior, Harvard School of Public Health.
Conflicts of interest: none.