Background Little is known about how diet-related and activity-related amenities relate to residential location behaviour. Understanding these relationships is essential for addressing residential self-selection bias.
Methods Using 25 years (6 examinations) of data from the Coronary Artery Risk Development in Young Adults (CARDIA) study (n=11 013 observations) and linked neighbourhood-level data from the 4 CARDIA baseline cities (Birmingham, Alabama; Chicago, Illinois; Minneapolis, Minnesota; Oakland, California, USA), we characterised participants’ neighbourhoods as having low, average or high road connectivity and amenities using non-hierarchical cluster analysis. We then used repeated measures multinomial logistic regression with random effects to examine the associations between individual-level sociodemographics and neighbourhood-level characteristics with residential neighbourhood types over the 25-year period, and whether these associations differed by individual-level income.
Results Being female was positively associated with living in neighbourhoods with low (vs high) road connectivity and activity-related and diet-related amenities among high-income individuals only. At all income levels, a higher percentage of neighbourhood white population and neighbourhood population <18 years were associated with living in neighbourhoods with low (vs high) connectivity and amenities. Individual-level race, age and educational attainment, and neighbourhood socioeconomic status and housing prices, did not influence residential location behaviour related to neighbourhood connectivity and amenities at any income level.
Conclusions Neighbourhood-level factors appeared to play a comparatively greater role in shaping residential location behaviour than individual-level sociodemographics. Our study is an important step in understanding how residential location behaviour relates to amenities and physical activity opportunities, and may help mitigate residential self-selection bias in built environment studies.
- PHYSICAL ACTIVITY
- SOCIAL INEQUALITIES
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
A primary challenge of neighbourhood health research is that it is difficult to tease apart the influence of the neighbourhood on its residents from the fact that residents locate in neighbourhoods on the basis of health-related amenities. If unaccounted for, neighbourhood selection factors may bias associations of built environment factors and health outcomes.1 Therefore, to address potential residential self-selection bias, it is important to understand how access to health-related amenities influences residential behaviour.
The few studies that have addressed how diet-related and activity-related amenities relate to residential location behaviour have found that residential location is positively associated with the proximity and number of retail and physical activity (PA) facilities.2–4 There are also examples of self-reported preferences for living in neighbourhoods with lower intersection density and street networks,3,5,6 despite positive observed associations between road connectivity and PA.7
Furthermore, there is evidence to support that individuals' self-reported residential preferences are influenced by individual-level sociodemographics and neighbourhood characteristics, such as proximity to employment subcentres and accessibility of parks.8–14 However, these studies largely lack time-varying data and examine residential preferences (vs actual location behaviour), with limited geographic generalisability. Little is known about how the relationship between diet-related and activity-related amenities and residential location behaviour varies by individual-level income.
Using 25 years of time-varying data from the Coronary Artery Risk Development in Young Adults (CARDIA) study with linked neighbourhood-level data from four US cities, we sought to fill these gaps. We used repeated measures to estimate average associations between individual-level sociodemographics and neighbourhood-level characteristics of CARDIA participants with neighbourhood diet-related and activity-related amenities and infrastructure over time. Since income is a major factor in residential location behaviour,5 we hypothesised that these associations would differ by individual-level income.
CARDIA is a prospective study of the development and determinants of cardiometabolic outcomes in a sample of young adults. In 1985–1986, 5115 men and women aged 18–30 years were recruited from four US metropolitan field centres (Birmingham, Alabama; Chicago, Illinois; Minneapolis, Minnesota; Oakland, California, USA), with approximately equal enrolment by age (18–24, 25–30 years), race (black, white), gender and education (high school or less, more than high school). Follow-up examinations were conducted in 1987–1988 (year 2), 1990–1991 (year 5), 1992–1993 (year 7), 1995–1996 (year 10), 2000–2001 (year 15), 2005–2006 (year 20) and 2010–2011 (year 25), with participant retention of 91%, 86%, 81%, 79%, 74%, 72% and 72%, respectively. We used a restricted sample of CARDIA participants who remained in (or returned to) the four baseline cities at any given examination year (n=12 308 person-observations), for a total of 4316, 2462, 1728, 1481, 1202 and 1119 participants at baseline and examination years 7, 10, 15, 20 and 25, respectively. Compared with the full CARDIA sample, individuals in our analytic sample (ie, living in one of the four cities at each examination year) were younger and less white and there was a greater proportion of male participants.
Residential locations of participants were determined from geocoded home addresses at each examination year. We defined neighbourhoods using real estate-derived boundaries from Zillow15 where available (Chicago, Illinois; Oakland, California; Minneapolis, Minnesota, USA) and the Regional Planning Commission of Greater Birmingham (Birmingham, Alabama, USA; n=392 neighbourhoods at each examination year). Using a Geographic Information System (GIS), we geographically and temporally matched neighbourhood-level data to participants' residential locations. Neighbourhood features 23 m along the boundaries of adjacent neighbourhoods were assigned to each adjacent neighbourhood since they would theoretically be proximate to each other.
Cluster analysis to derive neighbourhood types
Our baseline sample of CARDIA individuals was geographically clustered due to targeted enrolment by age, gender, race and education; thus, there were neighbourhoods in the four cities with zero or few participants. Therefore, we created a posteriori neighbourhood clusters using non-hierarchical cluster analysis to characterise neighbourhoods where sufficient numbers of individuals lived.
Since we were interested in understanding relationships between residential location behaviour and physical environment in the context of residential self-selection bias,1 we sought to capture neighbourhood characteristics related to diet and PA. Specifically, we used variables related to the count of food outlets and PA facilities (per km²), distance from participants' residence to the nearest employment centre centroid, road types and lengths, and total park area (km²) within each neighbourhood at each examination year (see online supplementary file 1).16 Theoretically, the distribution of these variables shapes diet behaviours and/or PA opportunities.7,8,17,18
We transformed all calculated variables used to define neighbourhood clusters into z-scores by examination year and city to achieve comparability across measures. Then, using the FASTCLUS procedure in SAS (V.9.3), we conducted cluster analyses using means of these standardised variables at each examination year and a range of 2–6 clusters. To determine a final cluster solution, we evaluated the distribution of individuals across clusters, differences in proportions across clusters, and the parsimony and meaningfulness of clusters. We classified values ≥0.5 or ≤−0.5 as high or low, respectively.19 We also evaluated whether clusters appeared repeatedly across solutions (robustness) by performing the cluster analysis many times. After determining the appropriate number of clusters from this step, we performed 1000 iterations of the cluster analysis using a SAS macro, which identified the cluster solution with the highest R² value; we used this categorical variable of neighbourhood type as our outcome.
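The multi-start step described above can be illustrated with a minimal sketch. It is written in Python rather than SAS, uses a plain k-means routine as a stand-in for FASTCLUS, and runs on simulated data. The function names (`zscores`, `kmeans`, `best_solution`) and all inputs are hypothetical, and for brevity the standardisation is applied to the whole sample rather than within examination year and city.

```python
import random
import statistics


def zscores(column):
    """Standardise one variable to mean 0 and SD 1."""
    mean, sd = statistics.fmean(column), statistics.stdev(column)
    return [(v - mean) / sd for v in column]


def sq_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def centroid(rows):
    """Column-wise mean of a list of rows."""
    return [statistics.fmean(col) for col in zip(*rows)]


def kmeans(rows, k, rng, max_iter=100):
    """One k-means run from a random start; returns (labels, R-squared)."""
    centres = rng.sample(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assign every neighbourhood to its nearest cluster centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centres[j])) for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centre to the mean of its current members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = centroid(members)
    grand = centroid(rows)
    within = sum(sq_dist(r, centres[lab]) for r, lab in zip(rows, labels))
    total = sum(sq_dist(r, grand) for r in rows)
    return labels, 1.0 - within / total


def best_solution(rows, k, n_starts=1000, seed=0):
    """Repeat k-means from many random starts and keep the highest R-squared."""
    rng = random.Random(seed)
    return max((kmeans(rows, k, rng) for _ in range(n_starts)), key=lambda res: res[1])


# Simulated data: 90 "neighbourhoods", 3 variables, 3 underlying groups.
sim = random.Random(1)
raw = [[sim.gauss(m, 1.0) for _ in range(3)] for m in (-3, 0, 3) for _ in range(30)]
columns = [zscores(list(col)) for col in zip(*raw)]
rows = [list(r) for r in zip(*columns)]
labels, r2 = best_solution(rows, k=3, n_starts=20)
print(f"R-squared of best solution: {r2:.3f}")
```

Here R² is the share of total variance explained by cluster membership (1 minus within-cluster sum of squares over total sum of squares), which is why the best of many random starts can be chosen by this single number.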
We adjusted for several neighbourhood-level sociodemographic confounders. We obtained data related to neighbourhood-level socioeconomic status (SES), racial composition, age structure and rental properties at the census tract level from the US Census 1980, 1990 and 2000 and American Community Survey 5-year estimates from 2005 to 2009 and 2007 to 2011 (when comparable US Census data were unavailable). Using linear interpolation, we estimated a continuous change in sociodemographic characteristics across the full period of the decennial and quinquennial censuses. We also derived a neighbourhood socioeconomic deprivation score using principal components analysis of: (1) percentage of population with less than high school education at age 25; (2) percentage of population with at least a college degree at age 25; (3) median household income and (4) percentage of population with household income <150% of federal poverty level;20 a higher score indicates higher neighbourhood socioeconomic deprivation.
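As an illustration of the linear interpolation step, the following is a minimal sketch in Python. The function name `interpolate` and the tract values are hypothetical and are not taken from the study data.

```python
def interpolate(year, known):
    """Linearly interpolate a tract value for `year` from {census_year: value}."""
    years = sorted(known)
    if not years[0] <= year <= years[-1]:
        raise ValueError("year outside the range of available census data")
    for lo, hi in zip(years, years[1:]):
        if lo <= year <= hi:
            frac = (year - lo) / (hi - lo)
            return known[lo] + frac * (known[hi] - known[lo])
    return known[years[0]]  # only one census year supplied


# Hypothetical percentage of white population in one census tract.
pct_white = {1980: 60.0, 1990: 50.0, 2000: 45.0}
baseline_value = interpolate(1985, pct_white)  # midway between 1980 and 1990
year10_value = interpolate(1995, pct_white)    # midway between 1990 and 2000
print(baseline_value, year10_value)
```

The sketch deliberately refuses to extrapolate beyond the first and last data years, since the text describes interpolation only.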
We calculated the average of quarterly values for home price index (HPI) in each city at each examination year using values from Moody's Analytics.21 Moody's provides Case-Shiller price data for Chicago, Minneapolis and Oakland for each quarter of each year from 1975 to 2012 relative to a value of 100 for the first quarter of the year 2000. Moody's provides Lender Processing Service data for Birmingham at the zip code level for each quarter from 1991 to 2012, measured in the thousands of dollars. We used multilevel mixed-effects linear regression (-mixed- in Stata V.13.0) to predict HPI values for 1985 in Birmingham (see online supplementary file 1).
Given that neighbourhood-level variables were available at different geographic levels, we harmonised all variables (including those used to define neighbourhood cluster types) to fit our socially constructed neighbourhood boundaries within city limits. We also created a geographically weighted estimate of neighbourhood-level variables when data source boundaries did not align with neighbourhood boundaries (assuming an equal distribution within source boundaries).
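One way to read the weighting described above is as an area-weighted mean, in which each source unit (for example, a census tract) contributes in proportion to the area it shares with the neighbourhood. The sketch below assumes that interpretation, which the text does not spell out; the function name `area_weighted_mean` and the input values are hypothetical.

```python
def area_weighted_mean(pieces):
    """Area-weighted mean for one neighbourhood.

    pieces: list of (source_value, overlap_area_km2) pairs, one per source
    unit that overlaps the neighbourhood. Values are assumed to be evenly
    distributed within each source unit.
    """
    total_area = sum(area for _, area in pieces)
    if total_area <= 0:
        raise ValueError("neighbourhood has no overlapping source area")
    return sum(value * area for value, area in pieces) / total_area


# Hypothetical neighbourhood covered by two tracts:
# 1 km2 of a tract with 40% rental housing and 3 km2 of a tract with 20%.
pct_rental = area_weighted_mean([(40.0, 1.0), (20.0, 3.0)])
print(pct_rental)
```

This form suits rates and percentages; a count variable would instead be apportioned by the fraction of each source unit's area that falls inside the neighbourhood.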
We also adjusted for several individual-level sociodemographic confounders. At each CARDIA examination year, a standardised questionnaire was used to collect self-reported individual-level sociodemographic characteristics, including age, gender, race and current educational attainment (highest grade or year of school completed). Income was collected with categorical responses at examination years 5, 7, 10, 15, 20 and 25; we substituted income values from examination year 5 for baseline values, which were unavailable.
We used repeated measures multinomial logistic regression with random effects to examine associations between individual-level sociodemographics and neighbourhood-level characteristics with neighbourhood type at each time period. Therefore, model coefficients represent the estimated average effect of a one-unit change in each sociodemographic characteristic on the probability of locating in one neighbourhood type relative to a referent neighbourhood type at each cross-section of time over the 25-year period.
Our individual-level exposures included age (continuous), race (black, white), gender and education (high school or less, more than high school); and our neighbourhood-level exposures included socioeconomic deprivation score, percentage of non-Hispanic white population, percentage of the population ≤18 years and the percentage of neighbourhood rental properties (occupied and vacant). We also adjusted for neighbourhood land area, examination year and study centre. Based on evidence showing that model estimates may improve with interaction terms for income groups,12 we stratified our analyses by tertiles of individual-level income (coded as the midpoint of the categorical response).
We used Stata (V.13.0) for all analyses (-mlogit-) with the -suest- postestimation command to obtain a joint covariance matrix for all estimated coefficients. Then we used the -test- command to determine whether coefficients (ie, estimated effect of exposures on the probability of residing in each neighbourhood type) were equal across tertiles of individual-level income. We accounted for clustering by neighbourhood ID using the ‘cluster’ option.
To quantify changes in the neighbourhood type in response to changes in each exposure, we used the -margins- postestimation command with the -predict- option to predict the probability of residing in each neighbourhood type at fixed levels of the covariates (categories or ±1SD of mean) within each income tertile.
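For readers less familiar with multinomial models, the sketch below shows how predicted probabilities follow from the linear predictors using the standard multinomial logit formula, with the referent neighbourhood type fixed at zero. It is written in Python rather than Stata and is not a reproduction of -margins-; the function name and the coefficient values are hypothetical.

```python
import math


def predicted_probabilities(linear_predictors):
    """Convert multinomial logit linear predictors into probabilities.

    linear_predictors: {neighbourhood_type: x'b}, where the referent type
    has x'b = 0 by construction.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(linear_predictors.values())
    expd = {k: math.exp(v - top) for k, v in linear_predictors.items()}
    total = sum(expd.values())
    return {k: v / total for k, v in expd.items()}


# Hypothetical linear predictors at one fixed covariate pattern,
# with the high connectivity/amenities type as the referent.
probs = predicted_probabilities({"high": 0.0, "average": -0.5, "low": 0.3})
print(probs)
```

Evaluating this function at two covariate levels (for example, the mean and the mean plus 1 SD) and differencing the results gives the kind of change in predicted probability that the text describes.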
To account for potential selection bias due to out-migration from the four cities over time, we used a probit model to derive inverse probability weights. We used gender, race and baseline study centre to predict the probability of being in the sample at year 25, and used the inverse of the probability to weight the models in the central analysis (-pweight-).
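The weighting step can be sketched as follows, assuming the probit coefficients have already been estimated. The sketch is in Python, omits the model fitting itself, and uses hypothetical coefficient values and function names.

```python
from statistics import NormalDist


def retention_probability(covariates, coefs, intercept):
    """Probit model: P(in sample at year 25) = Phi(intercept + sum(b * x))."""
    linear = intercept + sum(coefs[name] * value for name, value in covariates.items())
    return NormalDist().cdf(linear)


def inverse_probability_weight(covariates, coefs, intercept):
    """Weight each retained participant by 1 / P(being retained)."""
    return 1.0 / retention_probability(covariates, coefs, intercept)


# Hypothetical coefficients for gender, race and one study-centre indicator.
coefs = {"female": 0.20, "black": -0.10, "centre_chicago": -0.30}
intercept = 0.0

reference = {"female": 0, "black": 0, "centre_chicago": 0}
less_likely = {"female": 0, "black": 1, "centre_chicago": 1}

w_reference = inverse_probability_weight(reference, coefs, intercept)
w_less_likely = inverse_probability_weight(less_likely, coefs, intercept)
print(w_reference, w_less_likely)
```

Participants who resemble those lost to out-migration have a lower predicted probability of retention and therefore receive a larger weight, so that the retained sample better represents the original cohort.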
Given empirical evidence of the importance of housing price in residential behaviour,12 we ran two separate models: a model with neighbourhood clusters stratified by low and high HPI (<50th and ≥50th centile, respectively) and a model with non-stratified clusters. We used a likelihood-ratio test to assess the fit of these two models.
Compared with baseline, the analytic sample at the end of follow-up was less white, older and more educated, with a greater proportion of female participants (table 1).
Our final cluster solution included three distinct clusters, with 545 (23.2%), 409 (17.4%) and 1398 (59.4%) neighbourhoods assigned to clusters with low, average, and high road connectivity and activity-related and diet-related amenities, respectively (see online supplementary file 2). The low cluster type was characterised by a higher total road length, park area and count of cul-de-sacs, with greater distance to employment subcentres; lower intersection density and a lower count of PA facilities and convenience stores (see online supplementary file 3). In contrast, the high cluster had a higher intersection density, count of road links and β-index, with a higher count of PA facilities and all food outlets; a lower percentage of local roads and closer proximity to employment subcentres. The fit of the model with fewer clusters was statistically significantly better than the model with clusters divided by high/low HPI (p<0.05); thus, we included HPI as a covariate rather than stratify clusters in the final model.
Across all tertiles of individual-level income, individual-level race, educational attainment and age were not statistically significantly associated with any of our derived residential clusters over time (table 2). In contrast, the probability of residing in a neighbourhood with low (vs high) road connectivity and activity-related and diet-related amenities was higher for females (vs males) within the high-income tertile only (table 3).
At all levels of individual-level income, the probability of residing in a neighbourhood with low (vs high) amenities/connectivity increased as the percentage of neighbourhood white population increased and as the percentage of neighbourhood population ≤18 years increased. Regardless of income level, we did not observe statistically significant associations between neighbourhood socioeconomic deprivation and HPI with residential neighbourhood type. However, the probability of residing in the low (vs high) amenities/connectivity neighbourhood type decreased as the percentage of rental housing units increased for all participants.
Overall, findings for medium-income participants were similar to low-income participants (relative to high-income participants). However, the probability of residing in the low (vs high) connectivity/amenities neighbourhood type was not statistically significantly associated with the percentage of neighbourhood white population among medium-income participants.
Using 25 years of retail and built environment data, we examined relationships between individual-level sociodemographics and neighbourhood-level characteristics with living in activity-supportive, commercially dense neighbourhoods over time, and whether these associations differed by individual-level income. We found that individual-level race, age and educational attainment, and neighbourhood SES and housing prices, were not associated with residential location over time, whereas individual-level gender was a significant predictor of neighbourhood residential type among high-income CARDIA participants only. Neighbourhood racial and age composition and percentage of rental properties were also related to residential location behaviour at all income levels.
Although previous literature suggests that individual-level race, age and educational attainment influence residential preferences,12,13,17,22 we did not observe similar findings with objectively measured residential location in our study across follow-up. These inconsistencies may be due to the use of preference data in previous studies, which assumes that preference measures capture true preferences and residential movement.1 We also found that high-income female participants were more likely to reside in areas with low (vs high) connectivity/amenities, but we did not observe similar associations among low-income or medium-income female participants. Although research indicates that women prefer to live in more compact neighbourhoods,13 high-income women may be able to afford to own a car and drive to destinations, and thus may have sufficient income to choose to reside in less urban areas.23
Participants of all income levels were more likely to reside in neighbourhoods with low (vs high) connectivity/amenities as the percentage of population <18 years increased. This finding is supported by several studies showing that households with school-age children tend to live in less densely populated, suburban areas,14,22 and prefer to locate near other households with families.12 Those with families may also seek larger houses, which tend to be more affordable in suburban (vs urban) areas. Similarly, the percentage of neighbourhood white population was positively associated with locating in neighbourhoods with low (vs high) connectivity/amenities. This finding is consistent with previous work indicating that suburban white households tend to be located in neighbourhoods with better conditions (eg, fewer abandoned buildings).24
Finally, we found that housing price was not related to residential neighbourhood type at any income level, but the percentage of rental housing units was negatively associated with locating in neighbourhoods with low (vs high) amenities and connectivity. Although the latter is consistent with research showing that renters are less likely to live in less commercial, suburban neighbourhoods,22 we expected lower income individuals to be more sensitive to housing prices11 and to locate near amenities if housing was more affordable. However, our indices did not include prices of apartments or multifamily dwellings,21 and thus may not have accurately reflected the housing market for low-income individuals.
Based on these preferences, our findings also suggest that residential location behaviour may bias estimates of the relationship between the physical environment and health outcomes. Hypothetically, individuals living in neighbourhoods with greater PA opportunities are more likely to be physically active, and individuals who live in areas with a high density of eating-out options may have poorer dietary behaviours. Therefore, future studies should employ methodological approaches, such as statistical control and complex econometric approaches,25 to account for these potential sources of bias.
Studies that examine the influence of the food and built environment on residential location behaviour are mostly based on self-reported preference surveys2,26 and cross-sectional data.2,26,27 In contrast, we had access to detailed, time-varying measures, which allowed us to use information both within and between participants, thus producing more efficient results and reducing measurement error. We used actual residential location data versus preferences, the latter of which is subject to social desirability bias.28 Our cluster analysis approach also provided distinct, robust and meaningful groups of neighbourhood types, which may be generalisable to other urban areas. For example, the low amenities/connectivity neighbourhood type was characterised by greater land area and greater park size, which is consistent with previous work.29 The distribution of individuals across neighbourhood types also reflected the increasingly urban nature of neighbourhoods in the USA.30
Our study had several limitations, including a lack of data related to crime, school quality, car ownership or location of participants' actual (vs nearest) employment. We did not adjust for population density due to collinearity with our explanatory variables, and it is possible that much of neighbourhood selection could have been predicted by urban versus suburban location. With the exception of food outlet business records, we could not evaluate the accuracy of our commercial data sets, nor could we validate our clusters due to deductive disclosure. Finally, we did not know the extent to which participants were constrained to live in areas due to observed or unobserved factors (eg, discrimination); however, stratifying our models by individual-level income may have mitigated constraints related to affordability. With the exception of gender, it is also possible that associations between individual-level sociodemographics and neighbourhood characteristics with residential neighbourhood type differed by individual-level income, but the magnitude of estimated effect may have been too small to detect in our sample.
Using time-varying data from the CARDIA study and four US cities, we found that neighbourhood racial and age composition and the percentage of rental properties were meaningful predictors of residential location type over time, but we did not observe similar findings with individual-level race, age and educational attainment, or with neighbourhood SES and housing prices. Our findings also showed that relationships between individual-level gender and residential neighbourhood type were stronger for high-income individuals. Overall, neighbourhood-level factors appeared to play a comparatively greater role in residential location behaviour related to the food and built environment than individual-level sociodemographics. Our study is an important first step in identifying how residential self-selection may bias estimates of the effects of the food and built environment on health outcomes, and how this dynamic may differ across income levels.
What is already known on this subject
Understanding the influence of the food environment and physical activity amenities on residential location behaviour is essential for addressing residential self-selection bias.
However, not much is known about how the food and built environment influences residential location behaviour, and how associations may differ by income status.
Previous studies lack detailed, time-varying measures describing road connectivity and diet-related and activity-related amenities in relation to actual residential location behaviour.
What this study adds
Being female was positively associated with living in neighbourhoods with low (vs high) road connectivity and activity-related and diet-related amenities among high-income individuals, whereas individual-level race, age and education were not associated with residential location behaviour at any income level.
Neighbourhood age and racial composition and the percentage of rental properties, but not neighbourhood socioeconomic status and housing prices, were associated with residential location behaviour related to the food and built environment at all income levels.
Our study is an important step in understanding how residential location behaviour related to amenities and physical activity opportunities is influenced by sociodemographic characteristics of the population, and may help mitigate residential self-selection bias in the food and built environment literature.
The authors would like to acknowledge CARDIA chief reviewer Kiarri Kershaw, PhD, whose thoughtful suggestions improved the paper, as well as Marc Peterson, of the University of North Carolina, Carolina Population Center (CPC) and the CPC Spatial Analysis Unit for creation of the environmental variables. PER, MPH had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of data analysis.
Contributors PER had full access to the data and takes responsibility for the integrity of the data and the accuracy of the data analysis. PER performed the statistical analysis, and analysed and interpreted the data. PER and PG-L drafted the article. DKG, JMS, JPR and PG-L critically revised the article for important intellectual content. PG-L acquired the data, obtained the funding, approved the final draft of the article and supervised the study. All authors contributed to the study concept and design.
Funding This work was funded by the National Heart, Lung, and Blood Institute (NHLBI) R01HL104580. The Coronary Artery Risk Development in Young Adults Study (CARDIA) is supported by contracts HHSN268201300025C, HHSN268201300026C, HHSN268201300027C, HHSN268201300028C, HHSN268201300029C and HHSN268200900041C from the NHLBI, the Intramural Research Program of the National Institute on Aging (NIA), and an intra-agency agreement between NIA and NHLBI (AG0005). The authors are grateful to the Carolina Population Center, University of North Carolina at Chapel Hill, for general support (grant P2C HD050924 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD)), the Nutrition Obesity Research Center (NORC), University of North Carolina (grant P30DK56350 from the National Institute for Diabetes and Digestive and Kidney Diseases (NIDDK)), and to the Center for Environmental Health Sciences (CEHS), University of North Carolina (grant P30ES010126 from the National Institute for Environmental Health Sciences (NIEHS)).
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.