Review
A review of land-use regression models to assess spatial variation of outdoor air pollution
Introduction
A large number of epidemiological studies have shown that current-day outdoor air pollution is associated with significant adverse effects on public health (Brunekreef and Holgate, 2002; Pope and Dockery, 2006). Pollutants of health concern at current-day concentration levels in developed countries include particulate matter (PM), nitrogen dioxide (NO2) and ozone (Brunekreef and Holgate, 2002). Time series studies have found that day-to-day changes in PM concentrations, in particular, are related to changes in hospital admissions and mortality (Katsouyanni et al., 2001; Samet et al., 2000). The relative risks in the time series studies are generally small: for example, in a large European study by Katsouyanni et al. (2001), mortality increased by 0.5% with an increase of 10 μg/m3 in the 24-h average concentration of PM10. In 1993, a prospective cohort study in six US cities documented an association between long-term average exposure to outdoor air pollution and reduced survival, after careful control for other individual risk factors such as smoking (Dockery et al., 1993). Mortality rates in the most polluted city were 26% higher than in the least polluted city; the difference in annual average PM2.5 concentration between these cities was 19 μg/m3. Several other studies have subsequently found associations between mortality from cardiovascular and respiratory diseases and long-term average exposure to air pollution (Pope and Dockery, 2006). In general, such long-term air pollution exposure studies have played an important role in recent health impact assessments and in the debate about new air quality guidelines for Europe (Kunzli et al., 2000).
Exposure assessment for epidemiological studies of long-term exposure to ambient air pollution remains a difficult challenge. The first cohort studies, published in the mid-1990s, compared mortality rates between cities, with exposure characterized by the average concentration measured at a central site within each city (Dockery et al., 1993; Pope et al., 1995). In the past decade, various studies have documented significant variation of outdoor air pollution at a small scale within urban areas for important pollutants such as NO2 and black smoke (e.g. Fischer et al., 2000; Kingham et al., 2000; Lebret et al., 2000; Monn, 2001; Jerrett et al., 2005; Zhu et al., 2002). In some settings the within-city spatial contrast may be as large as the between-city contrast. There is evidence from epidemiological studies that within-city contrasts of particulate matter air pollution are associated with larger health effect estimates than between-city contrasts (Miller et al., 2007). Epidemiological studies therefore need to take these contrasts into account. Monitoring alone is generally not feasible, because the study population of an epidemiological study typically comprises several hundred to several thousand subjects living or working at different places. An additional complication for monitoring is that only long-term (i.e. annual) average concentrations are useful for the epidemiological study, so that multiple daily or weekly samples have to be collected.
Current approaches that have been developed to meet the challenge of assessing intra-urban air pollution contrasts have recently been reviewed (Briggs, 2005; Jerrett et al., 2005). Approaches include the use of exposure indicator variables (e.g. traffic intensity at the residential address or distance to a major road), interpolation methods (e.g. kriging, inverse distance weighting), conventional dispersion models and land-use regression models. Application of the land-use regression approach for air pollution mapping was introduced in the SAVIAH (Small Area Variations In Air quality and Health) study (Briggs et al., 1997). Land-use regression combines monitoring of air pollution at a small number of locations with the development of stochastic models using predictor variables usually obtained through geographic information systems (GIS). The model is then applied to a large number of unsampled locations in the study area. The technique was initially termed regression mapping (Briggs et al., 1997). Regression mapping is probably more descriptive of the methodology, as the predictor variables are not only representative of land use. Other variables such as altitude and meteorology, for example, are often included in the models. As most researchers currently refer to the method as land-use regression (LUR), however, we will also use this term. There are some earlier examples of the method in environmental science (Briggs et al., 1997). In 1985, interpolation of sulfate deposition data from the USA was supplemented with a drift term using geographical coordinates (Bilonick, 1985).
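As an illustration only, the two steps described above (fitting a regression at the monitored sites, then applying it to unsampled locations) can be sketched in Python. The data are synthetic, and the predictor names (`traffic`, `pop_density`), the coefficients and the function names `fit_lur` and `predict_lur` are assumptions made for this sketch, not values or methods taken from any of the reviewed studies. Ordinary least squares is used for simplicity.

```python
import numpy as np

def fit_lur(X_sites, y_sites):
    """Fit an ordinary least-squares model with intercept at monitored sites."""
    A = np.column_stack([np.ones(len(X_sites)), X_sites])
    coef, *_ = np.linalg.lstsq(A, y_sites, rcond=None)
    return coef  # [intercept, slope_1, slope_2, ...]

def predict_lur(coef, X_new):
    """Apply the fitted model to locations without measurements."""
    A = np.column_stack([np.ones(len(X_new)), X_new])
    return A @ coef

# Hypothetical monitoring campaign: 40 sites, two GIS-derived predictors.
rng = np.random.default_rng(0)
n_sites = 40
traffic = rng.uniform(0, 20000, n_sites)       # vehicles/day (made up)
pop_density = rng.uniform(500, 8000, n_sites)  # inhabitants/km2 (made up)
X_sites = np.column_stack([traffic, pop_density])
no2 = 15 + 0.001 * traffic + 0.002 * pop_density + rng.normal(0, 2, n_sites)

coef = fit_lur(X_sites, no2)

# Two hypothetical unsampled addresses: a quiet one and a busy one.
X_unsampled = np.array([[5000.0, 2000.0], [18000.0, 6000.0]])
pred = predict_lur(coef, X_unsampled)
```

In an actual application the rows of `X_unsampled` would be the GIS predictor values at the addresses of the study subjects.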
After the successful pioneering work in SAVIAH, LUR methods have been increasingly used in epidemiological studies in the past decade (Briggs, 2005). Developments in GIS have contributed to the popularity of LUR methods. Initially the approach was mainly adopted in Europe, but in the past few years several applications in North America have been published (e.g. Gilbert et al., 2005; Ross et al., 2006; Ross et al., 2007). While most studies have developed models that explain spatial air pollution contrasts satisfactorily, the predictive models differ substantially between the studies. Although this may be due to true differences between locations, we believe that differences in the application of the approach and selection of variables also play an important role.
The goal of this paper is therefore to review the various elements of the approach by discussing studies applying LUR methods. After listing the studies identified through a systematic review, we structure the review according to the main components of LUR: monitoring data, geographic predictors, and model development and validation. We will compare the validity of the LUR models to alternative quantitative approaches, especially dispersion modelling, and conclude with a discussion of limitations and new developments. A short review of LUR models has been published before (Ryan and LeMasters, 2007). That review identified six studies by a search through June 2006 and had a substantially narrower scope than the current manuscript.
We performed a systematic literature search in PubMed and ScienceDirect to trace studies using land-use regression approaches. The final search was conducted on January 15, 2008. We used the search terms “land use regression”, “GIS air pollution”, “regression mapping” and “air pollution stochastic”. This was supplemented by papers included in the reference lists of the traced papers and papers that were already known to us from previous exposure assessment and epidemiological studies. We only included papers written in English, German or Dutch.
Identified studies
We identified 25 land-use regression studies. Table 1 lists some key characteristics of the design of these studies. Tables 2–5 outline the performance of, and predictor variables for, the final LUR models. Most applications have been limited to nitrogen dioxide (NO2), largely because of the ease of monitoring this pollutant (Table 2). Fewer studies have developed models for NO or NOx (Table 3), particulate matter (PM2.5) or the elemental carbon content of
Monitoring data
Studies differ in the monitoring data that are used to develop land-use regression models. Important aspects are the use of routine versus purpose-designed networks, the monitored pollutant, the number and distribution of monitoring sites, and the temporal resolution.
Predictor variables
Most studies have assessed a large set of potential predictor variables to model monitored concentrations. Frequently used predictor data include traffic variables, population or address density, land use, altitude and topography, meteorology and location (Table 6). As an example, a study by Henderson et al. (2007) included 55 potential predictors and a study by Moore et al. (2007) examined 140 predictors. The final models typically include only a small number of predictors.
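One common way to derive a predictor such as address density in a GIS is to count features inside a circular buffer around each monitoring site. The sketch below assumes planar coordinates in metres; the coordinates, the buffer radii and the function names `count_within_buffer` and `address_density` are hypothetical and chosen only to illustrate the idea.

```python
import numpy as np

def count_within_buffer(sites_xy, points_xy, radius_m):
    """Count points lying within radius_m of each site (planar coordinates, metres)."""
    diff = sites_xy[:, None, :] - points_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=2)  # shape: (n_sites, n_points)
    return (dist <= radius_m).sum(axis=1)

def address_density(sites_xy, points_xy, radius_m):
    """Address points per km2 inside a circular buffer around each site."""
    area_km2 = np.pi * (radius_m / 1000.0) ** 2
    return count_within_buffer(sites_xy, points_xy, radius_m) / area_km2

# Hypothetical coordinates: two monitoring sites and four address points.
sites = np.array([[0.0, 0.0], [1000.0, 0.0]])
addresses = np.array([[10.0, 0.0], [50.0, 50.0], [400.0, 0.0], [990.0, 10.0]])

counts_100 = count_within_buffer(sites, addresses, 100.0)
density_100 = address_density(sites, addresses, 100.0)
```

Repeating the calculation for several radii yields a family of candidate predictors, which is one reason why the set of potential predictors can become large.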
Model development and validation
Most studies use standard linear regression techniques to develop prediction models. Forward, backward or best-subsets automatic selection methods are often applied to develop a parsimonious model from a large set of predictor variables that maximizes the percentage of explained variability (R2). Following the approach used in the SAVIAH study, a priori definition of a required sign of regression slopes for specific variables (e.g. positive for traffic intensity) is used by some investigators in
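A minimal sketch of forward selection with a priori sign constraints is given below. It is not the procedure of any particular study: the stopping rule (a minimum R2 gain of 0.01), the synthetic data, the variable names and the helper names `ols_r2` and `forward_select` are all assumptions made for illustration.

```python
import numpy as np

def ols_r2(X, y):
    """Ordinary least squares with intercept; returns coefficients and in-sample R2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - float(resid @ resid) / ss_tot

def forward_select(X, y, names, expected_sign, min_gain=0.01):
    """Add one predictor at a time, keeping only models whose slopes
    have the a priori sign (+1 or -1; predictors not listed are unconstrained)."""
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        candidates = []
        for j in remaining:
            cols = selected + [j]
            coef, r2 = ols_r2(X[:, cols], y)
            signs_ok = all(
                expected_sign.get(names[c], 0) * slope >= 0
                and (expected_sign.get(names[c], 0) == 0 or slope != 0)
                for c, slope in zip(cols, coef[1:])
            )
            if signs_ok:
                candidates.append((r2, j))
        if not candidates:
            break
        r2, j = max(candidates)
        if r2 - best_r2 < min_gain:  # assumed stopping rule
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return [names[j] for j in selected], best_r2

# Synthetic example: two real predictors and one unrelated variable.
rng = np.random.default_rng(1)
n = 60
traffic = rng.uniform(0, 20000, n)
green = rng.uniform(0, 100, n)
unrelated = rng.normal(0, 1, n)
X = np.column_stack([traffic, green, unrelated])
y = 20 + 0.001 * traffic - 0.1 * green + rng.normal(0, 0.3, n)
names = ["traffic", "green_space", "unrelated"]
signs = {"traffic": +1, "green_space": -1}

selected, r2 = forward_select(X, y, names, signs)
```

A candidate model is discarded when any slope has the wrong sign, which mirrors the idea that, for example, more traffic should not predict lower concentrations.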
Performance of models
Land-use regression methods have been applied to develop maps of NO2, NOx, PM2.5, the elemental carbon or soot content of PM2.5 and VOCs. For NO2, the percentage of variation explained by the prediction model is typically about 60–70%. This is often achieved with only a few predictors, usually including variables representing traffic load, population density, altitude and land use in various representations (e.g. industrial or urban). Differences in prediction R2 between studies may be related
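The explained variation of a fitted model can differ from its performance at sites that were not used for model building. One way to examine this, assumed here purely for illustration, is leave-one-out cross-validation: each site is predicted from a model fitted to all other sites. The synthetic data and the function names below are hypothetical.

```python
import numpy as np

def _design(X):
    """Design matrix with an intercept column."""
    return np.column_stack([np.ones(X.shape[0]), X])

def r_squared(y, y_hat):
    ss_res = float(((y - y_hat) ** 2).sum())
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def in_sample_r2(X, y):
    """R2 of a model fitted to, and evaluated on, all sites."""
    A = _design(X)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return r_squared(y, A @ coef)

def loocv_r2(X, y):
    """R2 of predictions where each site is left out of the fit in turn."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(_design(X[keep]), y[keep], rcond=None)
        preds[i] = coef[0] + X[i] @ coef[1:]
    return r_squared(y, preds)

# Synthetic example with one predictor and substantial noise.
rng = np.random.default_rng(2)
n = 40
traffic = rng.uniform(0, 20000, n)
X = traffic[:, None]
y = 15 + 0.001 * traffic + rng.normal(0, 4.5, n)

r2_fit = in_sample_r2(X, y)
r2_cv = loocv_r2(X, y)
```

For ordinary least squares the cross-validated value cannot exceed the in-sample value, so a large gap between the two suggests that the model is fitted too closely to the monitoring sites.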
Limitations
While land use regression methods have, in many cases, been applied successfully to model spatial variation of ambient air pollution, there are several limitations of the method. First, LUR models have a limited ability to separate the impact of some priority pollutants. In the TRAPCA study, the correlation of modelled NO2, PM2.5 and soot was above 0.95 in the three study areas (Munich, Stockholm and three regions in the Netherlands), so that the potentially independent health effects of these
New developments
This section briefly outlines some innovations of the LUR methodology including expanding the scope of the predictor variables, new GIS approaches, and spatio-temporal models.
Rosenlund et al. (2008) evaluated the value of adding actual emission data to surrogate variables such as traffic intensity and population density to predict NO2 concentrations in the city of Rome. No improvement of the land use regression model was obtained after adding emissions of PM, NOx, CO and benzene available at
Conclusions
Land-use regression methods have generally been applied successfully to model annual mean concentrations of NO2, NOx, PM2.5, the soot content of PM2.5 and VOCs. The method has been applied in different settings, including European and North American, non-industrial and industrial cities. The performance of the method in urban areas is typically better than or equivalent to that of geo-statistical methods such as kriging and of conventional dispersion models. Compared to dispersion models, the land use
Acknowledgments
The work is supported by grant RGI-137 from the Dutch program Ruimte voor Geoinformatie supported by the Ministry of Housing, Spatial Planning and the Environment.
References (75)
- et al. The use of wind fields in a land use regression model to predict air pollution concentrations for health exposure studies. Atmos. Environ. (2007)
- et al. Estimated long-term outdoor air pollution concentrations in a cohort study. Atmos. Environ. (2007)
- The space–time distribution of sulfate deposition in the Northeastern United States. Atmos. Environ. (1985)
- et al. A regression-based method for mapping traffic-related air pollution: application and testing in four contrasting urban environments. Sci. Total Environ. (2000)
- et al. Air pollution and health. Lancet (2002)
- et al. Modeling annual benzene, toluene, NO2, and soot concentrations on the basis of road traffic characteristics. Environ. Res. (2002)
- et al. The CAR model: the Dutch method to determine city air quality. Atmos. Environ. (1993)
- et al. A study of the relationships between Parkinson's disease and markers of traffic-derived and environmental manganese air pollution in two Canadian cities. Environ. Res. (2007)
- et al. Traffic-related differences in outdoor and indoor concentrations of particles and volatile organic compounds in Amsterdam. Atmos. Environ. (2000)
- et al. Characterization of a spatial gradient of nitrogen dioxide across a United States – Mexico border city during winter. Sci. Total Environ. (2005)