Considering both distribution and determinants of health
In this issue of the journal, Jennifer Ahern et al1 present the results of a multilevel analysis showing an increased likelihood of preterm birth among both African American and white women living in neighbourhoods with deprived socioeconomic characteristics. This increased risk was independent of individual cigarette smoking and was modified by the socioeconomic characteristics of the women. The authors, taking a multilevel perspective, concluded that examining both neighbourhood and individual socioeconomic factors in combination with behavioural and biological factors is the most appropriate way to study determinants of preterm delivery.
MULTILEVEL ANALYSIS WITHOUT MULTILEVEL REGRESSION: IS THE INTRA-NEIGHBOURHOOD CORRELATION* A NUISANCE?
The study of Ahern et al contributes to the growing stream of multilevel analysis in modern health epidemiology. However, the analytical approach of Ahern’s multilevel analysis does not apply multilevel regression (synonymous with hierarchical regression)2–5 for statistical modelling. Instead, the authors describe the association between preterm birth and neighbourhood variables by population-average regression techniques that account for intra-neighbourhood correlation using a method called generalised estimating equations (GEE).6,7 In this way the authors simply aim to provide acceptable estimates of the standard errors around point estimates (that is, odds ratios with 95% CI), treating the intra-neighbourhood correlation as a “nuisance” that needs to be adjusted for in the analysis but not explicitly investigated.
Like Ahern et al, other social epidemiologists have applied SUDAAN statistical software (http://www.rti.org/) to perform multilevel analysis. As in Ahern’s study, SUDAAN analyses treat the variance structure of the data as a necessary “nuisance”. These authors’ analytical approach and their estimation of the association between neighbourhood characteristics and health are, however, appropriate and formally correct.
Is this the end of the story? Is the only reason for applying complicated statistical techniques so that correct confidence intervals may be obtained? Is …
Footnotes
* The most fundamental reason for applying special statistical techniques in multilevel analysis is the existence of intraclass (intra-neighbourhood) correlation. The intraclass correlation (ICC) is a measure of the degree of similarity among the outcomes of members of the same neighbourhood. Individuals living in the same neighbourhood may be more similar to each other than to individuals living in other neighbourhoods, as they share a number of economic, social, and other neighbourhood characteristics that may condition similar health status. In this sense neighbourhoods can be considered as “clusters” of individuals sharing a common propensity for similar outcomes within clusters. More technically, the intraclass correlation is a variance partition coefficient that indicates the proportion of the total variance (V) in a health outcome (that is, the sum of the 1st level (individual) and 2nd level (neighbourhood) variances) that is accounted for by the 2nd level variance.9 Intraclass correlation needs to be accounted for in regression analysis, as in the study of Jennifer Ahern et al.1 Otherwise the lack of independence, arising from two sources of variation at different levels (individual and neighbourhood) of the data hierarchy, violates the independence assumption of traditional regression analysis. If the ICC is not considered, the study sample is artificially “inflated” and the standard errors of neighbourhood variables are underestimated. One can imagine 100 neighbourhoods with some 50 individuals each (that is, population size=5000 individuals). If the individuals within each neighbourhood are exactly alike, but completely different from the individuals in the other neighbourhoods (intra-neighbourhood correlation=100%), the effective number of individuals would be 100 rather than 5000. In other words, if the ICC=100%, the effective population size will be the number of neighbourhoods, rather than the number of individuals.
\[ \mathrm{ICC} = \frac{V_{\text{2nd level}}}{V_{\text{2nd level}} + V_{\text{1st level}}} \]
When studying individuals nested within neighbourhoods, an ICC=0% suggests that the areas are not an important determinant of individual health, as the neighbourhoods resemble random samples from the whole population.
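The arithmetic in this footnote can be illustrated with a short Python sketch. It computes the ICC as the 2nd level share of the total variance, as defined above, and then an effective sample size. The effective sample size step uses the standard design effect for equal-sized clusters, 1 + (m − 1) × ICC, which is an assumption brought in here for illustration and is not stated in the footnote itself. The function names and the variance values in the usage lines are likewise hypothetical.

```python
def icc(var_level2: float, var_level1: float) -> float:
    """Share of the total variance that lies at the 2nd (neighbourhood) level."""
    total = var_level2 + var_level1
    if var_level2 < 0 or var_level1 < 0 or total <= 0:
        raise ValueError("variances must be non-negative and sum to a positive value")
    return var_level2 / total


def effective_sample_size(n_clusters: int, cluster_size: int, icc_value: float) -> float:
    """Effective number of individuals under the design effect 1 + (m - 1) * ICC.

    Assumes every neighbourhood contains the same number of individuals (m);
    this formula is an illustrative assumption, not taken from the article.
    """
    if not 0.0 <= icc_value <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    design_effect = 1.0 + (cluster_size - 1) * icc_value
    return n_clusters * cluster_size / design_effect


if __name__ == "__main__":
    # Hypothetical variances: 1 unit between neighbourhoods, 3 units between individuals.
    print(icc(1.0, 3.0))  # 0.25

    # The footnote's example: 100 neighbourhoods of 50 individuals each.
    print(effective_sample_size(100, 50, 1.0))  # 100.0 (ICC=100%)
    print(effective_sample_size(100, 50, 0.0))  # 5000.0 (ICC=0%)
```

With ICC=100% the sketch reproduces the footnote's figure of 100 effective individuals, and with ICC=0% it returns the full 5000; intermediate values of the ICC fall between these two extremes.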