

The ecological fallacy strikes back
  1. Wellington School of Medicine, PO Box 7343, Wellington, New Zealand


    When I first started studying epidemiology, ecological studies were briefly discussed as an inexpensive but unreliable method for studying individual level risk factors for disease. For example, rather than go to the time and expense to establish a cohort study or case-control study of fat intake and breast cancer, you could simply use national dietary and cancer incidence data and, with minimal time and expense, show a strong correlation internationally between fat intake and breast cancer. This approach was quite rightly regarded as inadequate and unreliable because of the many additional forms of bias that can occur in such studies compared with studies of individuals within a population. In particular, the “ecological fallacy” can occur in that factors that are associated with national disease rates may not be associated with disease in individuals.1 For example, almost any disease that is associated with affluence and Westernisation has in the past been associated at the national level with sales of television sets, and nowadays is probably associated at the national level with rates of internet use.
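
The gap between national and individual level associations can be illustrated with a small numerical sketch. The data below are entirely hypothetical (five invented "countries" of five individuals each, with made-up exposure and outcome scores) and are constructed so that the country averages rise together while, within every country, higher exposure goes with a lower outcome. It is a toy illustration of the ecological fallacy, not an analysis of any real dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean(values):
    return sum(values) / len(values)


# Hypothetical toy data: 5 "countries", 5 individuals in each.
# Country averages of exposure and outcome increase together,
# but within each country higher exposure goes with a LOWER outcome.
deviations = [-2, -1, 0, 1, 2]
countries = []
for g in range(5):
    exposure = [10 * g + d for d in deviations]
    outcome = [10 * g - d for d in deviations]
    countries.append((exposure, outcome))

# Ecological analysis: correlate the country averages only.
ecological_r = pearson(
    [mean(e) for e, _ in countries],
    [mean(o) for _, o in countries],
)

# Individual level analysis: correlate within each country separately.
within_r = [pearson(e, o) for e, o in countries]

print("correlation of country averages:", ecological_r)
print("correlations within countries:", within_r)
```

In this constructed example the ecological correlation is strongly positive while every within-country correlation is strongly negative, so an inference from the national pattern to individuals would point in the wrong direction.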

    Thus, ecological studies were not a good thing to do, and were a relic of the “pre-modern” phase of epidemiology before it became firmly established with a methodologic paradigm based on the theory of randomised controlled trials of individuals. This paradigm, which is very powerful when used appropriately, gave rise to increasingly sophisticated methods of study design and data analysis. In particular, biostatistical methods that were developed for randomised trials involving a single individual level exposure were used to reformulate and make more rigorous the previously ad hoc epidemiological methods of study design and data analysis.2 3 Thus, epidemiology courses have increasingly become restricted to discussing cohort and case-control studies and the methods of data analysis that fit the clinical trial paradigm on which they are based. There is usually little or no discussion of the philosophy of science (with the exception of some very simplistic Popperian versions), or of how theories and hypotheses are developed. Epidemiology students then graduate and go out into the “real world” to test hypotheses that can be investigated using these methods and for which funding can be obtained.

    Now population level studies are back in business, for two important reasons.

    Firstly, it is increasingly recognised that, even when studying individual level risk factors, population level studies play an essential part in defining the most important public health problems to be tackled, and in generating hypotheses as to their potential causes. Many important individual level risk factors for disease simply do not vary enough within populations to enable their effects to be identified or studied.4 More importantly, such studies are a key component of the continual cycle of theory and hypothesis generation and testing.5 Historically, the key area in which epidemiologists have been able to “add value” has been through this population focus, although this lesson has been forgotten by many modern epidemiologists. For example, many of the recent discoveries on the causes of cancer (including dietary factors and colon cancer, hepatitis B and liver cancer, aflatoxins and liver cancer, human papilloma virus and cervical cancer) have their origins, directly or indirectly, in the systematic international comparisons of cancer incidence conducted in the 1950s and 1960s.6 These suggested hypotheses concerning the possible causes of the international patterns, which were investigated in more depth in further studies. In some instances these hypotheses were consistent with biological knowledge at the time, but in other instances they were new and striking, and might not have been proposed, or investigated further, if the population level analyses had not been done. More recently, a huge amount of funding has been spent on studying the “known” causes of asthma in affluent countries (for example, air pollution, allergen exposure), and it is only now that standardised studies are revealing major international differences in asthma prevalence that are not explained by these “established” risk factors such as air pollution,7 but are more consistent with recent theories on the protective role of some infant infections in the aetiology of asthma.8

    A second reason that ecological studies are back is that it is increasingly being recognised that some risk factors for disease genuinely operate at the population level.9-11 In some instances they may directly cause disease, but perhaps more commonly they may cause disease as effect modifiers or determinants of exposure to individual level risk factors.12 For example, being poor in a rich country or neighbourhood may be worse than having the same income level in a poor country or neighbourhood, because of problems of social exclusion and lack of access to services and resources.13 This may operate through relatively direct mechanisms, but may also involve aspects of individual lifestyle that are in part determined by the social context. For example, the decision to continue to gain temporary relief and pleasure through smoking tobacco may be quite rational for someone who is surviving from week to week in difficult circumstances.

    The failure to take account of the importance of population context, as an effect modifier and determinant of individual level exposures, could be termed the “individualistic fallacy”14 in which the major population determinants of health are ignored and undue attention is focused on individual characteristics. In this situation, the associations between these individual characteristics and health can be validly estimated, but their importance relative to other potential interventions, and the importance of the context of such interventions, may be ignored. For example, in most countries in the world, any individual level study will identify certain individual characteristics (including genetic factors) that appear to be the most important determinants of health. However, recent events in Eastern Europe have shown that these individual characteristics operate within a powerful population context that may be a much stronger determinant of disease at the population level.15 Ignoring this context and attempting to study homogeneous populations can lead to the erroneous conclusion that individual characteristics are the main determinants of disease and the most important for intervention, just as studying populations with homogeneous lifestyles can lead to the erroneous conclusion that other factors are the main determinants of disease.4

    These considerations have led to a revival of population level studies in recent years, with an increasing interest in statistical methods of multi-level analysis. These have considerable merits as they permit the estimation of population level (ecological) effects while also including individual level effects,16 thus avoiding both the ecological fallacy and the individualistic fallacy. However, although there has been much discussion of the statistical analysis of such studies, there has been relatively little discussion of the other methodological issues involved in studying genuine ecological effects. The paper by Blakely and Woodward in this issue of the journal is therefore a very timely and valuable contribution. In particular, they note their concern that “the application of multi-level statistical methods may have surged ahead of a theoretical framework in which to conduct meaningful and robust analyses” and that “as researchers move beyond the initial exhilaration of applying the ‘magic’ of multi-level statistical methods to data, there will be an increasing and necessary focus on theory, study design, and sources of error”. Just as learning to use the Mantel-Haenszel method or standard logistic regression is only a small part of learning to be an epidemiologist, learning to do multi-level logistic regression is just a small part of learning to be a multi-level epidemiologist. In both instances, the biostatistical methods are merely one part of the epidemiological toolkit, which includes methods of appropriate study design, including avoidance, minimisation or assessment of possible biases. More importantly, in both instances, a knowledge of appropriate methods of study design and data analysis is not a substitute for knowing how to choose the most appropriate hypothesis to study.
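
The idea of estimating a population level effect and an individual level effect side by side can be sketched very roughly as follows. This is not a full multi-level (random effects) model of the kind discussed above; it is a simpler between-within decomposition, in which each individual's exposure is split into the group average and the individual's deviation from that average. The data, variable names and helper function are all hypothetical and chosen only for illustration.

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical toy data: 5 groups of 5 individuals. Group averages of
# exposure and outcome rise together, but within each group higher
# exposure goes with a lower outcome.
deviations = [-2, -1, 0, 1, 2]
rows = []  # (group mean exposure, deviation from group mean, outcome)
for g in range(5):
    exposure = [10 * g + d for d in deviations]
    outcome = [10 * g - d for d in deviations]
    group_mean = sum(exposure) / len(exposure)
    for x, y in zip(exposure, outcome):
        rows.append((group_mean, x - group_mean, y))

group_means = [r[0] for r in rows]
within_devs = [r[1] for r in rows]
outcomes = [r[2] for r in rows]

# In this balanced toy dataset the group mean and the within-group
# deviation are uncorrelated, so each coefficient of the two-predictor
# regression equals the slope from a simple regression on that
# predictor alone.
between_effect = slope(group_means, outcomes)  # population level
within_effect = slope(within_devs, outcomes)   # individual level

print("between-group (population level) slope:", between_effect)
print("within-group (individual level) slope:", within_effect)
```

Keeping the two components separate lets the population level and individual level associations differ, even in sign, rather than forcing a single pooled coefficient to stand for both.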

    So how can epidemiologists learn to think in a multi-level way? How can they ensure that the best hypotheses are developed for study, and that the “appropriate technology” (whether individual or population level) is then used to test them? How can epidemiology students learn such methods in such a manner that they can use them appropriately, rather than letting the methods they learn define and restrict the questions they subsequently ask? There are two principles from clinical teaching and practice that may be particularly relevant in this regard.

    Firstly, a problem-based approach to teaching clinical medicine has been increasingly adopted in medical schools around the world. The value of this approach is that theories and methods are taught in the context of solving real life problems. This places the methods into context, and helps ensure that the appropriate methods are chosen to fit the problem, rather than making the problem fit the methods. Perhaps the teaching of problem-based epidemiology can help to restore the link to public health, and to the real world in which most public health problems involve a variety of levels of disease causation. Studying real public health problems in their historical and social context does not exclude learning about sophisticated methods of study design and data analysis (in fact, it necessitates it), but it may help to ensure that the appropriate questions are asked and the “appropriate technology” is then used to answer them.

    Secondly, the decision as to what is “appropriate technology” should be based on evidence. This is less obvious than it seems, as many epidemiological methods are not evidence-based. For example, the current wave of enthusiasm for “molecular epidemiology” has led to the widespread use of biomarkers of exposure even when there is very little evidence of their validity. The need for an evidence-based epidemiology also applies to the general “research strategy” that is used by epidemiologists, as well as the specific research methods that are used, as there is good historical evidence of the value of a population-based approach.5

    In some instances the use of these new methods will make epidemiology more complicated. This is noted somewhat disparagingly by Poole and Rothman17 who seem to equate critics of “modern epidemiology” with those who would prefer a return to the “simpler” more ad hoc methods of the past. However, the issue here is not that the use of sophisticated statistical methods is desirable or undesirable in itself. Rather, the issue is that we should answer the most important scientific and public health questions and should use “appropriate technology” to answer them. In some instances, the population approach will produce hypotheses that can be investigated with straightforward cohort or case-control studies and analysed using simple 2 × 2 tables, or the corresponding multivariate methods of Poisson or logistic regression. In other instances, quite different methods of study design and data analysis may be required.11

    In each instance, epidemiology will continue to involve a healthy collaboration between epidemiologists and biostatisticians (as well as biologists, social scientists and others), but it is epidemiologists who have the primary responsibility to identify and develop the most important population level research questions, which can then be investigated using appropriate biostatistical methods. The paper by Blakely and Woodward is an important contribution in this regard, as it alerts us to the dangers of simply adding multi-level modelling to our analytical toolkit, and raises the important issues of theory development, study design and assessment of bias that must be considered in multi-level studies, just as they currently are (or should be) considered in individual level studies.


    I wish to thank Ichiro Kawachi and Tony McMichael for their comments on the draft manuscript.

    Funding: Professor Pearce is funded by a Programme Grant from the Health Research Council of New Zealand.
