Discovering environmental causes of disease
  1. Stephen M Rappaport
  1. Correspondence to Professor Stephen M Rappaport, School of Public Health, University of California, Berkeley, CA 94720-7356, USA; srappaport{at}


Although chronic diseases are primarily environmental (ie, not genetic) in origin, the particular environmental causes of these diseases are poorly understood. A WHO study of worldwide cancer mortality identified nine diverse environmental factors, including pollution, diet, lifestyle factors and infections. However, the joint effect of these nine factors accounted for only about one-third of cancer mortality, indicating that about two-thirds are of unknown aetiology. One problem relates to the community of epidemiologists, which sorts environmental factors into marginally overlapping domains, thereby creating gaps in coverage. Also, information about environmental exposures in epidemiologic studies is generally derived from questionnaires that are ill suited for assessing thousands of potentially causative exposures. Finally, the few studies that rigorously estimate exposure levels focus upon a handful of pollutants of regulatory importance and thus are unsuited for finding hitherto unrecognised exposures from both exogenous and endogenous sources. The concept of the ‘exposome’—representing the totality of exposures from gestation onwards—has recently been introduced as a complement to the genome in studies of disease aetiology. The exposome concept promotes environmental analogues of genome-wide association studies, which employ untargeted omic methods to compare biospecimens from diseased and healthy subjects. The goal of such investigations is to discover key biomarkers of exposure that enable follow-up hypotheses to be explored regarding sources of exposure, dose–response relationships, mechanisms of action, disease causality and public health interventions. Examples of this approach are cited from recent metabolomic studies of several complex chronic diseases.

  • exposome
  • environment
  • exposure
  • disease aetiology
  • metabolomics
  • environmental health
  • measurement
  • occupational health

Chronic diseases are primarily environmental in origin

Worldwide mortality is increasingly dominated by chronic diseases, particularly cancer and vascular disease.1 2 Indeed, these two diseases account for more than half of all deaths in the USA3 and for increasing mortality in developing countries.4 A reasonable first step towards understanding the aetiologies of cancer and vascular disease is to compare the proportions of disease caused by genetic and environmental (ie, non-genetic) factors. These proportions can be estimated by the respective population attributable fractions (PAFs)5 which, despite their shortcomings,6 7 provide guidance in differentiating among risk factors.
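
For readers unfamiliar with the measure, one textbook form of the PAF for a single dichotomous exposure is Levin's formula, sketched below. The symbols $p_e$ (proportion of the population exposed) and $RR$ (relative risk in the exposed versus the unexposed) are introduced here for illustration only; the cited studies may use other estimators.

```latex
\mathrm{PAF} = \frac{p_e\,(RR - 1)}{1 + p_e\,(RR - 1)}
```

Under this form, the PAF grows with both the prevalence of the exposure and the strength of its effect, which is why a common exposure with a modest relative risk can account for more disease than a rare exposure with a large one.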

Since families share many genes, it is logical to estimate PAFs for genetic factors via disease concordance among family members. Using data from the Swedish Family-Cancer Database of 10.2 million individuals, Hemminki et al8 estimated PAFs for genetic factors ranging from about 1%–3% for most tumours to 10%–20% for breast and prostate cancer. Regarding vascular disease, Marenberg et al9 reported that, after adjustment for risk factors, HRs for heart disease mortality declined with age among 21 004 pairs of Swedish twins. Using the results from Marenberg et al,9 I estimated PAFs for shared genes between 32% and 61% for four categories of twins classified by gender and zygosity and summed over all ages (monozygotic: males=61%, females=59%; dizygotic: males=40%, females=32%). Thus, familial and twin studies suggest that roughly 90% of cancer deaths and half of heart disease mortality cannot be explained by the genes and, therefore, point to environmental factors. The notion that most chronic diseases are not genetic in origin finds support from more than 400 genome-wide association studies (GWAS), which have collectively explained relatively little of the variability in chronic disease prevalence.10
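
The 'half of heart disease mortality' figure can be checked with simple arithmetic on the four twin-based PAFs quoted above. The sketch below takes an unweighted mean of the four categories, which is an assumption made here for illustration; the text does not state how the categories were combined.

```python
# PAFs (%) for shared genes in heart disease mortality, as quoted in the text.
genetic_paf_percent = {
    ("monozygotic", "male"): 61,
    ("monozygotic", "female"): 59,
    ("dizygotic", "male"): 40,
    ("dizygotic", "female"): 32,
}

# Unweighted mean across the four twin categories (illustrative assumption).
mean_genetic = sum(genetic_paf_percent.values()) / len(genetic_paf_percent)

# Whatever is not attributed to shared genes is treated as non-genetic.
non_genetic = 100 - mean_genetic

print(f"mean genetic PAF: {mean_genetic:.0f}%")
print(f"not explained by shared genes: {non_genetic:.0f}%")
```

The same complement applied to the cancer estimates (1%–3% for most tumours, 10%–20% for breast and prostate cancer) leaves roughly 80%–99% unexplained by genes, consistent with the 'roughly 90%' summary.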

Environmental exposures are poorly characterised

If environmental factors are important contributors to chronic diseases, then what are the causative exposures? Unfortunately, we only have a sketchy understanding of the key exposures for reasons that will be discussed. But to illustrate the current state of knowledge, table 1 shows PAFs for cancer mortality attributed to nine environmental risk factors surveyed by WHO.1 These factors—which include pollution (urban air and indoor smoke), diet (low fruit and vegetable consumption), lifestyle (smoking and alcohol consumption, overweight and obesity, physical inactivity) and infections (unsafe sex and contaminated injections)—point to the remarkable diversity of environmental factors and their differential effects on populations from low/middle-income and high-income countries. The largest PAFs are for smoking (21% overall, 18% for low/middle-income countries and 29% for high-income countries), while the lowest PAFs are for indoor smoke from biomass (0.2% overall, 0.3% for low/middle-income countries and 0.0% for high-income countries). The joint effects of all nine risk factors shown in table 1 were 34% and 37% of cancer mortality, respectively, in the low/middle-income and high-income countries1; these are surprisingly large proportions considering the crude exposure surrogates and strengthen the notion that environmental exposures are important contributors to cancer. Nonetheless, about two-thirds of the cancers remain unexplained by all environmental factors included in table 1.
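
The joint effect of several risk factors is generally smaller than the sum of their individual PAFs, because a single death can be attributable to more than one factor. A minimal sketch of one common way to combine PAFs follows. It assumes the risk factors act independently, which is a simplification, and the function name `joint_paf` is hypothetical. Only the two overall PAFs quoted in the text (smoking and indoor smoke) are used, since the remaining entries of table 1 are not reproduced here.

```python
from math import prod


def joint_paf(pafs):
    """Combine individual PAFs (as fractions), assuming independent risk factors.

    Each factor leaves a fraction (1 - PAF) of deaths unattributed; multiplying
    those fractions gives the share unattributed to any factor.
    """
    return 1.0 - prod(1.0 - p for p in pafs)


# Overall PAFs quoted in the text, expressed as fractions.
quoted = {"smoking": 0.21, "indoor smoke from biomass": 0.002}

combined = joint_paf(quoted.values())
naive_sum = sum(quoted.values())

print(f"combined under independence: {combined:.4f}")
print(f"naive sum of PAFs:           {naive_sum:.4f}")
```

With only two factors, one of them very small, the combined value and the naive sum barely differ. The gap widens as more and larger PAFs are combined, which is why individual PAFs should not simply be added.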

Table 1

Population attributable fractions (PAFs) for cancer mortality (all tumour types combined) attributed to individual environmental factors in low/middle-income and high-income countries (from Danaei et al1)

In an age when GWAS are increasingly conducted to test for disease associations with hundreds of thousands of polymorphic genes, it is difficult to reconcile that sophistication with the rather crude state of knowledge about environmental exposures (table 1). Part of the problem arguably involves parochial boundaries across the community of epidemiologists.11 Certainly, genetic epidemiologists regard the environment as everything except the genes.12 However, the other branches of epidemiology, all of which are devoted to environmental factors, offer fragmented views of the exposures. Environmental epidemiologists focus primarily upon pollutants in air and, to a lesser extent, in water, soil and food; nutritional epidemiologists examine the diet and exercise; infectious disease epidemiologists consider viruses and bacteria; and social factor epidemiologists investigate societal, lifestyle and behavioural aspects of human life. This sorting of environmental factors into marginally overlapping domains creates gaps in coverage and leaves one wondering whether undiscovered causes of chronic diseases are receiving the attention they deserve. For example, the microbiome, consisting of the entourage of 10¹⁴ bacteria resident in and on the human body and harbouring a metagenome of roughly 1 million genes, has largely escaped the notice of epidemiologists. Yet, the microbiome is important to the development and maintenance of the human immune system; it has been associated with obesity, atherosclerosis and inflammatory diseases; it produces thousands of small molecules (some toxic) that are shared with the human host; and it varies across human populations.13–15 Likewise, psychosocial stress, which alters hormonal balance, contributes to oxidative damage, accelerates cell ageing and telomere shortening and increases the risk of coronary heart disease,16 17 is rarely included in the inventory of environmental risk factors catalogued in epidemiologic studies.

In stark contrast to the state-of-the-art methods used in GWAS, the arsenal of tools for characterising environmental exposures has changed little in the last half century. Indeed, virtually all such information has been gleaned from questionnaires18 that, despite their utility in establishing certain aspects of subjects' histories, are ill suited for quantitative assessment of thousands of potentially causative exposures. Furthermore, the few studies that rigorously estimate exposure levels focus largely upon a handful of chemicals included in either the US Environmental Protection Agency's list of 129 priority pollutants in air, water, etc or the approximately 300 environmental chemicals measured in human biological fluids by the US Centers for Disease Control and Prevention.19 Such targeted studies cannot find hitherto unrecognised exposures and also neglect the changing nature and complex interactions inherent in environmental and biological systems.

The exposome concept

Recognising that environmental exposures are poorly characterised in studies of disease aetiology, Wild20 introduced the concept of the ‘exposome’, representing the totality of exposures from gestation onwards, as a complement to the genome. Rappaport and Smith11 refined the exposome concept to explicitly recognise both exogenous and endogenous sources of exposure, as illustrated in figure 1. In addition to pollutants and radiation, the exposome includes the diet, behaviour and lifestyle (mainly smoking, alcohol, overweight and physical exercise), infections, pre-existing disease (including diabetes, high blood pressure and dyslipidaemia), psychological stress, and a host of potentially influential endogenous factors, notably inflammation, lipid peroxidation, oxidative stress and the microbiome.

Figure 1

The individual exposome. A person's exposome is composed of environmental factors derived from both exogenous and endogenous sources. Long-term exposures to causative features of the exposome can lead to chronic diseases, notably cancer and vascular diseases (modified from Rappaport21).

Because the exposome includes all exposures experienced by an individual, it promotes environmental analogues of GWAS, which employ untargeted omic methods to discover key exposures.11 15 22 But untargeted designs are complicated because the exposome includes components originating both inside and outside the body (figure 1) and exposure levels can vary within individuals over time and across populations (due to changes in external and internal sources, age, exercise, infections, lifestyle, stress and pre-existing disease). Confronting such diverse factors, I have argued for a ‘top–down’ approach to discover causative exposures, which employs biological fluids or tissues from populations with and without disease.11 21 Recent applications involving biospecimens from diseased and healthy subjects have found associations between metabolomic profiles and several chronic diseases, including coronary heart disease,23 24 prostate cancer,25 colorectal cancer,26 type 1 diabetes27 and mitochondrial disease.28 If longitudinal biospecimens are available, then issues can be addressed regarding sources of exposure variability, causality and disease progression. For example, Oresic et al27 used longitudinal blood samples to detect characteristic lipidome signatures in diabetic children before and after seroconversion for autoimmunity antibodies.

Because it covers both the exogenous and endogenous exposures illustrated in figure 1, I regard the top–down strategy as more intuitive and efficient for characterising exposures in studies of disease aetiology than the related ‘bottom–up’ approach, which would address the many sources of external exposure.11 21 However, there may be situations where an external medium should receive priority, as with airborne exposures in investigations of the aetiologies of respiratory diseases. Even then, untargeted designs would still be favoured.

From discovery to public health

The exposome concept expands our view of the environment to include all non-genetic factors experienced by individuals throughout life. Yet, the exposome only opens the door to disease aetiology by pointing to hitherto unknown associations that can guide subsequent targeted studies. Having found a set of omic features that is differentiated between relatively small numbers of subjects with and without disease, a host of hypotheses would be generated to elucidate structural identities, to develop biomarkers for high-throughput screens, to confirm a priori associations and dose–response relationships in large populations, to characterise sources of exposures from exogenous and/or endogenous processes, to establish mechanisms of action and disease causality and to promote public health interventions.

This process by which the exposome concept can initiate discovery of causal environmental exposures is beautifully illustrated by the recent investigation of vascular disease by Hazen and colleagues.24 That study began with untargeted serum metabolomics in biobanked specimens from two small cohorts totalling 75 vascular disease patients and 75 age- and gender-matched controls. Of the approximately 2000 small molecules detected in untargeted analysis, 18 were significantly associated with vascular disease. Follow-up studies identified three highly correlated analytes as choline and its metabolites, betaine and trimethylamine oxide (TMAO), which were subsequently screened in 1870 subjects from an independent cohort of atherosclerosis patients and controls. ORs were highly significant for all three biomarkers, with TMAO showing the greatest potency. These results—plus extensive animal experiments—led the authors to postulate that the gut flora metabolise choline to trimethylamine, which is subsequently absorbed into the human host and metabolised to TMAO, the alleged pro-atherosclerotic principle. If TMAO and/or choline metabolism by the gut flora are ultimately confirmed as causes of atherosclerosis, then public health measures can be explored to reduce exposures through a combination of dietary, probiotic and pharmacologic interventions. And we should not forget that Hazen's initial untargeted metabolomic analyses pointed to an additional 15 small molecules—the identities of which have not yet been reported—that were significantly associated with vascular disease (ORs ranged from 4.5 to 27.9).
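
When roughly 2000 analytes are tested at once, some will appear associated with disease by chance alone, so untargeted screens usually apply a multiple-testing correction before features are taken forward. The sketch below shows the Benjamini-Hochberg procedure as one widely used option. The source does not state which correction, if any, Hazen and colleagues applied, and the p-values here are made up purely for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return indices of features declared significant at the given FDR.

    Sort the p-values, find the largest rank k such that
    p_(k) <= (k / m) * fdr, and keep the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank  # keep the largest rank that passes

    return sorted(order[:cutoff_rank])


# Hypothetical p-values for ten analytes (not taken from any study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

selected = benjamini_hochberg(example_p, fdr=0.05)
print("analytes retained for follow-up:", selected)
```

In this made-up example, five p-values fall below 0.05, but only the two smallest survive the correction. This is the kind of filtering that narrows thousands of detected features to a short list worth identifying and validating in an independent cohort.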

As noted earlier, exposures that are quantitatively evaluated in environmental epidemiology today rarely extend beyond either the 129 priority pollutants in external media or the approximately 300 environmental chemicals measured in human biological fluids. This universe of scrutinised chemicals is tiny when compared with the human metabolome (currently defined by 7900 small molecules in human serum), which has already been used to investigate the aetiologies of several complex diseases.23–28 Although the metabolome is well suited for investigations of exposures originating from the diet and the gut flora,15 22 it does not include all exposures. Thus, other omic methods will also be needed.11 21 Readers of this journal should consider being among the cadre of investigators who will apply omic methods to discover environmental causes of disease.


The author appreciates helpful discussions with Michael Bates, who read an early draft of this commentary, and Martyn Smith. The author acknowledges research support from the National Institute for Environmental Health Sciences.


  • Addendum: In an effort to introduce the exposome concept to the environmental health sciences, the US National Institute for Environmental Health Sciences requested that the National Academy of Sciences convene workshops to discuss implications of the exposome for understanding the causes of human diseases (25–26 February 2010) and methods for characterising individual exposomes (8–9 December 2011). Information is available at about both workshops.

  • Funding This study was supported by grant U54ES016115 from the National Institute for Environmental Health Sciences.

  • Competing interests None.

  • Provenance and peer review Commissioned; internally peer reviewed.
