
  • Review Article

Gene–environment-wide association studies: emerging approaches

Key Points

  • Studies of gene–environment (G×E) interactions can help to investigate biological pathways, and can reveal genes that act only in particular environments or exposures that are hazardous only to genetically susceptible individuals. Such knowledge can be used to set environmental safety standards, to understand heterogeneity in genetic associations across populations, to predict how an individual's risk might change if modifiable risk factors are altered, and to choose the best treatment based on a patient's genotype.

  • Basic epidemiological cohort or case–control designs can be used to study G×E interactions, but more powerful alternatives include case-only, two-phase case–control and counter-matched designs. Case-only substudies within clinical trials are attractive for studying genetic modifiers of treatment response because randomization ensures that genotype and treatment are independent, which is the key assumption of the case-only design.

  • Various exploratory and hypothesis-driven approaches are available for examining the joint effects of multiple genes and exposures in a common pathway. Hierarchical models provide a way to incorporate external knowledge about the pathway into the analysis of complex interactions in the study data.

  • Two-step analyses can be used in genome-wide association studies to select a subset of promising interactions with a screening step and then test only those interactions in the same data set, using a test that is independent of the screen. Power improves because the multiple-testing correction applies only to the interactions that pass the screen. New methods are being developed that use pathway information to guide the search for novel genes and interactions, or that mine agnostic genome scans for novel pathways.

  • Comprehensive ontologies that incorporate environmental and toxicological information into genomic and pathway databases will be useful for informing future analyses of complex G×E interactions in both pathway-driven and genome-wide association scans.

  • Emerging areas include understanding how the environment influences gene expression through epigenetics, somatic mutations and other mechanisms, and what roles these effects play in disease causation. Various types of biomarkers and high-volume metabolomics methods can be incorporated as intermediate variables in pathway-based analysis methods.
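To make the idea of "exposures that are hazardous only to genetically susceptible individuals" concrete, here is a worked formula for the usual multiplicative-scale logistic model. It assumes a binary genotype G (carrier or not) and a binary exposure E; the Review does not commit to this parameterization, and other scales (for example, additive) are also used.

```latex
\begin{aligned}
\operatorname{logit}\Pr(D = 1 \mid G, E) &= \beta_0 + \beta_G G + \beta_E E + \beta_{GE}\, G E \\
\mathrm{OR}_{E \mid G=0} &= e^{\beta_E}, \qquad \mathrm{OR}_{E \mid G=1} = e^{\beta_E + \beta_{GE}} \\
\frac{\mathrm{OR}_{E \mid G=1}}{\mathrm{OR}_{E \mid G=0}} &= e^{\beta_{GE}}
\end{aligned}
```

Under this model, an exposure that is harmful only to carriers corresponds to \(\beta_E = 0\) and \(\beta_{GE} > 0\), and the interaction ratio \(e^{\beta_{GE}}\) is the quantity that the designs listed above try to estimate efficiently.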
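The case-only design mentioned in the Key Points estimates the multiplicative G×E interaction as the odds ratio between genotype and exposure among cases alone. This is valid only if G and E are independent in the source population (as randomization guarantees for a trial treatment) and the disease is rare. The sketch below assumes binary G and E and uses a Woolf (log-scale Wald) confidence interval; the function name and the counts in the usage example are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CaseOnlyResult:
    odds_ratio: float  # case-only estimate of the multiplicative G x E interaction
    ci_low: float      # lower Wald confidence limit
    ci_high: float     # upper Wald confidence limit


def case_only_interaction(n11: int, n10: int, n01: int, n00: int,
                          z: float = 1.959964) -> CaseOnlyResult:
    """Estimate a G x E interaction odds ratio from cases only.

    n11: cases with G=1, E=1    n10: cases with G=1, E=0
    n01: cases with G=0, E=1    n00: cases with G=0, E=0
    Assumes G and E are independent in the source population and the
    disease is rare; otherwise the estimate is biased.
    """
    if min(n11, n10, n01, n00) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    log_or = math.log((n11 * n00) / (n10 * n01))
    # Woolf's standard error for a log odds ratio
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    return CaseOnlyResult(
        odds_ratio=math.exp(log_or),
        ci_low=math.exp(log_or - z * se),
        ci_high=math.exp(log_or + z * se),
    )


if __name__ == "__main__":
    # Hypothetical counts among cases, for illustration only
    res = case_only_interaction(n11=60, n10=40, n01=100, n00=200)
    print(f"case-only interaction OR = {res.odds_ratio:.2f} "
          f"(95% CI {res.ci_low:.2f}-{res.ci_high:.2f})")
```

Because no controls enter the calculation, the case-only estimate is typically more precise than the case–control interaction estimate, but that gain is bought entirely by the independence assumption.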
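The hierarchical models mentioned in the Key Points can be sketched as a two-level model: first-stage estimates (for example, log interaction odds ratios for each gene–exposure combination) are regressed at a second stage on prior covariates such as pathway indicators, and each estimate is shrunk towards its predicted value. The sketch below is one simple "semi-Bayes" variant in which the prior variance tau2 is fixed in advance and the second-stage coefficients are estimated by weighted least squares; the function name, covariates and numbers are hypothetical, and a full analysis would also propagate the uncertainty in those coefficients.

```python
import numpy as np


def semi_bayes_shrink(beta_hat, var_hat, Z, tau2):
    """Shrink first-stage estimates toward a second-stage regression on prior covariates.

    beta_hat : (n,) first-stage estimates, e.g. log interaction odds ratios
    var_hat  : (n,) their sampling variances
    Z        : (n, p) prior covariates, e.g. an intercept plus pathway indicators
    tau2     : prior variance of the true effects around Z @ pi, fixed in advance
    Returns (posterior means, estimated second-stage coefficients pi).
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    var_hat = np.asarray(var_hat, dtype=float)
    Z = np.asarray(Z, dtype=float)
    # Inverse of the marginal variance of each estimate under the two-level model
    w = 1.0 / (var_hat + tau2)
    # Weighted least squares: pi = (Z' W Z)^{-1} Z' W beta_hat, with W = diag(w)
    ZtW = Z.T * w
    pi_hat = np.linalg.solve(ZtW @ Z, ZtW @ beta_hat)
    prior_mean = Z @ pi_hat
    # Weight on the data: estimates with larger sampling variance are shrunk more
    b = tau2 / (tau2 + var_hat)
    posterior_mean = b * beta_hat + (1.0 - b) * prior_mean
    return posterior_mean, pi_hat


if __name__ == "__main__":
    # Hypothetical: four log interaction ORs, the first two for genes in one pathway
    beta_hat = [0.40, 0.55, -0.05, 0.10]
    var_hat = [0.04, 0.09, 0.04, 0.09]
    Z = [[1, 1], [1, 1], [1, 0], [1, 0]]  # columns: intercept, pathway indicator
    post, pi_hat = semi_bayes_shrink(beta_hat, var_hat, Z, tau2=0.02)
    print("second-stage coefficients:", np.round(pi_hat, 3))
    print("shrunken estimates:", np.round(post, 3))
```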

Abstract

Despite the yield of recent genome-wide association (GWA) studies, the identified variants explain only a small proportion of the heritability of most complex diseases. This unexplained heritability could be partly due to gene–environment (G×E) interactions or more complex pathways involving multiple genes and exposures. This Review provides a tutorial on the available epidemiological designs and statistical analysis approaches for studying specific G×E interactions, and on choosing the most appropriate methods. I discuss the approaches that are being developed for studying entire pathways and the available techniques for mining interactions in GWA data. I also explore methods for marrying hypothesis-driven pathway-based approaches with 'agnostic' GWA studies.
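One elementary way to mine an agnostic GWA scan for pathways is an over-representation test: given the genes declared significant, ask whether a pathway contains more of them than expected by chance under a hypergeometric model. This is a cruder technique than gene set enrichment analysis, which uses the whole ranked list, and a real analysis would also need to account for gene size and linkage disequilibrium. The function name and the gene counts below are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    N: genes tested in the scan
    K: genes in the pathway
    n: genes declared significant in the scan
    k: significant genes that fall in the pathway
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("invalid hypergeometric parameters")
    upper = min(K, n)
    # Exact integer arithmetic; math.comb returns 0 when asked for more items than exist
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


if __name__ == "__main__":
    # Hypothetical scan: 20,000 genes tested, 500 significant,
    # 10 of them in a 100-gene pathway (2.5 expected by chance)
    p = hypergeom_upper_tail(k=10, N=20000, K=100, n=500)
    print(f"over-representation p-value = {p:.2e}")
```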


Figure 1: Schematic representation of the two-step gene–environment-wide interaction test.


References

  1. Le Marchand, L. The predominance of the environment over genes in cancer causation: implications for genetic epidemiology. Cancer Epidemiol. Biomarkers Prev. 14, 1037–1039 (2005).

    Article  PubMed  Google Scholar 

  2. Le Marchand, L. & Wilkens, L. R. Design considerations for genomic association studies: importance of gene–environment interactions. Cancer Epidemiol. Biomarkers Prev. 17, 263–267 (2008).

    Article  CAS  PubMed  Google Scholar 

  3. Kraft, P., Yen, Y. C., Stram, D. O., Morrison, J. & Gauderman, W. J. Exploiting gene–environment interaction to detect genetic associations. Hum. Hered. 63, 111–119 (2007).

    Article  CAS  PubMed  Google Scholar 

  4. Hunter, D. J. Gene–environment interactions in human diseases. Nature Rev. Genet. 6, 287–298 (2005). An excellent Review of the basic principles of epidemiological study designs for G×E interactions in the pre-GWA studies era. Among other insights, the author argues that G×E findings can 'point the finger' towards the causal constituent of a complex mixture.

    Article  CAS  PubMed  Google Scholar 

  5. Greene, C. S., Penrod, N. M., Williams, S. M. & Moore, J. H. Failure to replicate a genetic association may provide important clues about genetic architecture. PLoS ONE 4, e5639 (2009).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  6. Ioannidis, J. P. Non-replication and inconsistency in the genome-wide association setting. Hum. Hered. 64, 203–213 (2007).

    Article  CAS  PubMed  Google Scholar 

  7. Thomas, D. Methods for investigating gene–environment interactions in candidate pathway and genome-wide association studies. Annu. Rev. Public Health4 Jan 2010 (doi:10.1146/annurev.publhealth.012809.103619).

    Article  PubMed  PubMed Central  Google Scholar 

  8. Cordell, H. J. Detecting gene–gene interactions that underlie human diseases. Nature Rev. Genet. 10, 392–404 (2009).

    Article  CAS  PubMed  Google Scholar 

  9. Holmans, P. et al. Gene Ontology analysis of GWA study data sets provides insights into the biology of bipolar disorder. Am. J. Hum. Genet. 85, 13–24 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Sebastiani, P., Ramoni, M. F., Nolan, V., Baldwin, C. T. & Steinberg, M. H. Genetic dissection and prognostic modeling of overt stroke in sickle cell anemia. Nature Genet. 37, 435–440 (2005).

    Article  CAS  PubMed  Google Scholar 

  11. Khoury, M. J. & Wacholder, S. Invited commentary: from genome-wide association studies to gene–environment-wide interaction studies — challenges and opportunities. Am. J. Epidemiol. 169, 227–230 (2009).

    Article  PubMed  Google Scholar 

  12. Thomas, D. C. Exposure–time–response relationships with applications to cancer epidemiology. Ann. Rev. Public Health 9, 451–482 (1988).

    Article  CAS  Google Scholar 

  13. Thomas, D. C., Stram, D. & Dwyer, J. Exposure measurement error: influence on exposure–disease relationships and methods of correction. Ann. Rev. Public Health 14, 69–93 (1993).

    Article  CAS  Google Scholar 

  14. Lobach, I., Carroll, R. J., Spinka, C., Gail, M. H. & Chatterjee, N. Haplotype-based regression analysis and inference of case–control studies with unphased genotypes and measurement errors in environmental exposures. Biometrics 64, 673–684 (2008).

    Article  PubMed  Google Scholar 

  15. Wong, M. Y., Day, N. E., Luan, J. A. & Wareham, N. J. Estimation of magnitude in gene–environment interactions in the presence of measurement error. Stat. Med. 23, 987–998 (2004).

    Article  CAS  PubMed  Google Scholar 

  16. Smith, P. G. & Day, N. E. The design of case–control studies: the influence of confounding and interaction effects. Int. J. Epidemiol. 13, 356–365 (1984).

    Article  CAS  PubMed  Google Scholar 

  17. Gauderman, W. J. Sample size requirements for matched case–control studies of gene–environment interaction. Stat. Med. 21, 35–50 (2002). This paper describes a general approach to sample size and power calculations for G×E studies and the capabilities of the freely available Quanto program for this purpose.

    Article  PubMed  Google Scholar 

  18. Garcia-Closas, M. & Lubin, J. H. Power and sample size calculations in case–control studies of gene–environment interactions: comments on different approaches. Am. J. Epidemiol. 149, 689–692 (1999).

    Article  CAS  PubMed  Google Scholar 

  19. Burton, P. R. et al. Size matters: just how big is BIG? Quantifying realistic sample size requirements for human genome epidemiology. Int. J. Epidemiol. 38, 263–273 (2009).

    Article  PubMed  Google Scholar 

  20. Ioannidis, J. P., Trikalinos, T. A. & Khoury, M. J. Implications of small effect sizes of individual genetic variants on the design and interpretation of genetic association studies of complex diseases. Am. J. Epidemiol. 164, 609–614 (2006).

    Article  PubMed  Google Scholar 

  21. Matullo, G., Berwick, M. & Vineis, P. Gene–environment interactions: how many false positives? J. Natl Cancer Inst. 97, 550–551 (2005).

    Article  PubMed  Google Scholar 

  22. Clayton, D. & McKeigue, P. M. Epidemiological methods for studying genes and environmental factors in complex diseases. Lancet 358, 1356–1360 (2001). This paper takes a critical look at the current enthusiasm for G×E interactions, particularly in the context of large biobanks. The authors argue for case–control studies over cohort studies and for relying on case-only methods for detecting G×E interactions; however, they question whether genes involved in interactions might not more easily be discovered on the basis of the marginal associations they induce.

    Article  CAS  PubMed  Google Scholar 

  23. Moore, J. H. The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Hum. Hered. 56, 73–82 (2003). The creator of the MDR algorithm for identifying higher-order interactions gives a spirited argument in support of the notion that many such effects would be overlooked by limiting attention to factors showing significant main effects.

    Article  PubMed  Google Scholar 

  24. Moore, J. H. & Williams, S. M. Epistasis and its implications for personal genetics. Am. J. Hum. Genet. 85, 309–320 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Yang, Q. & Khoury, M. J. Evolving methods in genetic epidemiology. III. Gene–environment interaction in epidemiologic research. Epidemiol. Rev. 19, 33–43 (1997). Another excellent review of study design principles for G×E interactions, covering a broad range of designs.

    Article  CAS  PubMed  Google Scholar 

  26. Manolio, T. A., Bailey-Wilson, J. E. & Collins, F. S. Genes, environment and the value of prospective cohort studies. Nature Rev. Genet. 7, 812–820 (2006).

    Article  CAS  PubMed  Google Scholar 

  27. Andrieu, N. & Goldstein, A. M. Epidemiologic and genetic approaches in the study of gene–environment interaction: an overview of available methods. Epidemiol. Rev. 20, 137–147 (1998).

    Article  CAS  PubMed  Google Scholar 

  28. Piegorsch, W., Weinberg, C. & Taylor, J. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case–control studies. Stat. Med. 13, 153–162 (1994). The paper that introduced the case-only design for testing G×E interactions.

    Article  CAS  PubMed  Google Scholar 

  29. Caporaso, N. et al. Genome-wide and candidate gene association study of cigarette smoking behaviors. PLoS ONE 4, e4653 (2009).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  30. Thorgeirsson, T. E. et al. A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature 452, 638–642 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Thomas, D. C. Case–parents design for gene–environment interaction by Schaid. Genet. Epidemiol. 19, 461–463 (2000).

    Article  CAS  PubMed  Google Scholar 

  32. Broeks, A. et al. Identification of women with an increased risk of developing radiation-induced breast cancer: a case only study. Breast Cancer Res. 9, R26 (2007).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  33. Albert, P. S., Ratnasinghe, D., Tangrea, J. & Wacholder, S. Limitations of the case-only design for identifying gene–environment interactions. Am. J. Epidemiol. 154, 687–693 (2001).

    Article  CAS  PubMed  Google Scholar 

  34. Mukherjee, B. et al. Tests for gene–environment interaction from case–control data: a novel study of type I error, power and designs. Genet. Epidemiol. 32, 615–626 (2008).

    Article  PubMed  Google Scholar 

  35. Li, D. & Conti, D. V. Detecting gene–environment interactions using a combined case-only and case–control approach. Am. J. Epidemiol. 169, 497–504 (2009).

    Article  PubMed  Google Scholar 

  36. Schaid, D. Case–parents design for gene–environment interaction. Genet. Epidemiol. 16, 261–273 (1999). This paper introduced the transmission-disequilibrium test stratified by the case's exposure as a method of testing for G×E interactions that is robust to population G–E association.

    Article  CAS  PubMed  Google Scholar 

  37. Gauderman, W. J., Witte, J. S. & Thomas, D. C. Family-based association studies. J. Natl Cancer Inst. Monogr. 26, 31–37 (1999).

    Article  Google Scholar 

  38. Laird, N. M. & Lange, C. Family-based designs in the age of large-scale gene-association studies. Nature Genet. 7, 385–394 (2006). A review of the various family-based designs for testing genetic main effects in the context of GWA studies.

    Article  CAS  Google Scholar 

  39. Cui, J. S. et al. Regressive logistic and proportional hazards disease models for within-family analyses of measured genotypes, with application to a CYP17 polymorphism and breast cancer. Genet. Epidemiol. 24, 161–172 (2003).

    Article  PubMed  Google Scholar 

  40. Boomsma, D., Busjahn, A. & Peltonen, L. Classical twin studies and beyond. Nature Rev. Genet. 3, 872–882 (2002).

    Article  CAS  PubMed  Google Scholar 

  41. Andrieu, N. & Demenais, F. Interactions between genetic and reproductive factors in breast cancer risk in a French family sample. Am. J. Hum. Genet. 61, 678–690 (1997).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Gauderman, W. J. & Faucett, C. L. Detection of gene–environment interactions in joint segregation and linkage analysis. Am. J. Hum. Genet. 61, 1189–1199 (1997).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Gauderman, W. J. & Siegmund, K. D. Gene–environment interaction and affected sib pair linkage analysis. Hum. Hered. 52, 34–46 (2001).

    Article  CAS  PubMed  Google Scholar 

  44. Schaid, D. J., Olson, J. M., Gauderman, W. J. & Elston, R. C. Regression models for linkage: issues of traits, covariates, heterogeneity, and interaction. Hum. Hered. 55, 86–96 (2003).

    Article  PubMed  Google Scholar 

  45. White, J. E. A two stage design for the study of the relationship between a rare exposure and a rare disease. Am. J. Epidemiol. 115, 119–128 (1982). The paper that first introduced the idea of two-stage sampling in the epidemiologic context.

    Article  CAS  PubMed  Google Scholar 

  46. Breslow, N. E. & Chatterjee, N. Design and analysis of two-phase studies with binary outcome applied to Wilms tumor prognosis. Appl. Stat. 48, 457–468 (1999). Arguably the most accessible summary of a major series of papers on the design and analysis of two-phase case–control studies.

    Google Scholar 

  47. Li, R. et al. Glutathione S-transferase genotype as a susceptibility factor in smoking-related coronary heart disease. Atherosclerosis 149, 451–462 (2000).

    Article  CAS  PubMed  Google Scholar 

  48. Breslow, N. E., Lumley, T., Ballantyne, C. M., Chambless, L. E. & Kulich, M. Using the whole cohort in the analysis of case–cohort data. Am. J. Epidemiol. 169, 1398–1405 (2009). An important contribution to the literature on two-phase case–control studies that emphasizes the value added by exploiting the information available on the entire cohort that is not used in standard analysis methods.

    Article  PubMed  PubMed Central  Google Scholar 

  49. Bernstein, J. L. et al. Study design: evaluating gene–environment interactions in the etiology of breast cancer — the WECARE study. Breast Cancer Res. 6, R199–R214 (2004). This paper provides an overview of the design of the WECARE study, giving particular attention to the power gained from using the counter-matched design when testing for gene–radiation interactions.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Langholz, B. & Goldstein, L. Risk set sampling in epidemiologic cohort studies. Stat. Sci. 11, 35–53 (1996). This paper provides a non-technical discussion of counter-matching and other cohort sampling designs, with numerous examples of applications for epidemiologic studies.

    Google Scholar 

  51. Andrieu, N., Goldstein, A. M., Thomas, D. C. & Langholz, B. Counter-matching in studies of gene–environment interaction: efficiency and feasibility. Am. J. Epidemiol. 153, 265–274 (2001).

    Article  CAS  PubMed  Google Scholar 

  52. Gilliland, F. D., McConnell, R., Peters, J. & Gong, H. Jr. A theoretical basis for investigating ambient air pollution and children's respiratory health. Environ. Health Perspect. 107, 403–407 (1999). This paper provides a superb overview of the biological rationale for focusing studies of air pollution and respiratory disease on genes and environmental modifiers involved in oxidative stress and inflammatory pathways.

    Article  PubMed  PubMed Central  Google Scholar 

  53. Hoh, J., Wille, A. & Ott, J. Trimming, weighting, and grouping SNPs in human case–control association studies. Genome Res. 11, 2115–2119 (2001).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. McKinney, B. A., Reif, D. M., Ritchie, M. D. & Moore, J. H. Machine learning for detecting gene–gene interactions: a review. Appl. Bioinformatics 5, 77–88 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  55. Moore, J. H. & Williams, S. M. Epistasis and its implications for personal genetics. Am. J. Hum. Genet. 85, 309–320 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Ritchie, M. D. & Motsinger, A. A. Multifactor dimensionality reduction for detecting gene–gene and gene–environment interactions in pharmacogenomics studies. Pharmacogenomics 6, 823–834 (2005).

    Article  CAS  PubMed  Google Scholar 

  57. Le Marchand, L. et al. Combined effects of well-done red meat, smoking, and rapid N-acetyltransferase 2 and CYP1A2 phenotypes in increasing colorectal cancer risk. Cancer Epidemiol. Biomarkers Prev. 10, 1259–1266 (2001). A classic example of an interaction involving two genes and two exposures for which none of the constituent lower-order main effects or interactions is significant.

    CAS  PubMed  Google Scholar 

  58. Vineis, P. et al. Current smoking, occupation, N-acetyltransferase-2 and bladder cancer: a pooled analysis of genotype-based studies. Cancer Epidemiol. Biomarkers Prev. 10, 1249–1252 (2001).

    CAS  PubMed  Google Scholar 

  59. Thomas, D. C. et al. Approaches to complex pathways in molecular epidemiology: summary of an AACR special conference. Cancer Res. 68, 10028–10030 (2008).

    Article  CAS  PubMed  Google Scholar 

  60. Thomas, D. C. The need for a systematic approach to complex pathways in molecular epidemiology. Cancer Epidemiol. Biomarkers Prev. 14, 557–559 (2005).

    Article  PubMed  Google Scholar 

  61. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Wang, K., Li, M. & Bucan, M. Pathway-based approaches for analysis of genomewide association studies. Am. J. Hum. Genet. 81, 1278–1283 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Hong, M. G., Pawitan, Y., Magnusson, P. K. & Prince, J. A. Strategies and issues in the detection of pathway enrichment in genome-wide association studies. Hum. Genet. 126, 289–301 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Chasman, D. I. On the utility of gene set methods in genomewide association studies of quantitative traits. Genet. Epidemiol. 32, 658–668 (2008). This paper provides a clear discussion of the use of GSEA as a way of prioritizing hits from a GWA study and interpreting the ensemble of SNP associations in relation to pathways.

    Article  PubMed  Google Scholar 

  65. Aragaki, C. C., Greenland, S., Probst-Hensch, N. & Haile, R. W. Hierarchical modeling of gene–environment interactions: estimating NAT2 genotype-specific dietary effects on adenomatous polyps. Cancer Epidemiol. Biomarkers Prev. 6, 307–314 (1997).

    CAS  PubMed  Google Scholar 

  66. Wakefield, J., De Vocht, F. & Hung, R. J. Bayesian mixture modeling of gene–environment and gene–gene interactions. Genet. Epidemiol. 34, 16–25 (2010).

    PubMed  PubMed Central  Google Scholar 

  67. Hung, R. J. et al. Inherited predisposition of lung cancer: a hierarchical modeling approach to DNA repair and cell cycle control pathways. Cancer Epidemiol. Biomarkers Prev. 16, 2736–2744 (2007).

    Article  CAS  PubMed  Google Scholar 

  68. Hung, R. J. et al. Using hierarchical modeling in genetic association studies with multiple markers: application to a case–control study of bladder cancer. Cancer Epidemiol. Biomarkers Prev. 13, 1013–1021 (2004). One of the first examples of the use of hierarchical modelling for the study of G×E interactions. A set of pathway indicator variables are used as prior covariates to classify specific combinations of genes and environmental exposures.

    CAS  PubMed  Google Scholar 

  69. Conti, D. V. et al. in Phenotypes and Endophenotypes: Foundations for Genetic Studies of Nicotine Use and Dependence (ed. Swan, G. E.) 539–584 (NCI Tobacco Control Monographs, Bethesda, Maryland, 2009).

    Google Scholar 

  70. Wang, L. & Weinshilboum, R. M. Pharmacogenomics: candidate gene identification, functional validation and mechanisms. Hum. Mol. Genet. 17, R174–R179 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Rebbeck, T. R., Spitz, M. & Wu, X. Assessing the function of genetic variants in candidate gene association studies. Nature Rev. Genet. 5, 589–597 (2004). An excellent discussion of ways of interpreting candidate-gene associations in relation to biological function. The functions are inferred from various external sources of information or from programs for computing the predicted function of polymorphisms.

    Article  CAS  PubMed  Google Scholar 

  72. Ulrich, C. M. et al. Mathematical modeling of folate metabolism: predicted effects of genetic polymorphisms on mechanisms and biomarkers relevant to carcinogenesis. Cancer Epidemiol. Biomarkers Prev. 17, 1822–1831 (2008). One of a long series of papers on mathematical modelling of the folate pathway. This article focuses specifically on the use of the authors' model to predict the effects of variation in metabolic rate parameters for polymorphisms in specific genes on various outcomes, such as homocysteine concentration or DNA methylation reactions.

    Google Scholar 

  73. Thomas, D. C. et al. Use of pathway information in molecular epidemiology. Hum. Genomics 4, 21–42 (2010).

    Article  CAS  Google Scholar 

  74. Armitage, P. & Doll, R. The age distribution of cancer and a multistage theory of carcinogenesis. Br. J. Cancer 8, 1–12 (1954).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Moolgavkar, S. H. & Knudson, A. G. Jr. Mutation and cancer: a model for human carcinogenesis. J. Natl Cancer Inst. 66, 1037–1052 (1981).

    Article  CAS  PubMed  Google Scholar 

  76. Racine-Poon, A. & Wakefield, J. Statistical methods for population pharmacokinetic modelling. Stat. Methods Med. Res. 7, 63–84 (1998).

    Article  CAS  PubMed  Google Scholar 

  77. Clewell, H. J., Andersen, M. E. & Barton, H. A. A consistent approach for the application of pharmacokinetic modeling in cancer and noncancer risk assessment. Environ. Health Persp. 110, 85–93 (2002).

    Article  Google Scholar 

  78. Bois, F. Y. Applications of population approaches in toxicology. Toxicol. Lett. 120, 385–394 (2001).

    Article  CAS  PubMed  Google Scholar 

  79. Nijhout, H. F., Reed, M. C. & Ulrich, C. M. Mathematical models of folate-mediated one-carbon metabolism. Vitam. Horm. 79, 45–82 (2008).

    Article  CAS  PubMed  Google Scholar 

  80. Bergman, R. N. et al. Minimal model-based insulin sensitivity has greater heritability and a different genetic basis than homeostasis model assessment or fasting insulin. Diabetes 52, 2168–2174 (2003).

    Article  CAS  PubMed  Google Scholar 

  81. Cascorbi, I. Genetic basis of toxic reactions to drugs and chemicals. Toxicol. Lett. 162, 16–28 (2006).

    Article  CAS  PubMed  Google Scholar 

  82. Cortessis, V. & Thomas, D. C. in Mechanistic Considerations in the Molecular Epidemiology of Cancer (eds Bird, P., Boffetta, P., Buffler, P. & Rice, J.) 127–150 (IARC Scientific Publications, Lyon, France, 2003).

    Google Scholar 

  83. Thomas, D. C. Multistage sampling for latent variable models. Lifetime Data Anal. 13, 565–581 (2007).

    Article  PubMed  Google Scholar 

  84. Didelez, V. & Sheehan, N. Mendelian randomization as an instrumental variable approach to causal inference. Stat. Methods Med. Res. 16, 309–330 (2007).

    Article  PubMed  Google Scholar 

  85. Davey Smith, G. & Ebrahim, S. 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32, 1–22 (2003).

    Article  Google Scholar 

  86. Greenland, S. An introduction to instrumental variables for epidemiologists. Int. J. Epidemiol. 29, 722–729 (2000).

    Article  CAS  PubMed  Google Scholar 

  87. Dai, J. Y., LeBlanc, M. & Kooperberg, C. Semiparametric estimation exploiting covariate independence in two-phase randomized trials. Biometrics 65, 178–187 (2009).

    Article  PubMed  Google Scholar 

  88. McCarthy, M. I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Rev. Genet. 9, 356–369 (2008).

    Article  CAS  PubMed  Google Scholar 

  89. Altshuler, D., Daly, M. J. & Lander, E. S. Genetic mapping in human disease. Science 322, 881–888 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Satagopan, J. M., Verbel, D. A., Venkatraman, E. S., Offit, K. E. & Begg, C. B. Two-stage designs for gene–disease association studies. Biometrics 58, 163–170 (2002).

    Article  PubMed  PubMed Central  Google Scholar 

  91. Wang, H., Thomas, D. C., Pe'er, I. & Stram, D. O. Optimal two-stage genotyping designs for genome-wide association scans. Genet. Epidemiol. 30, 356–368 (2006).

    Article  PubMed  Google Scholar 

  92. Skol, A. D., Scott, L. J., Abecasis, G. R. & Boehnke, M. Optimal designs for two-stage genome-wide association studies. Genet. Epidemiol. 31, 776–788 (2007).

    Article  PubMed  Google Scholar 

  93. Elston, R. C., Lin, D. & Zheng, G. Multistage sampling for genetic studies. Annu. Rev. Genomics Hum. Genet. 8, 327–342 (2007).

    Article  CAS  PubMed  Google Scholar 

  94. Thomas, D. C. et al. Methodological issues in multistage genome-wide association studies. Stat. Sci. Preprint at http://www.imstat.org/sts/future_papers.html (2009).

  95. Kooperberg, C. & Leblanc, M. Increasing the power of identifying gene × gene interactions in genome-wide association studies. Genet. Epidemiol. 32, 255–263 (2008).

    Article  PubMed  PubMed Central  Google Scholar 

  96. Marchini, J., Donnelly, P. & Cardon, L. R. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nature Genet. 37, 413–417 (2005).

    Article  CAS  PubMed  Google Scholar 

  97. Evans, D. M., Marchini, J., Morris, A. P. & Cardon, L. R. Two-stage two-locus models in genome-wide association. PLoS Genet. 2, e157 (2006).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  98. Umbach, D. M. & Weinberg, C. R. Designing and analysing case–control studies to exploit independence of genotype and exposure. Stat. Med. 16, 1731–1743 (1997).

    Article  CAS  PubMed  Google Scholar 

  99. Murcray, C. E., Lewinger, J. P. & Gauderman, W. J. Gene–environment interaction in genome-wide association studies. Am. J. Epidemiol. 169, 219–226 (2009).

    Article  PubMed  Google Scholar 

  100. Pearson, J. V. et al. Identification of the genetic basis for complex disorders by use of pooling-based genomewide single-nucleotide-polymorphism association studies. Am. J. Hum. Genet. 80, 126–139 (2007).

    Article  CAS  PubMed  Google Scholar 

  101. Craig, D. W. et al. Identification of genetic variants using bar-coded multiplexed sequencing. Nature Methods 5, 887–893 (2008).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  102. Sham, P., Bader, J. S., Craig, I., O'Donovan, M. & Owen, M. DNA pooling: a tool for large-scale association studies. Nature Rev. Genet. 3, 862–871 (2002).

    Article  CAS  PubMed  Google Scholar 

  103. Cantor, R. M., Lange, K. & Sinsheimer, J. S. Prioritizing GWAS results: a review of statistical methods and recommendations for their application. Am. J. Hum. Genet. 86, 6–22 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Roeder, K., Devlin, B. & Wasserman, L. Improving power in genome-wide association studies: weights tip the scale. Genet. Epidemiol. 31, 741–747 (2007).

    Article  PubMed  Google Scholar 

  105. Whittemore, A. S. A Bayesian false discovery rate for multiple testing. J. Appl. Stat. 34, 1–9 (2007).

    Article  Google Scholar 

  106. Wakefield, J. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am. J. Hum. Genet. 81, 208–227 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  107. Wakefield, J. Reporting and interpretation in genome-wide association studies. Int. J. Epidemiol. 37, 641–653 (2008).

    Article  PubMed  Google Scholar 

  108. Datta, S. Empirical Bayes screening of many p-values with applications to microarray studies. Bioinformatics 21, 1987–1994 (2005).

    Article  CAS  PubMed  Google Scholar 

  109. Chen, G. K. & Witte, J. S. Enriching the analysis of genomewide association studies with hierarchical modeling. Am. J. Hum. Genet. 81, 397–404 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Lewinger, J. P., Conti, D. V., Baurley, J. W., Triche, T. J. & Thomas, D. C. Hierarchical Bayes prioritization of marker associations from a genome-wide association scan for further investigation. Genet. Epidemiol. 31, 871–882 (2007).

    Article  PubMed  Google Scholar 

  111. Binder, H. & Schumacher, M. Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinformatics 10, 18 (2009).

    PubMed  PubMed Central  Google Scholar 

  112. Holden, M., Deng, S., Wojnowski, L. & Kulle, B. GSEA-SNP: applying gene set enrichment analysis to SNP data from genome-wide association studies. Bioinformatics 24, 2784–2785 (2008).

    Article  CAS  PubMed  Google Scholar 

  113. Elbers, C. C. et al. Using genome-wide pathway analysis to unravel the etiology of complex diseases. Genet. Epidemiol. 33, 419–431 (2009).

    Article  PubMed  Google Scholar 

  114. Baranzini, S. E. et al. Pathway and network-based analysis of genome-wide association studies in multiple sclerosis. Hum. Mol. Genet. 18, 2078–2090 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. Torkamani, A., Topol, E. J. & Schork, N. J. Pathway analysis of seven common diseases assessed by genome-wide association. Genomics 92, 265–272 (2008).

    Article  CAS  PubMed  Google Scholar 

  116. Lesnick, T. G. et al. A genomic pathway approach to a complex disease: axon guidance and Parkinson disease. PLoS Genet. 3, e98 (2007).

  117. Thomas, P. D. et al. A systems biology network model for genetic association studies of nicotine addiction and treatment. Pharmacogenet. Genomics 19, 538–551 (2009).

  118. Gieger, C. et al. Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum. PLoS Genet. 4, e1000282 (2008).

  119. Friedman, N. Inferring cellular networks using probabilistic graphical models. Science 303, 799–805 (2004). An important paper that popularized the use of Bayesian network analysis for the reconstruction of gene networks from gene co-expression data.

  120. Ramoni, R. B., Saccone, N. L., Hatsukami, D. K., Bierut, L. J. & Ramoni, M. F. A testable prognostic model of nicotine dependence. J. Neurogenet. 23, 283–292 (2009).

  121. Ferrazzi, F., Sebastiani, P., Ramoni, M. F. & Bellazzi, R. Bayesian approaches to reverse engineer cellular systems: a simulation study on nonlinear Gaussian networks. BMC Bioinformatics 8, S2 (2007).

  122. Kohler, S., Bauer, S., Horn, D. & Robinson, P. N. Walking the interactome for prioritization of candidate disease genes. Am. J. Hum. Genet. 82, 949–958 (2008).

  123. Koch, L. G. & Britton, S. L. Development of animal models to test the fundamental basis of gene–environment interactions. Obesity (Silver Spring) 16, S28–S32 (2008).

  124. Gilliland, F. D., Li, Y. F., Saxon, A. & Diaz-Sanchez, D. Effect of glutathione-S-transferase M1 and P1 genotypes on xenobiotic enhancement of allergic responses: randomised, placebo-controlled crossover study. Lancet 363, 119–125 (2004). An excellent example of the use of experimental designs for investigating G×E interactions, in this case a randomized crossover challenge study of immunologic responses to diesel exhaust particles in allergic subjects.

  125. Thomas, D. C. & Conti, D. V. Two stage genetic association studies. in Encyclopedia of Clinical Trials (eds D'Agostino, R., Sullivan, L. & Massaro, J.) (Wiley, New York, 2007).

  126. Israel, E. et al. Use of regularly scheduled albuterol treatment in asthma: genotype-stratified, randomised, placebo-controlled cross-over trial. Lancet 364, 1505–1512 (2004).

  127. Davis, B. R. et al. Imputing gene–treatment interactions when the genotype distribution is unknown using case-only and putative placebo analyses — a new method for the Genetics of Hypertension Associated Treatment (GenHAT) study. Stat. Med. 23, 2413–2427 (2004).

  128. Vittinghoff, E. & Bauer, D. C. Case-only analysis of treatment–covariate interactions in clinical trials. Biometrics 62, 769–776 (2006).

  129. Lin, B. K. et al. Tracking the epidemiology of human genes in the literature: the HuGE Published Literature database. Am. J. Epidemiol. 164, 1–4 (2006).

  130. Khoury, M. J. & Little, J. Human genome epidemiologic reviews: the beginning of something HuGE. Am. J. Epidemiol. 151, 2–3 (2000).

  131. Yesupriya, A. et al. Reporting of human genome epidemiology (HuGE) association studies: an empirical assessment. BMC Med. Res. Methodol. 8, 31 (2008).

  132. Jensen, L. J., Saric, J. & Bork, P. Literature mining for the biologist: from information retrieval to biological discovery. Nature Rev. Genet. 7, 119–129 (2006).

  133. Raychaudhuri, S. et al. Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 5, e1000534 (2009).

  134. Gene Ontology Consortium. The Gene Ontology (GO) project in 2006. Nucleic Acids Res. 34, D322–D326 (2006).

  135. Kanehisa, M. et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36, D480–D484 (2008).

  136. Thomas, P. D. et al. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 13, 2129–2141 (2003).

  137. Miller, R. L. & Ho, S. M. Environmental epigenetics and asthma: current concepts and call for studies. Am. J. Respir. Crit. Care Med. 177, 567–573 (2008).

  138. Salk, J. J., Fox, E. J. & Loeb, L. A. Mutational heterogeneity in human cancers: origin and consequences. Annu. Rev. Pathol. 5, 51–75 (2010).

  139. Zeisel, S. H. Epigenetic mechanisms for nutrition determinants of later health outcomes. Am. J. Clin. Nutr. 89, 1488S–1493S (2009).

  140. Perera, F. et al. Relation of DNA methylation of 5′-CpG island of ACSL3 to transplacental exposure to airborne polycyclic aromatic hydrocarbons and childhood asthma. PLoS ONE 4, e4488 (2009).

  141. Baccarelli, A. et al. Rapid DNA methylation changes after exposure to traffic particles. Am. J. Respir. Crit. Care Med. 179, 572–578 (2009).

  142. Fraga, M. F. et al. Epigenetic differences arise during the lifetime of monozygotic twins. Proc. Natl Acad. Sci. USA 102, 10604–10609 (2005).

  143. Stranger, B. E. et al. Genome-wide associations of gene expression variation in humans. PLoS Genet. 1, e78 (2005).

  144. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).

  145. Zhu, X., Feng, T., Li, Y., Lu, Q. & Elston, R. C. Detecting rare variants for complex traits using family and unrelated data. Genet. Epidemiol. 34, 171–187 (2010).

  146. Siva, N. 1000 Genomes project. Nature Biotechnol. 26, 256 (2008).

  147. Cullen, A. C., Corrales, M. A., Kramer, C. B. & Faustman, E. M. The application of genetic information for regulatory standard setting under the clean air act: a decision-analytic approach. Risk Anal. 28, 877–890 (2008).

  148. Shostak, S. Locating gene–environment interaction: at the intersections of genetics and public health. Soc. Sci. Med. 56, 2327–2342 (2003).

  149. Need, A. C., Motulsky, A. G. & Goldstein, D. B. Priorities and standards in pharmacogenetic research. Nature Genet. 37, 671–681 (2005).

  150. Lave, L. B. & Omenn, G. S. Clearing the Air: Reforming the Clean Air Act (Brookings Institution, Washington, DC, 1981).

  151. Rose, G. The Strategy of Preventive Medicine (Oxford Univ. Press, 1992).

  152. Bernstein, J. L. et al. Radiation-induced second primary breast cancer and BRCA1 and BRCA2 mutation carrier status: a report from the WECARE Study. J. Natl Cancer Inst. (in the press).

  153. Perera, F. P. Molecular epidemiology: on the path to prevention? J. Natl Cancer Inst. 92, 602–612 (2000).

  154. Feng, D. et al. Platelet glycoprotein IIIa PlA polymorphism, fibrinogen, and platelet aggregability: The Framingham Heart Study. Circulation 104, 140–144 (2001).

  155. He, C., Tamimi, R. M., Hankinson, S. E., Hunter, D. J. & Han, J. A prospective study of genetic polymorphism in MPO, antioxidant status, and breast cancer risk. Breast Cancer Res. Treat. 113, 585–594 (2009).

  156. Bureau, A., Diallo, M. S., Ordovas, J. M. & Cupples, L. A. Estimating interaction between genetic and environmental risk factors: efficiency of sampling designs within a cohort. Epidemiology 19, 83–93 (2008).

  157. Jugessur, A. et al. Cleft palate, transforming growth factor alpha gene variants, and maternal exposures: assessing gene–environment interactions in case–parent triads. Genet. Epidemiol. 25, 367–374 (2003).

  158. Mayer, E. J. et al. Genetic and environmental influences on insulin levels and the insulin resistance syndrome: an analysis of women twins. Am. J. Epidemiol. 143, 323–332 (1996).

  159. Bernstein, J. L. et al. Radiation exposure, the ATM gene, and risk of bilateral breast cancer in the WECARE study. J. Natl Cancer Inst. (in the press).

  160. Gilliland, F. D. et al. Effects of glutathione S-transferase M1, maternal smoking during pregnancy, and environmental tobacco smoke on asthma and wheezing in children. Am. J. Respir. Crit. Care Med. 166, 457–463 (2002).

  161. Martinez, F. D. Gene–environment interactions in asthma: with apologies to William of Ockham. Proc. Am. Thorac. Soc. 4, 26–31 (2007).

  162. Gianfagna, F., De Feo, E., van Duijn, C. M., Ricciardi, G. & Boccia, S. A systematic review of meta-analyses on gene polymorphisms and gastric cancer risk. Curr. Genomics 9, 361–374 (2008).

  163. Siemiatycki, J. & Thomas, D. C. Biological models and statistical interactions: an example from multistage carcinogenesis. Int. J. Epidemiol. 10, 383–387 (1981).

  164. Greenland, S. Interactions in epidemiology: relevance, identification, and estimation. Epidemiology 20, 14–17 (2009).

  165. Haldane, J. B. S. Heredity and Politics (W. W. Norton, New York, 1938).

  166. Ottman, R. An epidemiologic approach to gene–environment interaction. Genet. Epidemiol. 7, 177–185 (1990). This widely quoted paper was one of the first to offer a classification of different types of G×E interactions, and gives classic examples of each type.

  167. Lewontin, R. C. Annotation: the analysis of variance and the analysis of causes. Am. J. Hum. Genet. 26, 400–411 (1974).

  168. Garcia-Closas, M. et al. NAT2 slow acetylation, GSTM1 null genotype, and risk of bladder cancer: results from the Spanish Bladder Cancer Study and meta-analyses. Lancet 366, 649–659 (2005).

  169. Dearfield, K. L., Benson, W. H., Gallagher, K. & Johnson, J. D. in Genomics and Environmental Regulation: Science, Ethics, and Law (eds Sharp, R. R., Marchant, G. E. & Grodsky, J. A.) 25–34 (Johns Hopkins Univ. Press, Baltimore, 2009).

  170. Lympany, P. A. et al. HLA-DPB polymorphisms: Glu 69 association with sarcoidosis. Eur. J. Immunogenet. 23, 353–359 (1996).

  171. Jacobi, C. E., Nagelkerke, N. J., van Houwelingen, J. H. & de Bock, G. H. Breast cancer screening, outside the population-screening program, of women from breast cancer families without proven BRCA1/BRCA2 mutations: a simulation study. Cancer Epidemiol. Biomarkers Prev. 15, 429–436 (2006).

  172. Ulrich, C. M. & Potter, J. D. Folate supplementation: too much of a good thing? Cancer Epidemiol. Biomarkers Prev. 15, 189–193 (2006).

Acknowledgements

Supported in part by grants 5P30 ES007043, 1U01 ES15090 and 1R01 ES016813 from the US National Institute of Environmental Health Sciences. The author is grateful to D. Conti, W. J. Gauderman, F. Gilliland and R. Haile for many helpful suggestions.

Ethics declarations

Competing interests

The author declares no competing financial interests.

Supplementary information

Supplementary Figure S1

Sample-size requirements for gene–environment interactions. (PDF 240 kb)

Related links

FURTHER INFORMATION

Duncan Thomas's homepage

Human Genome Epidemiology Network (HuGENet)

Nature Reviews Genetics series on Genome-wide association studies

POWER

Quanto

Glossary

Marginal effects

The effects of a specific risk factor (gene or exposure) in the population as a whole, averaging over all other variables.

Genome-wide association study

A scan of the entire genome for association with a disease or trait using a standard panel of 500,000 to 1 million haplotype-tagging SNPs.

Gene–environment-wide interaction study

A scan of the entire genome for interactions with various environmental exposures.

Ecologic-level study

An observational epidemiology study that relies on comparisons of aggregate disease rates across groups in relation to aggregate exposure information rather than comparisons between individuals.

Interaction odds ratio

The ratio of odds ratios for the relationship of one factor (for example, a gene) with disease across the levels of another factor (for example, an environmental exposure); as such, it is a measure of departure from a multiplicative joint effect.
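
A minimal Python sketch of the interaction odds ratio, using hypothetical case–control counts (invented for illustration, not taken from any study) cross-classified by a binary genotype and a binary exposure. It computes the genotype odds ratio within each exposure stratum and takes their ratio; a value of 1 would correspond to an exactly multiplicative joint effect. The dictionary layout and function names are illustrative assumptions. The same quantity can also be written relative to the doubly unexposed reference group as OR_GE / (OR_G × OR_E), which the sketch checks.

```python
# Hypothetical counts (cases, controls) for each combination of a binary
# genotype G and a binary exposure E. These numbers are invented.
counts = {
    ("carrier", "exposed"): (60, 40),
    ("carrier", "unexposed"): (30, 60),
    ("noncarrier", "exposed"): (50, 100),
    ("noncarrier", "unexposed"): (60, 200),
}


def odds_ratio(a_cases, a_controls, b_cases, b_controls):
    """Odds ratio comparing group a with reference group b."""
    return (a_cases * b_controls) / (a_controls * b_cases)


def genotype_or(exposure):
    """Odds ratio for carriers versus non-carriers within one exposure stratum."""
    return odds_ratio(*counts[("carrier", exposure)],
                      *counts[("noncarrier", exposure)])


# Ratio of the stratum-specific genotype odds ratios.
interaction_or = genotype_or("exposed") / genotype_or("unexposed")

# Equivalent form relative to the doubly unexposed reference group:
# OR_GE / (OR_G * OR_E).
ref = counts[("noncarrier", "unexposed")]
or_ge = odds_ratio(*counts[("carrier", "exposed")], *ref)
or_g = odds_ratio(*counts[("carrier", "unexposed")], *ref)
or_e = odds_ratio(*counts[("noncarrier", "exposed")], *ref)

print(f"interaction OR = {interaction_or:.2f}")                 # 1.80
print(f"OR_GE / (OR_G * OR_E) = {or_ge / (or_g * or_e):.2f}")   # 1.80
```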

Confounder

A variable that induces a spurious association between a risk factor (a gene, exposure or interaction) and disease because it is associated with the risk factor and is also associated with the disease independently of the risk factor. Confounding can also distort the magnitude of the association of a true risk factor with disease or mask it.

Gene–environment independence

The independent distribution of genotype and environment in the source population.

Empirical Bayes

A technique for estimating the effects of each component of a large ensemble of related variables by assuming the ensemble has some common distribution and estimating the parameters of that distribution from the data. Empirical Bayes estimators typically have smaller prediction error than estimates obtained separately for each component.
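
A minimal sketch of empirical Bayes shrinkage under an assumed normal–normal model: each observed estimate is the true effect plus noise of known standard error, and the true effects share a common normal distribution whose mean and variance are estimated by the method of moments from the ensemble itself. The simulated effects and the function name are hypothetical; this is one simple instance of the idea, not the specific estimator used in the studies reviewed.

```python
import numpy as np


def empirical_bayes_shrink(beta_hat, se):
    """Normal-normal empirical Bayes shrinkage of an ensemble of estimates.

    Model (an assumption for this sketch):
        beta_hat[j] ~ N(beta[j], se[j]**2),  beta[j] ~ N(mu, tau2),
    with mu and tau2 estimated from the ensemble itself.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    # Method-of-moments estimate of the between-component variance, floored at 0.
    tau2 = max(0.0, beta_hat.var(ddof=1) - v.mean())
    # Precision-weighted estimate of the common mean.
    w = 1.0 / (v + tau2)
    mu = np.sum(w * beta_hat) / np.sum(w)
    # Fraction of each estimate's deviation from mu that is retained.
    keep = tau2 / (v + tau2)
    return mu + keep * (beta_hat - mu), mu, tau2


# Simulated ensemble: 2,000 true effects with small spread, each estimated noisily.
rng = np.random.default_rng(0)
n = 2000
true_beta = rng.normal(0.0, 0.1, size=n)
se = np.full(n, 0.2)
beta_hat = true_beta + rng.normal(0.0, se)

eb, mu, tau2 = empirical_bayes_shrink(beta_hat, se)
mse_raw = np.mean((beta_hat - true_beta) ** 2)
mse_eb = np.mean((eb - true_beta) ** 2)
print(f"MSE separate: {mse_raw:.4f}  MSE empirical Bayes: {mse_eb:.4f}")
```

Because the noise variance here is large relative to the spread of the true effects, most of each estimate's deviation is shrunk away, which is where the reduction in prediction error comes from.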

Bayes model averaging

A technique for accounting for uncertainty about the correct model form (for example, the selection of variables to include in a multiple regression model) by averaging the effects of each possible variable over the set of all plausible models.
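
A minimal sketch of Bayes model averaging for a linear regression with three candidate predictors, on simulated data. It enumerates all eight subsets, fits each by least squares, and approximates posterior model probabilities from the Bayesian information criterion (BIC), assuming equal prior probabilities for all models; the BIC approximation and the simulated data are assumptions of this sketch rather than the method used in the article. A predictor's averaged coefficient counts it as 0 in models that exclude it, and its posterior inclusion probability is the total weight of the models that contain it.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
y = 0.5 * X[:, 0] + rng.normal(size=n)   # only the first predictor has an effect


def fit_ols(cols):
    """Fit y on an intercept plus the chosen columns; return slopes and BIC."""
    design = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    k = design.shape[1]
    bic = n * np.log(rss / n) + k * np.log(n)
    return coef[1:], bic


# Fit every subset of the three predictors (including the empty model).
models = []
for r in range(4):
    for cols in itertools.combinations(range(3), r):
        coef, bic = fit_ols(cols)
        models.append((cols, coef, bic))

# Approximate posterior model probabilities: weight proportional to exp(-BIC / 2).
bics = np.array([m[2] for m in models])
weights = np.exp(-0.5 * (bics - bics.min()))
weights /= weights.sum()

avg_coef = np.zeros(3)     # model-averaged coefficient of each predictor
incl_prob = np.zeros(3)    # posterior probability that each predictor is included
for (cols, coef, _), w in zip(models, weights):
    for c, b in zip(cols, coef):
        avg_coef[c] += w * b
        incl_prob[c] += w

print("averaged coefficients:", np.round(avg_coef, 3))
print("inclusion probabilities:", np.round(incl_prob, 3))
```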

Modified segregation analysis

This analysis applies likelihood-based methods to pedigree data in which one or more members have been genotyped at a major gene. Individuals without genotypes are handled by summing over their possible genotypes, weighted by the conditional probabilities of those genotypes given the genotypes that are available.

Population stratification

The phenomenon of an apparently homogeneous population that is actually composed of subgroups of individuals with distinct ancestral origins and differing allele frequencies at many loci. This can lead to bias in the assessment of the significance of associations of a trait with particular loci.

Joint segregation and linkage analysis

The use of family studies to estimate the parameters of a penetrance model. The parameters could include interactions between the unobserved major gene, which is linked to a marker, and environmental factors.

Multiple regression

A standard statistical technique for relating a single outcome variable to multiple explanatory variables, either all at once or using some variable selection method, such as stepwise forward selection or backward elimination.

Machine learning

Any of many data analysis techniques, originating in the computer science field, for mining large data sets. The techniques are not specifically based on mathematical statistics theory.

Pattern recognition

Any technique from exploratory data analysis or machine learning for discovering non-random patterns in large data sets.

First-level coefficients

In a hierarchical model, the regression coefficients (for example, log relative risks for each variable) for the subject-level data on the association between risk factors and disease. Unlike a non-hierarchical model, these coefficients are treated as random variables with distributions described in the higher level(s) of the model rather than as model parameters to be estimated directly.

Pathway indicator variables

Various types of information that can be used as predictor variables in the higher levels of a hierarchical model; in the simplest case, binary variables that indicate whether a particular gene or interaction has a role in a particular pathway.
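
To illustrate how first-level coefficients and pathway indicator variables fit together, here is a minimal two-level normal sketch. The first-level estimates, their variances, the pathway membership matrix and the second-level variance tau2 are all hypothetical; tau2 is treated as known for simplicity, whereas in practice it would be estimated or given a prior. The second-level regression of the coefficients on the pathway indicators is fitted by weighted least squares, and each coefficient is then pulled toward the mean predicted for its pathway.

```python
import numpy as np


def hierarchical_shrink(beta_hat, var, Z, tau2):
    """Two-level normal model (assumed for this sketch):
        beta_hat[j] ~ N(beta[j], var[j])      (first level)
        beta[j]     ~ N(Z[j] @ pi, tau2)      (second level)
    Returns posterior means of beta and the estimated second-level coefficients pi.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    var = np.asarray(var, dtype=float)
    Z = np.asarray(Z, dtype=float)
    w = 1.0 / (var + tau2)                 # marginal precision of each beta_hat
    ZtW = Z.T * w                          # equals Z.T @ diag(w)
    pi_hat = np.linalg.solve(ZtW @ Z, ZtW @ beta_hat)
    prior_mean = Z @ pi_hat
    keep = tau2 / (var + tau2)             # weight given to the first-level estimate
    return prior_mean + keep * (beta_hat - prior_mean), pi_hat


# Six hypothetical genes; the first three are in the pathway of interest.
beta_hat = [0.40, 0.35, 0.50, 0.05, -0.10, 0.02]      # first-level log relative risks
var = [0.04] * 6
Z = [[1, 1], [1, 1], [1, 1], [1, 0], [1, 0], [1, 0]]  # intercept, pathway indicator
post, pi_hat = hierarchical_shrink(beta_hat, var, Z, tau2=0.01)
print("second-level coefficients:", np.round(pi_hat, 3))
print("posterior means:", np.round(post, 3))
```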

Ontology

A formal system for organizing knowledge, here used in the context of biological pathways as a means of synthesizing information about the function of genes and exposures and their joint roles in disease causation.

Reverse causation

A bias in the estimation of the causal effect of a biomarker on disease when biospecimens are obtained after diagnosis. The bias occurs because the disease or its treatment alters the underlying intermediate variable or the measurement of it.

Mendelian randomization

A technique for studying the relationship between a biomarker and disease indirectly by studying the relationship of each to a gene that influences the biomarker.

Instrumental variable

In statistics, a variable that can be used to predict the value of an explanatory variable that is measured with error (or confounded), and that is related to the outcome only through that explanatory variable. The instrumental variable thereby indirectly yields an estimate of the relationship of the explanatory variable with an outcome variable that is unbiased in large samples.
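
A minimal simulation sketch of Mendelian randomization carried out as an instrumental-variable analysis; all parameter values are invented for illustration. A genotype g shifts a biomarker x, and an unmeasured variable u confounds the biomarker–outcome relationship. The ordinary regression slope of y on x is biased by u, whereas the ratio of the gene–outcome slope to the gene–biomarker slope (the Wald ratio estimator) recovers the assumed causal effect of 0.3, because in this simulation g is independent of u and affects y only through x.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
g = rng.binomial(2, 0.3, size=n)        # genotype coded as allele count (0, 1, 2)
u = rng.normal(size=n)                  # unmeasured confounder
x = 0.5 * g + u + rng.normal(size=n)    # biomarker influenced by the gene
y = 0.3 * x + u + rng.normal(size=n)    # outcome; assumed causal effect of x is 0.3


def slope(a, b):
    """Least-squares slope from regressing b on a."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


naive = slope(x, y)                     # confounded by u
wald = slope(g, y) / slope(g, x)        # instrumental-variable (Wald ratio) estimate
print(f"naive slope: {naive:.3f}  Wald ratio: {wald:.3f}")
```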

Multiple comparisons penalty

The higher degree of statistical significance that is required for a particular association to be considered noteworthy when many possible associations are analysed simultaneously. Several adjustment methods can take account of this penalty, the best known of which is the Bonferroni correction.

Bonferroni correction

A multiple comparisons adjustment for testing at a conventional significance level. It multiplies the p value for each test by the total number of tests performed (equivalently, it divides the significance level by the number of tests), which keeps the overall type I error rate (the probability of at least one false positive association) at or below the chosen significance level. The bound is close to exact when the tests are independent and becomes conservative when they are correlated, as with SNPs in linkage disequilibrium.
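
A short sketch of the Bonferroni adjustment, together with the exact family-wise error rate when m independent tests of true null hypotheses are each performed at level alpha/m. The function names are illustrative. With about one million independent tests and alpha = 0.05, the per-test threshold works out to 5 × 10⁻⁸, the value commonly used as a genome-wide significance threshold.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p values: each p times the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def fwer_independent(alpha, m):
    """Probability of at least one false positive when m independent tests of
    true null hypotheses are each carried out at level alpha / m."""
    return 1.0 - (1.0 - alpha / m) ** m


alpha, m = 0.05, 1_000_000
per_test_threshold = alpha / m
print(f"per-test threshold: {per_test_threshold:.1e}")
print(f"exact FWER under independence: {fwer_independent(alpha, m):.4f}")
print(bonferroni([0.01, 0.04, 0.5]))
```

The exact error rate under independence comes out just below alpha, which is why the correction is described as close to exact in that case.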

DNA bar-coding

The addition of a unique molecular tag to each fragment of an individual's DNA so that after pooling with other DNA samples, the genotype of each individual in the pool can be reconstructed.

Coherence

The extent to which the data at hand are concordant with other types of biological knowledge, thereby reinforcing a causal interpretation.

False discovery rate

The expected proportion of all reported positive associations that are false positives. Procedures that control it at a chosen level can be used to judge which of many associations are noteworthy.
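
One widely used way to control the false discovery rate is the Benjamini–Hochberg step-up procedure, sketched below; the glossary does not name a procedure, so this choice, the function name and the example p values (invented for illustration) are assumptions. After sorting the m p values, it rejects the hypotheses with the k smallest, where k is the largest rank whose p value is at most (k/m)q. For comparison, the sketch also applies a Bonferroni threshold to the same p values.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure at target false discovery rate q (valid for independent or
    positively dependent tests)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that the k-th smallest p value is <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    return sorted(order[:k_max])


p = [0.042, 0.001, 0.205, 0.039, 0.008, 0.060, 0.216, 0.041, 0.074, 0.212]
fdr_hits = benjamini_hochberg(p, q=0.05)
bonf_hits = [i for i, pv in enumerate(p) if pv <= 0.05 / len(p)]
print("BH rejects:", fdr_hits)
print("Bonferroni rejects:", bonf_hits)
```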

Bayesian network analysis

A technique for developing a minimal graphical representation of the connections among a large set of variables by examining the conditional independence relationships among pairs of variables given the other variables connected to them within the graph. This technique has been widely used for the analysis of gene co-expression data.

Challenge studies

Various experimental designs for assessing the effects of a noxious agent by exposing individuals to trace amounts in a controlled setting (as in a randomized or crossover trial). For gene–environment interaction studies, the effects can be compared across subgroups with different genotypes, and the efficiency can be improved by stratified sampling based on genotype.

Latent variable models

A model involving one or more unobservable intermediate variables that represent the pathway connecting a cause (for example, exposures and genotypes) to an effect (for example, disease). Identifying the pathways typically requires the use of surrogates for the latent variables (for example, biomarkers) in addition to the observable cause and effect variables.

1000 Genomes Project

A large-scale effort to obtain and catalogue the full genome-wide DNA sequence of 1,000 individuals selected from a range of populations.

About this article

Cite this article

Thomas, D. Gene–environment-wide association studies: emerging approaches. Nat Rev Genet 11, 259–272 (2010). https://doi.org/10.1038/nrg2764

  • DOI: https://doi.org/10.1038/nrg2764
