  • Analysis

Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies

Abstract

We report a new method to estimate the predictive performance of polygenic models for risk prediction and assess predictive performance for ten complex traits or common diseases. Using estimates of effect-size distribution and heritability derived from current studies, we project that although 45% of the variance of height has been attributed to SNPs, a model trained on one million people may only explain 33.4% of variance of the trait. Models based on current studies allow for identification of 3.0%, 1.1% and 7.0% of the populations at twofold or higher than average risk for type 2 diabetes, coronary artery disease and prostate cancer, respectively. Tripling of sample sizes could elevate these percentages to 18.8%, 6.1% and 12.2%, respectively. The utility of polygenic models for risk prediction will depend on achievable sample sizes for the training data set, the underlying genetic architecture and the inclusion of information on other risk factors, including family history.
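The link between a model's discriminatory power and the share of a population at twofold or higher than average risk can be illustrated with a minimal sketch. It is not the authors' projection method. It assumes a rare disease and a polygenic log-risk score that is normally distributed with standard deviation `sigma` in the population, a common simplification. The function names and the parameter `sigma` are hypothetical and introduced here only for illustration.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def auc_from_sigma(sigma: float) -> float:
    """AUC under the assumed model.

    Assumption: the log-risk score S is N(0, sigma^2) in the population and
    the disease is rare, so scores among cases are shifted by sigma^2 with
    the same variance. The AUC is then Phi(sigma / sqrt(2)).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return normal_cdf(sigma / math.sqrt(2.0))


def fraction_at_or_above(fold: float, sigma: float) -> float:
    """Share of the population whose risk is at least `fold` times the average.

    Under the same assumption, risk relative to the population average is
    exp(S - sigma^2 / 2), so the condition is S >= ln(fold) + sigma^2 / 2.
    """
    if sigma <= 0 or fold <= 0:
        raise ValueError("sigma and fold must be positive")
    threshold = (math.log(fold) + sigma ** 2 / 2.0) / sigma
    return 1.0 - normal_cdf(threshold)


if __name__ == "__main__":
    # Illustrative values of sigma only; these are not estimates from the paper.
    for sigma in (0.3, 0.6, 1.0):
        print(
            f"sigma={sigma:.1f}  AUC={auc_from_sigma(sigma):.3f}  "
            f"share at >=2x average risk={fraction_at_or_above(2.0, sigma):.3%}"
        )
```

Under these assumptions a small gain in the spread of the score produces a disproportionately large gain in the share of people above a fixed risk threshold. That is consistent in direction with the abstract's projection that tripling sample sizes could raise the identified share several-fold.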

Figure 1: PCC for polygenic models and corresponding optimal significance level for SNP selection under three models for polygenic architectures for adult height.
Figure 2: Expected PCC for polygenic models at optimal significance level for SNP selection for four quantitative traits.
Figure 3: Expected AUC statistics at optimal significance level for SNP selection for five disease traits.

Acknowledgements

This research was supported by the intramural program of the US National Cancer Institute.

Author information

Contributions

N.C. led the development of the statistical methods and drafted the manuscript. J.-H.P. contributed to the development of the methods and performed the illustrative analyses. B.W. implemented simulation studies. J.S., P.H. and S.J.C. contributed to designs of various analyses and interpretation of results. N.C., B.W., J.S., P.H., S.J.C. and J.-H.P. reviewed and revised the manuscript.

Corresponding author

Correspondence to Nilanjan Chatterjee.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Tables 1–4, Supplementary Figures 1–3, Supplementary Note (PDF 329 kb)

About this article

Cite this article

Chatterjee, N., Wheeler, B., Sampson, J. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat Genet 45, 400–405 (2013). https://doi.org/10.1038/ng.2579

  • DOI: https://doi.org/10.1038/ng.2579
