The mathematical limits of genetic prediction for complex chronic disease
Katherine M Keyes (1), George Davey Smith (2), Karestan C Koenen (1), Sandro Galea (3)

  1. Department of Epidemiology, Mailman School of Public Health, Columbia University, New York, New York, USA
  2. MRC/University of Bristol Integrative Epidemiology Unit (IEU), Bristol, UK
  3. Boston University School of Public Health, Boston, MA, USA

Correspondence to Dr Katherine M Keyes, Department of Epidemiology, Columbia University, 722 West 168th Street, #503, New York, NY 10032, USA; kmk2104{at}


Background Attempts at predicting individual risk of disease based on common germline genetic variation have largely been disappointing. The present paper formalises why genetic prediction at the individual level has, and will continue to have, limited utility given the aetiological architecture of most common complex diseases.

Methods Data were simulated on one million populations with 10 000 individuals in each population, with varying prevalences of a genetic risk factor, an interacting environmental factor and the background rate of disease. The determinant risk ratio and risk difference magnitude for the association between a gene variant and disease is a function of the prevalence of the interacting factors that activate the gene, and the background rate of disease.
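The simulation design described above can be sketched for a single population as follows. This is a minimal illustration, not the authors' code. It assumes a simple sufficient-component cause model in which the gene variant causes disease only when the interacting factor is also present, the background cause acts on its own, and all three factors are independent. The function names `simulate_population` and `expected_measures` and all parameter values are hypothetical.

```python
import random


def simulate_population(n, p_gene, p_env, p_background, seed=0):
    """Simulate one population of n individuals.

    Assumed model (not taken from the paper): disease occurs if the
    individual carries the variant AND the interacting factor, OR if an
    independent background cause is present.
    Returns (risk_ratio, risk_difference) for carriers vs non-carriers.
    """
    rng = random.Random(seed)
    cases = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        gene = rng.random() < p_gene
        env = rng.random() < p_env
        background = rng.random() < p_background
        disease = (gene and env) or background
        totals[gene] += 1
        cases[gene] += int(disease)
    risk_carriers = cases[True] / totals[True]
    risk_noncarriers = cases[False] / totals[False]
    return risk_carriers / risk_noncarriers, risk_carriers - risk_noncarriers


def expected_measures(p_env, p_background):
    """Expected risk ratio and risk difference under the same assumed model."""
    risk_carriers = p_env + p_background - p_env * p_background
    return risk_carriers / p_background, risk_carriers - p_background


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    rr, rd = simulate_population(10_000, p_gene=0.3, p_env=0.5, p_background=0.05)
    print(f"simulated RR={rr:.2f}, RD={rd:.3f}")
    print("expected (RR, RD):", expected_measures(0.5, 0.05))
```

Repeating the call over a grid of `p_env` and `p_background` values, with a different seed for each population, would approximate the many-population design, under the same assumptions.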

Results The risk ratio and total excess cases due to the genetic factor increase as the prevalence of interacting factors increases, and decrease as the background rate of disease increases. Germline genetic variations have high predictive capacity for individual disease only under conditions of high heritability of particular genetic sequences, plausible only under rare variant hypotheses.
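The direction of these results can be illustrated with a short derivation. It is a sketch under assumptions that are not stated in the abstract: the variant causes disease only in the presence of the interacting factor, a background cause acts independently, and the symbols g (prevalence of the variant), e (prevalence of the interacting factor), b (background rate) and N (population size) are introduced here for illustration.

```latex
% Risk among carriers (R_1) and non-carriers (R_0) under the assumed model
R_1 = 1 - (1 - e)(1 - b) = e + b - eb, \qquad R_0 = b

% Risk difference, risk ratio and expected excess cases among carriers
\mathrm{RD} = R_1 - R_0 = e(1 - b)
\mathrm{RR} = \frac{R_1}{R_0} = 1 + \frac{e(1 - b)}{b}
\text{Excess cases} = N \, g \, e(1 - b)

% Worked example with e = 0.5
b = 0.05: \quad \mathrm{RD} = 0.475, \quad \mathrm{RR} = 10.5
b = 0.50: \quad \mathrm{RD} = 0.25, \quad \mathrm{RR} = 1.5
```

Under these assumptions all three quantities rise with e and fall with b, so the same variant can look strongly or weakly predictive depending on the population in which it is studied.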

Conclusions Under a model of common germline genetic variants that interact with other genes and/or environmental factors in order to cause disease, the predictive capacity of common genetic variants is determined by the prevalence of the factors that interact with the variant and the background rate. A focus on estimating genetic associations for the purpose of prediction without explicitly grounding such work in an understanding of modifiable (including environmentally influenced) factors will be limited in its ability to yield important insights about the risk of disease.

  • Gene environment interactions
