  • Review Article

Mathematical multi-locus approaches to localizing complex human trait genes

Key Points

  • Statistical gene mapping allows the approximate localization of disease susceptibility genes on the human gene map in the absence of any functional knowledge of such genes.

  • After a brief introduction to conventional mapping methods, we focus on the mapping of complex heritable traits, such as diabetes and schizophrenia, and outline the known statistical methods that specifically address the multi-locus nature of complex traits.

  • These methods require sophisticated statistical approaches to allow for the consideration of multiple genetic marker loci and their combined effects on disease, while at the same time keeping the overall rate of false-positive results at an acceptably low level.

  • The development of statistical multi-locus methods for gene mapping has only recently begun on a larger scale; statistical properties, such as power under different scenarios, still need to be explored. In addition, little is known about the actions and interactions of genes that underlie complex traits, although such genes are likely to exist.

  • In extreme cases, a complex trait might be due only to interactions between genes, such that no single gene exerts an effect by itself. We show that statistical approaches can produce useful results even here, but the required computational effort might exceed the capacity that is available at present.

  • Many of these methods have been implemented in computer programs, which are delicate and sensitive tools best left in the hands of specialists. Used without the proper background knowledge, they can lead to misinterpretation of data.
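A rough sense of the computational burden of searching for purely interactive effects comes from counting how many k-locus marker combinations an exhaustive search over m markers must examine. The minimal Python sketch below does only that counting; the panel sizes are illustrative assumptions, not numbers taken from this article, and `n_marker_sets` is a hypothetical helper name.

```python
from math import comb


def n_marker_sets(m: int, k: int) -> int:
    """Number of distinct k-locus subsets an exhaustive search over m markers examines."""
    if not 0 <= k <= m:
        raise ValueError("need 0 <= k <= m")
    return comb(m, k)


if __name__ == "__main__":
    # Hypothetical marker-panel sizes, chosen only for illustration.
    for m in (1_000, 100_000):
        for k in (1, 2, 3):
            print(f"markers={m:>7,}  loci per set={k}  sets={n_marker_sets(m, k):,}")
```

For 100,000 markers the number of pairs is already about 5 × 10^9 and the number of triplets about 1.7 × 10^14. Each set is a separate test, so the count also tightens any multiple-testing threshold.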

Abstract

Statistical analysis methods for gene mapping originated in counting recombinant and non-recombinant offspring, but have now progressed to sophisticated approaches for the mapping of complex trait genes. Here, we outline new statistical methods that capture the simultaneous effects of multiple gene loci and thereby achieve a more global view of gene action and interaction than is possible by traditional gene-by-gene analysis. We aim to show that the work of statisticians goes far beyond the running of computer programs.

Figure 1: Expected decay of linkage disequilibrium around a recessive disease locus in successive generations after an initial mutation.
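Figure 1 itself is not reproduced here. As a point of reference, in the textbook model of a large, randomly mating population, the disequilibrium coefficient D between a disease locus and a marker at recombination fraction θ shrinks by a factor (1 − θ) per generation, so D_t = D_0(1 − θ)^t. The Python sketch below assumes that idealized model (no drift, selection, migration or recurrent mutation) and makes no use of the recessive mode of inheritance mentioned in the caption, so it is a simplified illustration rather than a reconstruction of the figure; the function names are hypothetical.

```python
import math


def ld_after(d0: float, theta: float, generations: float) -> float:
    """Expected disequilibrium D_t = D_0 * (1 - theta)**t in an idealized population."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d0 * (1.0 - theta) ** generations


def ld_half_life(theta: float) -> float:
    """Number of generations after which D has fallen to half its initial value."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("half-life is defined only for 0 < theta <= 0.5")
    return math.log(0.5) / math.log(1.0 - theta)


if __name__ == "__main__":
    # Illustrative recombination fractions between marker and disease locus.
    for theta in (0.001, 0.01, 0.05):
        print(f"theta={theta}: D halves after {ld_half_life(theta):.1f} generations")
```

With θ = 0.01 (a marker roughly 1 cM away), D halves in about 69 generations; tighter linkage preserves disequilibrium correspondingly longer.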
Figure 2: Flow chart for the set association procedure for combining association effects of multiple marker loci.
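The flow chart in Figure 2 is not reproduced here. The Python sketch below illustrates only the core idea of combining association effects across markers, under stated simplifications: each marker gets an allelic chi-square statistic, the statistics are sorted in decreasing order, sums S_n of the n largest are formed, and both the best n and the overall significance are judged against the same set of case–control label permutations. The published procedure is understood to include further steps (such as screening markers for Hardy–Weinberg deviations and weighting the statistics) that are omitted here, and all function names are hypothetical.

```python
import random
from bisect import bisect_left
from typing import List, Sequence, Tuple


def allelic_chi2(genotypes: Sequence[int], labels: Sequence[int]) -> float:
    """2x2 chi-square on allele counts; genotypes count minor alleles (0/1/2), labels 1=case, 0=control."""
    a = b = c = d = 0  # case minor, case major, control minor, control major
    for g, y in zip(genotypes, labels):
        if y:
            a, b = a + g, b + 2 - g
        else:
            c, d = c + g, d + 2 - g
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else (a + b + c + d) * (a * d - b * c) ** 2 / denom


def sorted_sums(markers: Sequence[Sequence[int]], labels: Sequence[int]) -> List[float]:
    """S_n for n = 1..N: sums of the n largest single-marker statistics."""
    stats = sorted((allelic_chi2(m, labels) for m in markers), reverse=True)
    sums, total = [], 0.0
    for s in stats:
        total += s
        sums.append(total)
    return sums


def set_association(markers: Sequence[Sequence[int]], labels: Sequence[int],
                    n_perm: int = 999, seed: int = 0) -> Tuple[int, float, float]:
    """Return (best n, smallest per-n p-value, permutation p-value of that minimum)."""
    rng = random.Random(seed)
    reps = [sorted_sums(markers, labels)]  # replicate 0 = observed data
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        reps.append(sorted_sums(markers, perm))
    r, n_sets = len(reps), len(reps[0])
    cols = [sorted(rep[n] for rep in reps) for n in range(n_sets)]

    def p_value(i: int, n: int) -> float:
        # Fraction of all replicates (observed included) whose S_n is >= replicate i's S_n.
        return (r - bisect_left(cols[n], reps[i][n])) / r

    min_p = [min(p_value(i, n) for n in range(n_sets)) for i in range(r)]
    best_n = min(range(n_sets), key=lambda n: p_value(0, n)) + 1
    overall = sum(1 for mp in min_p if mp <= min_p[0]) / r
    return best_n, min_p[0], overall
```

Because the smallest p-value over n is itself compared with the minima from the permuted replicates, the choice of the best n is accounted for without a separate multiple-testing correction.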

Acknowledgements

This work was supported by grants from the National Institute of Mental Health.

Author information

Correspondence to Jurg Ott.

Related links

Databases

LocusLink: APOA, APOC3, CBS, CD14, CETP, IL4Ra, KIR, MDM2, NOS, SERPINE1, TNFR1, TP53

OMIM: Hirschsprung disease

FURTHER INFORMATION

apriori algorithm

Jurg Ott's laboratory

Glossary

RECOMBINATION FRACTION

The proportion of offspring that receives a recombinant haplotype from a parent, or the probability that recombination occurs between two loci.

BACKCROSS

Originally, a backcross was the mating of an offspring with one of its parents, where the offspring is heterozygous and the parent is homozygous for one of the offspring's two alleles. Nowadays, the term refers to any mating between individuals with these two genotypes.

LIKELIHOOD ANALYSES

A statistical method that calculates the probability of the observed data under varying hypotheses, to estimate model parameters that best explain the observed data and determine the relative strengths of alternative hypotheses.

LOD SCORE

The base-10 logarithm of the likelihood ratio (odds) for genetic linkage at a given value of the recombination fraction versus no linkage (a recombination fraction of 1/2).
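As a concrete instance of the recombination fraction, likelihood and LOD score, consider the simplest setting: a phase-known backcross in which every offspring can be scored as recombinant or non-recombinant. With k recombinants among n offspring, the likelihood is L(θ) = θ^k(1 − θ)^(n−k), the maximum-likelihood estimate is θ̂ = k/n (capped at 1/2), and the LOD score compares L(θ) with L(1/2). The Python sketch below assumes that fully informative setting; real pedigree data generally require more elaborate likelihood calculations. The function names and counts are hypothetical.

```python
import math


def lod_score(theta: float, k: int, n: int) -> float:
    """LOD score log10[L(theta) / L(1/2)] for k recombinants among n phase-known meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if theta == 0.0 and k > 0:
        return float("-inf")  # a recombinant is impossible at theta = 0
    # L(theta)/L(1/2) = (2*theta)**k * (2*(1 - theta))**(n - k); 0 * log(0) is taken as 0.
    rec = k * math.log10(2.0 * theta) if k else 0.0
    nonrec = (n - k) * math.log10(2.0 * (1.0 - theta)) if n - k else 0.0
    return rec + nonrec


def theta_hat(k: int, n: int) -> float:
    """Maximum-likelihood estimate of theta, restricted to [0, 0.5]."""
    return min(k / n, 0.5)


if __name__ == "__main__":
    k, n = 2, 20  # hypothetical counts: 2 recombinants among 20 offspring
    t = theta_hat(k, n)
    print(f"theta_hat = {t:.2f}, LOD = {lod_score(t, k, n):.2f}")
```

With 2 recombinants among 20 offspring the estimate is θ̂ = 0.1 and the LOD score is about 3.2, just above the conventional threshold of 3 for declaring linkage.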

LOGISTIC REGRESSION MODEL

A statistical model for the dependence of a binary (two-class) phenotype on a number of risk factors. The probability, p, of one of the two phenotype states is expressed in the form of its logit, log(p/(1 – p)), which is assumed to equal a linear combination (weighted sum) of the risk factors.
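The definition can be made concrete with a single risk factor x, for instance a genotype coded as the number of risk alleles (a hypothetical coding choice). The Python sketch below fits logit(p) = b0 + b1·x by Newton–Raphson iteration; it is a minimal illustration rather than a substitute for a statistics package, and it does not handle data in which x separates affected from unaffected individuals perfectly. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def expit(z: float) -> float:
    """Inverse of the logit: p = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x: Sequence[float], y: Sequence[int], iters: int = 25) -> Tuple[float, float]:
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson; y holds 1 (affected) or 0 (unaffected)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0  # score vector and information matrix entries
        for xi, yi in zip(x, y):
            p = expit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score, with the 2x2 inverse written out.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1
```

For a binary risk factor the fitted b0 equals the log odds of being affected in the unexposed group and b1 equals the log odds ratio, which gives an easy check of the fit.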

STEPWISE REGRESSION

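The step-by-step construction of a regression model, which represents a dependent variable as a weighted sum (linear combination) of independent (risk) variables; at each step a variable is added to (or removed from) the model according to how much it improves the fit.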

TEST STATISTIC

A statistic is any function of a random sample — in particular, of the observations in an experiment. A test statistic is a statistic that is used in a statistical test to discriminate between two competing hypotheses, the so-called null and alternative hypotheses.

SIGNIFICANCE LEVEL

The probability that a test returns a false-positive result, that is, the proportion of tests declared significant among all tests carried out when the investigated effect is truly absent (see also false discovery rate).

ANGIOPLASTY

A medical procedure in which coronary arteries that have become clogged are widened, usually by inflating a thin balloon inside the narrowed vessel.

CORONARY ARTERY RESTENOSIS

The recurrence of a narrowing or blockage of an artery at the site where angioplasty was previously performed.

HARDY–WEINBERG EQUILIBRIUM

A state in which the genotype proportions in a population depend only on the frequencies of the alleles.
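For a biallelic locus with allele frequencies p and q = 1 − p, the equilibrium genotype proportions are p², 2pq and q². A common check of whether observed genotype counts fit these proportions is the one-degree-of-freedom chi-square goodness-of-fit test, sketched below in Python with the allele frequency estimated from the sample. Exact tests are often preferred for small samples or rare alleles; the function names are hypothetical.

```python
import math
from typing import Tuple


def hwe_expected(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float, float]:
    """Expected genotype counts n*p^2, n*2pq, n*q^2, with p estimated from the sample."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1.0 - p
    return n * p * p, 2 * n * p * q, n * q * q


def hwe_chi2_test(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit statistic (1 df) and its p-value."""
    observed = (n_AA, n_Aa, n_aa)
    expected = hwe_expected(n_AA, n_Aa, n_aa)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-square with 1 df: P(X >= x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

For example, counts of 30, 40 and 30 give p = 0.5, expected counts of 25, 50 and 25, a statistic of 4.0 and a p-value of about 0.046.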

RESTENOSIS

A re-narrowing or blockage of an artery at a site where angioplasty was previously done.

RECURSIVE PARTITIONING

A process for identifying complex relationships in large data sets by repeatedly dividing the data into a hierarchy of smaller, more homogeneous subgroups, each division being based on the most statistically significant indicator.
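The Python sketch below shows one minimal way to grow such a hierarchy for case–control genotype data. It assumes binary splits of the form "genotype g at locus j versus any other genotype", chooses at each node the split with the largest 2×2 chi-square statistic, and stops when no split reaches a hypothetical threshold (3.84, the 5% point of chi-square with 1 df) or when depth or node-size limits are hit. The threshold is not corrected for the many splits examined, so it serves only as a stopping heuristic; established tree methods typically add pruning or cross-validation. All names and limits are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Row = Tuple[Tuple[int, ...], int]  # (genotype codes 0/1/2 at each locus, 1 = affected / 0 = not)


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]; 0 if a margin is empty."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else (a + b + c + d) * (a * d - b * c) ** 2 / denom


def best_split(rows: Sequence[Row]) -> Tuple[float, int, int]:
    """Most significant split 'genotype at locus j == g' versus all other genotypes."""
    best = (0.0, -1, -1)
    for j in range(len(rows[0][0])):
        for g in (0, 1, 2):
            a = sum(1 for x, y in rows if x[j] == g and y == 1)
            b = sum(1 for x, y in rows if x[j] == g and y == 0)
            c = sum(1 for x, y in rows if x[j] != g and y == 1)
            d = sum(1 for x, y in rows if x[j] != g and y == 0)
            stat = chi2_2x2(a, b, c, d)
            if stat > best[0]:
                best = (stat, j, g)
    return best


def grow_tree(rows: Sequence[Row], depth: int = 0, max_depth: int = 3,
              min_chi2: float = 3.84, min_size: int = 5) -> Dict:
    """Recursively split rows until no split is significant enough or a limit is reached."""
    leaf = {"risk": sum(y for _, y in rows) / len(rows), "n": len(rows)}
    if depth >= max_depth or len(rows) < min_size:
        return leaf
    stat, j, g = best_split(rows)
    if j < 0 or stat < min_chi2:
        return leaf
    match = [r for r in rows if r[0][j] == g]
    other = [r for r in rows if r[0][j] != g]
    kwargs = dict(max_depth=max_depth, min_chi2=min_chi2, min_size=min_size)
    return {"locus": j, "genotype": g, "chi2": stat,
            "match": grow_tree(match, depth + 1, **kwargs),
            "other": grow_tree(other, depth + 1, **kwargs)}
```

Each leaf records the observed proportion of affected individuals ("risk") in its subgroup, so a path from the root to a leaf reads as a multi-locus genotype rule.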

CLUSTER ANALYSIS

A mathematical algorithm that organizes a set of items according to their similarity. For example, genes can be clustered according to their similarity in pattern of expression.

MARGINAL PENETRANCE

In an epistatic interaction between two disease-associated loci with three genotypes each, each of the nine two-locus genotypes might be associated with its own penetrance, that is, the probability that an individual with that genotype pair develops the disease. From these penetrances and the genotype frequencies, marginal penetrances can be computed: the penetrances associated with the genotypes at one of the two loci alone, averaged over the genotypes at the other locus.
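A small numerical example shows how two-locus penetrances can hide each locus's effect. The Python sketch below computes marginal penetrances as frequency-weighted averages over the genotypes at the other locus, assuming Hardy–Weinberg genotype frequencies and independent (unlinked, equilibrium) loci; the "XOR-like" penetrance table and the function names are hypothetical. With an allele frequency of 0.5 at the second locus, every genotype at the first locus has the same marginal penetrance, so a single-locus analysis would see no effect even though risk ranges from 0 to 0.1 across the nine genotype pairs.

```python
from typing import List, Sequence


def hwe_freqs(p: float) -> List[float]:
    """Genotype frequencies (AA, Aa, aa) under Hardy-Weinberg equilibrium, allele frequency p."""
    q = 1.0 - p
    return [p * p, 2.0 * p * q, q * q]


def marginal_penetrances(pen: Sequence[Sequence[float]], freqs_b: Sequence[float]) -> List[float]:
    """Penetrance of each locus-A genotype: sum over j of P(B_j) * f(A_i, B_j).

    pen[i][j] is the two-locus penetrance; assumes the two loci are independent.
    """
    return [sum(fb * f for fb, f in zip(freqs_b, row)) for row in pen]


# Hypothetical "XOR-like" model: risk depends only on the genotype combination.
PEN = [[0.0, 0.1, 0.0],
       [0.1, 0.0, 0.1],
       [0.0, 0.1, 0.0]]

if __name__ == "__main__":
    print(marginal_penetrances(PEN, hwe_freqs(0.5)))
    print(marginal_penetrances(PEN, hwe_freqs(0.3)))
```

Changing the allele frequency (for example, to 0.3) breaks this balance, so whether such a model shows marginal effects depends on the allele frequencies in the population studied.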

BONFERRONI CORRECTION

When n statistical tests are carried out, each has some probability, p (the significance level), of returning a false-positive result. The so-called experiment-wise probability that one or more tests give a false-positive result is at most np, and is approximately np when the tests are independent and np is small. So, to achieve an experiment-wise false-positive rate of p, each individual test must only be allowed a false-positive error rate of p/n; this is referred to as the Bonferroni correction.
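For independent tests the experiment-wise error rate has the exact form 1 − (1 − p)^n, of which np is the first-order approximation. The Python sketch below compares the uncorrected and Bonferroni-corrected rates for an illustrative 100 tests at a target of 0.05; the numbers and function names are assumptions made only for this example.

```python
def experimentwise_error(p: float, n: int) -> float:
    """Probability of at least one false positive among n independent tests, each at level p."""
    return 1.0 - (1.0 - p) ** n


def bonferroni_level(alpha: float, n: int) -> float:
    """Per-test significance level that keeps the experiment-wise rate at or below alpha."""
    return alpha / n


if __name__ == "__main__":
    n, alpha = 100, 0.05  # illustrative number of tests and target error rate
    print(f"uncorrected: {experimentwise_error(alpha, n):.3f}")
    per_test = bonferroni_level(alpha, n)
    print(f"Bonferroni: per-test level {per_test}, "
          f"experiment-wise {experimentwise_error(per_test, n):.4f}")
```

Uncorrected, the chance of at least one false positive among 100 independent tests at 0.05 exceeds 0.99; at the Bonferroni level of 0.0005 it is about 0.049.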

FALSE DISCOVERY RATE

(FDR). The proportion of false-positive test results out of all positive (significant) tests (note that the FDR is conceptually different from the significance level).
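One widely used procedure that controls the expected FDR at a chosen level q for independent tests is the Benjamini–Hochberg step-up procedure. The Python sketch below implements that standard procedure as an illustration of FDR control; it is not presented as the specific method used in this article, and the p-values are illustrative only.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure; returns a reject flag per test.

    Rejects the k smallest p-values, where k is the largest rank with p_(k) <= k * q / m.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank  # keep the largest qualifying rank (step-up)
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


if __name__ == "__main__":
    # Illustrative p-values, not data from any study.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, q=0.05))
```

In the illustrative list, two hypotheses are rejected at q = 0.05, whereas a Bonferroni threshold of 0.05/10 = 0.005 would reject only the first.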

About this article

Cite this article

Hoh, J., Ott, J. Mathematical multi-locus approaches to localizing complex human trait genes. Nat Rev Genet 4, 701–709 (2003). https://doi.org/10.1038/nrg1155

  • DOI: https://doi.org/10.1038/nrg1155
