Statistical theory in clustering

Hartigan, J. A.

doi:10.1007/BF01908064

Published: December 1985

Volume 2, pages 63–76 (1985)

Journal of Classification

Abstract

A number of statistical models for forming and evaluating clusters are reviewed. Hierarchical algorithms are evaluated by their ability to discover high density regions in a population, and complete linkage hopelessly fails; the others don't do too well either. Single linkage is at least of mathematical interest because it is related to the minimum spanning tree and percolation. Mixture methods are examined, related to k-means, and the failure of likelihood tests for the number of components is noted. The DIP test for estimating the number of modes in a univariate population measures the distance between the empirical distribution function and the closest unimodal distribution function (or k-modal distribution function when testing for k modes). Its properties are examined and multivariate extensions are proposed. Ultrametric and evolutionary distances on trees are considered briefly.
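
Of the connections mentioned above, the one between single linkage and the minimum spanning tree is the easiest to check directly: the single-linkage merge heights coincide with the edge weights of a minimum spanning tree (Gower and Ross 1969). The following is a minimal Python sketch, not code from the article, assuming Euclidean distance and a small sample; the function names `mst_edge_weights` and `single_linkage_heights` are hypothetical. It builds the tree with Prim's algorithm, runs a naive agglomerative single-linkage loop, and prints both sets of heights for comparison.

```python
import math
from itertools import combinations


def mst_edge_weights(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns the n - 1 edge weights of a minimum spanning tree.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    best[0] = 0.0          # start the tree at vertex 0
    weights = []
    for step in range(n):
        # Add the outside vertex with the cheapest connecting edge.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the starting vertex has no connecting edge
            weights.append(best[u])
        # Relax edges from the newly added vertex to vertices still outside.
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return weights


def single_linkage_heights(points):
    """Naive agglomerative single linkage; returns merge heights in merge order."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single-linkage distance: the closest pair across the two clusters.
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        heights.append(d)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return heights


# Hypothetical example data: five points in the plane.
pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 2.0), (9.0, 1.0)]
print(sorted(mst_edge_weights(pts)))
print(single_linkage_heights(pts))
```

Because the single-linkage clusters at level h are the connected components of the minimum spanning tree after deleting the edges longer than h, practical implementations usually build the tree once and cut it, rather than agglomerating naively as this sketch does.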

References

BAKER, F.B. (1974), “Stability of Two Hierarchical Grouping Techniques, Case I: Sensitivity to Data Errors,” Journal of the American Statistical Association, 69, 440–445.
BINDER, D.A. (1978), “Comment on ‘Estimating Mixtures of Normal Distributions and Switching Regressions’,” Journal of the American Statistical Association, 73, 746–747.
BROADBENT, S.R., and HAMMERSLEY, J.M. (1957), “Percolation Processes, I: Crystals and Mazes,” Proceedings of the Cambridge Philosophical Society, 53, 629–641.
DAY, N.E. (1969), “Estimating the Components of a Mixture of Normal Distributions,” Biometrika, 56, 463–474.
DICK, N.P., and BOWDEN, D.C. (1973), “Maximum Likelihood Estimation for Mixture of Two Normal Distributions,” Biometrics, 29, 781–790.
EVERITT, B.S., and HAND, D.J. (1981), Finite Mixture Distributions, London: Chapman and Hall.
FITCH, W.M., and MARGOLIASH, E. (1967), “Construction of Phylogenetic Trees,” Science N.Y., 155, 279–284.
GOWER, J.C., and ROSS, G.J.S. (1969), “Minimum Spanning Trees and Single Linkage Cluster Analysis,” Applied Statistics, 18, 54–65.
HARTIGAN, J.A. (1967), “Representation of Similarity Matrices by Trees,” Journal of the American Statistical Association, 62, 1140–1158.
HARTIGAN, J.A. (1975), Clustering Algorithms, New York: John Wiley.
HARTIGAN, J.A. (1977), “Distribution Problems in Clustering,” in Classification and Clustering, ed. J. Van Ryzin, New York: Academic Press.
HARTIGAN, J.A. (1978), “Asymptotic Distributions for Clustering Criteria,” The Annals of Statistics, 6, 117–131.
HARTIGAN, J.A. (1981), “Consistency of Single Linkage for High Density Clusters,” Journal of the American Statistical Association, 76, 388–394.
HARTIGAN, J.A., and HARTIGAN, P.M. (1984), “The Dip Test of Multimodality,” The Annals of Statistics, submitted.
HOSMER, D.W. (1973), “A Comparison of Iterative Maximum Likelihood Estimates of the Parameters of a Mixture of Two Normal Distributions under Three Different Types of Sample,” Biometrics, 29, 761–770.
JARDINE, C.J., JARDINE, N., and SIBSON, R. (1967), “The Structure and Construction of Taxonomic Hierarchies,” Math. Biosciences, 1, 173–179.
JOHNSON, S.C. (1967), “Hierarchical Clustering Schemes,” Psychometrika, 32, 241–254.
LING, R.F. (1973), “A Probability Theory of Cluster Analysis,” Journal of the American Statistical Association, 68, 159–169.
MAC QUEEN, J. (1967), “Some Methods for Classification and Analysis of Multivariate Observations,” Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 281–297.
POLLARD, D. (1982), “A Central Limit Theorem for k-means Clustering,” Annals of Probability, 10, 919–926.
RAO, C.R. (1948), “The Utilization of Multiple Measurements in Problems of Biological Classification,” Journal of the Royal Statistical Society, Series B, 10, 159–203.
SMYTHE, R.T., and WIERMAN, J.C. (1978), “First Passage Percolation on the Square Lattice,” Lecture Notes in Mathematics, 671, Berlin: Springer-Verlag.
WISHART, D. (1969), “Mode Analysis: A Generalization of Nearest Neighbor Which Reduces Chaining Effects,” in Numerical Taxonomy, ed. A. J. Cole, London: Academic Press.
WOLFE, J.H. (1970), “Pattern Clustering by Multivariate Analysis,” Multivariate Behavioral Research, 5, 329–350.
WOLFE, J.H. (1971), “A Monte-Carlo Study of the Sampling Distribution of the Likelihood Ratio for Mixtures of Multinormal Distributions,” Research Memorandum, 72–2, Naval Personnel and Research Training Laboratory, San Diego.
WONG, M.A. (1982), “A Hybrid Clustering Algorithm for Identifying High Density Clusters,” Journal of the American Statistical Association, 77, 841–847.
WONG, M.A., and LANE, T. (1983), “A kth Nearest Neighbor Clustering Procedure,” Journal of the Royal Statistical Society, Series B, 45, 362–368.

Author information

Authors and Affiliations

Department of Statistics, Yale University, Yale Station, Box 2179, 06520, New Haven, Connecticut, USA
J. A. Hartigan

Additional information

Research supported by the National Science Foundation Grant No. MCS-8102280.

About this article

Cite this article

Hartigan, J.A. Statistical theory in clustering. Journal of Classification 2, 63–76 (1985). https://doi.org/10.1007/BF01908064

Issue Date: December 1985
DOI: https://doi.org/10.1007/BF01908064