Data mining in healthcare and biomedicine: a survey of the literature

J Med Syst. 2012 Aug;36(4):2431-48. doi: 10.1007/s10916-011-9710-5. Epub 2011 May 3.

Abstract

As a new concept that emerged in the middle of 1990's, data mining can help researchers gain both novel and deep insights and can facilitate unprecedented understanding of large biomedical datasets. Data mining can uncover new biomedical and healthcare knowledge for clinical and administrative decision making as well as generate scientific hypotheses from large experimental data, clinical databases, and/or biomedical literature. This review first introduces data mining in general (e.g., the background, definition, and process of data mining), discusses the major differences between statistics and data mining and then speaks to the uniqueness of data mining in the biomedical and healthcare fields. A brief summarization of various data mining algorithms used for classification, clustering, and association as well as their respective advantages and drawbacks is also presented. Suggested guidelines on how to use data mining algorithms in each area of classification, clustering, and association are offered along with three examples of how data mining has been used in the healthcare industry. Given the successful application of data mining by health related organizations that has helped to predict health insurance fraud and under-diagnosed patients, and identify and classify at-risk people in terms of health with the goal of reducing healthcare cost, we introduce how data mining technologies (in each area of classification, clustering, and association) have been used for a multitude of purposes, including research in the biomedical and healthcare fields. A discussion of the technologies available to enable the prediction of healthcare costs (including length of hospital stay), disease diagnosis and prognosis, and the discovery of hidden biomedical and healthcare patterns from related databases is offered along with a discussion of the use of data mining to discover such relationships as those between health conditions and a disease, relationships among diseases, and relationships among drugs. The article concludes with a discussion of the problems that hamper the clinical use of data mining by health professionals.

Publication types

  • Review

MeSH terms

  • Algorithms
  • Biomedical Research
  • Data Mining* / methods
  • Delivery of Health Care*
  • Health Care Sector