Original Article
Using multiple data features improved the validity of osteoporosis case ascertainment from administrative databases

https://doi.org/10.1016/j.jclinepi.2008.02.002

Abstract

Objectives

The aim was to construct and validate algorithms for osteoporosis case ascertainment from administrative databases and to estimate the population prevalence of osteoporosis using these algorithms.

Study Design and Setting

Artificial neural networks, classification trees, and logistic regression were applied to hospital, physician, and pharmacy data from Manitoba, Canada. Discriminative performance and calibration (i.e., error) were compared for algorithms defined from different sets of diagnosis, prescription drug, comorbidity, and demographic variables. Algorithms were validated against a regional bone mineral density testing program.

Results

Discriminative performance and calibration were poorer and sensitivity was generally lower for algorithms based on diagnosis codes alone than for algorithms based on an expanded set of data features that included osteoporosis prescriptions and age. Validation measures were similar for neural networks and classification trees, but prevalence estimates were lower for the former model.

Conclusion

Multiple features of administrative data generally resulted in improved sensitivity of osteoporosis case-detection algorithms without loss of specificity. However, prevalence estimates using an expanded set of features were still slightly lower than estimates from a population-based study with primary data collection. The classification methods developed in this study can be extended to other chronic diseases for which there may be multiple markers in administrative data.

Introduction

Osteoporosis is a disease characterized by decreased bone mass and increased fracture risk. It represents a significant population health issue because of its known negative effects on quality of life and other health outcomes, and also because of its increasing prevalence due to an aging population [1], [2], [3], [4]. The operational definition of osteoporosis developed by the World Health Organization (WHO) is based on the measurement of bone mineral density (BMD), which is not easily captured on a population-wide basis [5]. Cohort studies involving primary data collection have been used to estimate osteoporosis prevalence [6], [7], but such studies are expensive and time-consuming to conduct. Population-based administrative databases are increasingly being explored for their potential to provide epidemiological information about osteoporosis [3], [4], [8], [9], [10], [11]. The advantages of using administrative data include: (a) it is relatively inexpensive to establish and maintain a population-based surveillance system using these databases, (b) longitudinal studies can be conducted using data linkage techniques, and (c) disease cases and noncases can be compared on comorbid conditions.

Osteoporosis and fracture diagnosis codes have been used in previous studies to ascertain disease cases in administrative databases. Administrative data have been used to construct case ascertainment algorithms for a number of other chronic diseases including diabetes, asthma, and inflammatory bowel disease [12], [13], [14], [15], [16], [17], [18], [19]. However, osteoporosis is known to be underdiagnosed in administrative databases [8], [9]. When diagnosis codes are the sole source of case ascertainment, disease prevalence is therefore underestimated. Underdiagnosis is not unique to osteoporosis; incomplete capture of diagnoses in administrative data has been documented for other chronic diseases, such as hypertension [20].

One solution is to develop a case ascertainment algorithm that does not rely exclusively on diagnosis codes. In many jurisdictions, population-based administrative data repositories are expanding in size and scope. As a result, there are multiple data features that could potentially be used for case ascertainment. For example, osteoporosis drug treatments have been proposed for case ascertainment in pharmacy databases [10], [21].

Statistical and machine-learning classification models have been used in fields such as psychology, marketing, and engineering to identify the data features that distinguish among population subgroups and to predict the probability of group membership [22], [23], [24], [25], [26], [27]. Classification is a process whereby an algorithm, or rule, that assigns observations to groups is developed from multiple data features in a training data set. Once trained, the classifier can take new, unseen observations and predict their probability of group membership.
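The train-then-predict process described above can be illustrated with a minimal sketch. It assumes a plain logistic regression fitted by batch gradient descent, and the three binary features (diagnosis code, osteoporosis prescription, age 70+) and the toy records are hypothetical, chosen only for illustration; they are not the study's variables or data.

```python
import math


def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Predicted probability of group membership; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * xi for w, xi in zip(weights[1:], x)))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit intercept and coefficients by batch gradient descent on log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical training set: (has_dx_code, has_osteoporosis_rx, age_70_plus)
train_x = [(1, 1, 1), (1, 0, 1), (0, 1, 1), (0, 1, 0),
           (0, 0, 1), (0, 0, 0), (0, 0, 0), (1, 0, 0)]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = case, 0 = noncase

weights = train_logistic(train_x, train_y)

# New, unseen observations scored with the trained rule
p_high = predict_proba(weights, (1, 1, 1))
p_low = predict_proba(weights, (0, 0, 0))
```

Here `p_high` and `p_low` are the predicted probabilities of being a case for a person with all three markers and a person with none; a cut-point on this probability would turn the model into a case ascertainment rule.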

The purpose of this study is to compare the validity of algorithms for osteoporosis case ascertainment defined from various features of hospital, physician, and pharmacy administrative data. The algorithms are constructed using classification models, and validity is assessed using data from a regional BMD testing program. The algorithms are applied to administrative data to estimate the population prevalence of osteoporosis.
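As a rough sketch of how validity against a reference standard might be quantified, the block below computes sensitivity, specificity, and the Brier score (the mean squared difference between predicted probability and observed status). Treating the Brier score as the error measure is an assumption made here for illustration, and the probabilities, the 0.5 cut-point, and the reference labels are hypothetical.

```python
def classify(probs, cutoff=0.5):
    """Turn predicted probabilities into 0/1 case labels (cutoff is an assumption)."""
    return [1 if p >= cutoff else 0 for p in probs]


def sensitivity(predicted, actual):
    """Proportion of true cases that the algorithm flags as cases."""
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    return tp / (tp + fn)


def specificity(predicted, actual):
    """Proportion of true noncases that the algorithm leaves unflagged."""
    tn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    return tn / (tn + fp)


def brier_score(probs, actual):
    """Mean squared error of predicted probabilities; lower is better."""
    return sum((p - a) ** 2 for p, a in zip(probs, actual)) / len(probs)


# Hypothetical reference-standard status (1 = case) and model probabilities
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.1, 0.6, 0.05]

predicted = classify(probs)
sens = sensitivity(predicted, actual)   # 3 of 4 cases detected
spec = specificity(predicted, actual)   # 5 of 6 noncases correctly excluded
brier = brier_score(probs, actual)
```

In this toy data, one case is missed and one noncase is falsely flagged, which is why neither measure reaches 1.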

Data sources

Population-based hospital, physician, and pharmacy administrative data as well as BMD testing data were from Manitoba, a centrally located province in Canada with a population of 1.2 million [28] and a system of universal health care. Data were from the Manitoba Centre for Health Policy Repository. Ethics approval was received from the University of Manitoba Health Research Ethics Board, and approval for data access was granted by the Manitoba Health Information Privacy Committee.

A hospital

Results

A total of 1,277 (31.8%) females 50+ years of age were defined as osteoporosis cases in the training cohort based on the minimum BMD T-score for the hip or spine. All of these individuals had a single BMD scan in 2000–2001. Almost all were new cases; only 5.3% had a positive scan in the 5 years prior to cohort definition. Moreover, virtually all (99.4%) of the noncases remained as noncases for a 3-year period following their negative index scan.
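A case definition based on the minimum T-score across sites could be sketched as follows. The cut-point of -2.5 is the conventional WHO densitometric threshold for osteoporosis and is an assumption here, since this excerpt does not state the threshold used; the function name and the example values are hypothetical.

```python
# Assumed cut-point: WHO densitometric criterion (T-score of -2.5 or lower)
WHO_OSTEOPOROSIS_T_SCORE = -2.5


def is_osteoporosis_case(hip_t, spine_t, threshold=WHO_OSTEOPOROSIS_T_SCORE):
    """Classify as a case when the lower of the hip and spine T-scores meets the threshold."""
    return min(hip_t, spine_t) <= threshold


# Hypothetical scans: (hip T-score, spine T-score)
scans = [(-1.0, -2.7), (-1.0, -2.0), (-2.5, 0.3)]
case_flags = [is_osteoporosis_case(hip, spine) for hip, spine in scans]
```

Using the minimum means a single low site is enough to classify a person as a case, even when the other site is normal.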

The demographic characteristics of cases and

Discussion

This study constructed algorithms to ascertain osteoporosis cases in administrative data and validated these algorithms using test results from a regional bone density testing program. Discriminative performance was lower and prediction error was higher when the classification models included only osteoporosis and fracture diagnosis variables than when prescriptions for osteoporosis treatment were also included. Age was identified as another variable that contributed to improved discriminative

Acknowledgments

This research was supported by a grant from the Canadian Institutes of Health Research to the first and sixth authors, and by a Canadian Institutes of Health Research New Investigator Award to the first author. The authors are indebted to Manitoba Health for the provision of data. The results and conclusions are those of the authors, and no official endorsement by Manitoba Health is intended or should be inferred.

References (53)

  • C.A. Mustard et al.

    Assessing ecologic proxies for household income: a comparison of household and neighbourhood level income measures in the study of population health status

    Health Place

    (1999)
  • S. Hirsch et al.

    Using a neural network to screen a population for asthma

    Ann Epidemiol

    (2001)
  • N. Terrin et al.

    External validity of predictive models: a comparison of logistic regression, classification trees, and neural networks

    J Clin Epidemiol

    (2003)
  • D.A. Redelmeier et al.

    Assessing predictive accuracy: how to compare Brier scores

    J Clin Epidemiol

    (1991)
  • K.M. Sanders et al.

    Age- and gender-specific rate of fractures in Australia: a population-based study

    Osteoporos Int

    (1999)
  • S.B. Jaglal et al.

    Patterns of use of the bone mineral density test in Ontario, 1992-1998

    CMAJ

    (2000)
  • A. Tenenhouse et al.

    Estimation of the prevalence of low bone density in Canadian women and men using a population-specific DXA reference standard: the Canadian Multicentre Osteoporosis Study (CaMos)

    Osteoporos Int

    (2000)
  • Y. Lu et al.

    Classification of osteoporosis based on bone mineral densities

    J Bone Miner Res

    (2001)
  • P. Vestergaard et al.

    Osteoporosis is markedly underdiagnosed: a nationwide study from Denmark

    Osteoporos Int

    (2005)
  • W.G. Goettsch et al.

    Developments of the incidence of osteoporosis in The Netherlands: a PHARMO study

    Pharmacoepidemiol Drug Saf

    (2007)
  • K. Lippuner et al.

    Epidemiology and direct medical costs of osteoporotic fractures in men and women in Switzerland

    Osteoporos Int

    (2005)
  • C.N. Bernstein et al.

    Epidemiology of Crohn's disease and ulcerative colitis in a central Canadian province: a population-based study

    Am J Epidemiol

    (1999)
  • J.F. Blanchard et al.

    Incidence and prevalence of diabetes in Manitoba, 1986–1991

    Diabetes Care

    (1996)
  • J.B. Fowles et al.

    Validation of claims diagnoses and self-reported conditions compared with medical records for selected chronic diseases

    J Ambul Care Manage

    (1998)
  • J.E. Hux et al.

    Diabetes in Ontario: determination of prevalence and incidence using a validated administrative data algorithm

    Diabetes Care

    (2002)
  • T.S. Rector et al.

    Specificity and sensitivity of claims-based algorithms for identifying members of Medicare+Choice health plans that have chronic medical conditions

    Health Serv Res

    (2004)