Using multiple data features improved the validity of osteoporosis case ascertainment from administrative databases

doi:10.1016/j.jclinepi.2008.02.002

Journal of Clinical Epidemiology

Volume 61, Issue 12, December 2008, Pages 1250-1260

https://doi.org/10.1016/j.jclinepi.2008.02.002

Abstract

Objectives

The aim was to construct and validate algorithms for osteoporosis case ascertainment from administrative databases and to estimate the population prevalence of osteoporosis using these algorithms.

Study Design and Setting

Artificial neural networks, classification trees, and logistic regression were applied to hospital, physician, and pharmacy data from Manitoba, Canada. Discriminative performance and calibration (i.e., error) were compared for algorithms defined from different sets of diagnosis, prescription drug, comorbidity, and demographic variables. Algorithms were validated against a regional bone mineral density testing program.
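This excerpt does not name the exact measures used, but discriminative performance is commonly summarized by the area under the ROC curve (c-statistic) and prediction error by the Brier score (the reference list includes Redelmeier et al. on comparing Brier scores). The sketch below assumes those two measures; the function names and the four-person data are hypothetical and purely illustrative.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probabilities and observed 0/1 outcomes (lower is better)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def c_statistic(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Area under the ROC curve: probability that a randomly chosen case receives a
    higher predicted probability than a randomly chosen noncase (ties count 1/2)."""
    cases = [p for p, y in zip(probs, outcomes) if y == 1]
    noncases = [p for p, y in zip(probs, outcomes) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")
    concordant = 0.0
    for pc in cases:
        for pn in noncases:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))


# Hypothetical predicted probabilities and BMD-defined case status (1 = case).
probs = [0.9, 0.7, 0.4, 0.2]
outcomes = [1, 0, 1, 0]
print(c_statistic(probs, outcomes), brier_score(probs, outcomes))
```

The pairwise loop is quadratic in cohort size and is meant only to make the definition visible; a rank-based computation would be preferable for a full cohort.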

Results

Discriminative performance and calibration were poorer, and sensitivity was generally lower, for algorithms based on diagnosis codes alone than for algorithms based on an expanded set of data features that included osteoporosis prescriptions and age. Validation measures were similar for neural networks and classification trees, but neural networks produced lower prevalence estimates.
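Validation against the BMD reference standard reduces to a 2×2 comparison of algorithm output with reference case status. The following is a minimal sketch of the standard sensitivity, specificity, PPV, and NPV calculations; the `validate` function and the toy "diagnosis-only" and "expanded" classifications are hypothetical and are not the study's data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Validation:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def _ratio(num: int, den: int) -> float:
    # Undefined ratios (empty denominator) are reported as NaN rather than raising.
    return num / den if den else float("nan")


def validate(predicted: Sequence[int], reference: Sequence[int]) -> Validation:
    """Compare 0/1 algorithm output against a 0/1 reference standard (e.g., BMD-defined cases)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    return Validation(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


# Hypothetical toy data: 4 reference cases followed by 4 reference noncases.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
dx_only = [1, 0, 0, 0, 0, 0, 0, 0]   # diagnosis codes alone miss most cases
expanded = [1, 1, 1, 0, 0, 0, 0, 0]  # adding more features recovers some of them
print(validate(dx_only, reference))
print(validate(expanded, reference))
```

In this toy example, sensitivity rises from 0.25 to 0.75 while specificity stays at 1.0, which mirrors the pattern described in the abstract without reproducing the study's figures.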

Conclusion

Using multiple features of administrative data generally improved the sensitivity of osteoporosis case-detection algorithms without loss of specificity. However, prevalence estimates based on an expanded set of features were still slightly lower than estimates from a population-based study with primary data collection. The classification methods developed in this study can be extended to other chronic diseases for which administrative data may contain multiple markers.

Introduction

Osteoporosis is a disease characterized by decreased bone mass and increased fracture risk. It represents a significant population health issue because of its known negative effects on quality of life and other health outcomes, and also because of its increasing prevalence due to an aging population [1], [2], [3], [4]. The operational definition of osteoporosis developed by the World Health Organization (WHO) is based on the measurement of bone mineral density (BMD), which is not easily captured on a population-wide basis [5]. Cohort studies involving primary data collection have been used to estimate osteoporosis prevalence [6], [7], but such studies are expensive and time-consuming to conduct. Population-based administrative databases are increasingly being explored for their potential to provide epidemiological information about osteoporosis [3], [4], [8], [9], [10], [11]. The advantages of using administrative data include: (a) it is relatively inexpensive to establish and maintain a population-based surveillance system using these databases, (b) longitudinal studies can be conducted using data linkage techniques, and (c) disease cases and noncases can be compared on comorbid conditions.

Osteoporosis and fracture diagnosis codes have been used in previous studies to ascertain disease cases in administrative databases. Administrative data have been used to construct case ascertainment algorithms for a number of other chronic diseases including diabetes, asthma, and inflammatory bowel disease [12], [13], [14], [15], [16], [17], [18], [19]. However, osteoporosis is known to be underdiagnosed in administrative databases [8], [9]. When diagnosis codes are the sole source of case ascertainment, this underdiagnosis leads to underestimation of disease prevalence. Underdiagnosis is not unique to osteoporosis; incomplete capture of diagnoses in administrative data has been documented for other chronic diseases, such as hypertension [20].
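Why underdiagnosis biases prevalence downward can be made explicit with the standard relation between true prevalence p and the expected proportion flagged by an imperfect case definition: Se·p + (1 − Sp)(1 − p). Prevalence is underestimated whenever (1 − Se)·p exceeds (1 − Sp)(1 − p), that is, when missed cases outnumber false positives. The sketch below implements this identity; the input values are hypothetical, not estimates from this study.

```python
def expected_apparent_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Expected proportion classified as cases by an imperfect case definition:
    Se * p + (1 - Sp) * (1 - p)."""
    for name, value in (("true_prev", true_prev), ("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


# Hypothetical inputs: true prevalence 30%, a diagnosis-code rule with Se = 0.40 and Sp = 0.98.
print(expected_apparent_prevalence(0.30, 0.40, 0.98))  # well below the true 0.30
```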

One solution is to develop a case ascertainment algorithm that does not rely exclusively on diagnosis codes. In many jurisdictions, population-based administrative data repositories are expanding in size and scope. As a result, there are multiple data features that could potentially be used for case ascertainment. For example, osteoporosis drug treatments have been proposed for case ascertainment in pharmacy databases [10], [21].
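A rule that does not rely exclusively on diagnosis codes can be as simple as "osteoporosis diagnosis OR osteoporosis drug dispensation." The sketch below contrasts that rule with a diagnosis-only rule. The code and drug lists are illustrative assumptions (ICD-9-CM 733.0x is the osteoporosis category; the drug names are common osteoporosis therapies) and are not the study's case definitions; `PersonRecord` and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field

# ASSUMPTION: illustrative lists only, not the study's case definitions.
OSTEOPOROSIS_DX_PREFIXES = ("733.0",)  # ICD-9-CM 733.0x: osteoporosis
OSTEOPOROSIS_DRUGS = frozenset({"alendronate", "risedronate", "etidronate", "raloxifene"})


@dataclass
class PersonRecord:
    diagnosis_codes: list[str] = field(default_factory=list)
    dispensed_drugs: list[str] = field(default_factory=list)


def has_osteoporosis_dx(record: PersonRecord) -> bool:
    return any(code.startswith(OSTEOPOROSIS_DX_PREFIXES) for code in record.diagnosis_codes)


def has_osteoporosis_rx(record: PersonRecord) -> bool:
    return any(drug.lower() in OSTEOPOROSIS_DRUGS for drug in record.dispensed_drugs)


def is_case(record: PersonRecord, use_prescriptions: bool = True) -> bool:
    """Diagnosis-only rule if use_prescriptions is False; diagnosis OR prescription otherwise."""
    if has_osteoporosis_dx(record):
        return True
    return use_prescriptions and has_osteoporosis_rx(record)


# A hypothetical person who is treated but never receives an osteoporosis diagnosis code.
treated_but_uncoded = PersonRecord(dispensed_drugs=["Alendronate"])
print(is_case(treated_but_uncoded, use_prescriptions=False), is_case(treated_but_uncoded))
```

A deterministic rule like this is the simplest baseline; the classification models described in the paper instead learn how to weight such features from training data.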

Statistical and machine-learning classification models have been used in fields such as psychology, marketing, and engineering, to identify the data features that distinguish among population subgroups and to predict the probability of group membership [22], [23], [24], [25], [26], [27]. Classification is a process whereby an algorithm, or rule, that assigns observations to groups is developed from multiple data features in a training data set. Once trained, the classifier can take new, unseen observations and predict their probability of group membership.
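As a concrete instance of "train on multiple data features, then predict probability of group membership for unseen observations," the following sketch fits a logistic regression (one of the three model types named in the abstract) by plain full-batch gradient descent. It is not the authors' implementation; the feature encoding and the toy training set are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 3000) -> Tuple[List[float], float]:
    """Fit weights and intercept by full-batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability of group membership for a new, unseen observation."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical training set; features are [osteoporosis dx, osteoporosis rx, age >= 65].
X_train = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
           [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X_train, y_train)
print(predict_proba(w, b, [1, 1, 1]), predict_proba(w, b, [0, 0, 0]))
```

Gradient descent is used here only to keep the example dependency-free; a statistical package would normally fit this model by maximum likelihood with Newton-type iterations.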

The purpose of this study is to compare the validity of algorithms for osteoporosis case ascertainment defined from various features of hospital, physician, and pharmacy administrative data. The algorithms are constructed using classification models, and validity is assessed using data from a regional BMD testing program. The algorithms are applied to administrative data to estimate the population prevalence of osteoporosis.
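Once a classifier outputs probabilities, a population prevalence estimate can be formed in at least two ways: the proportion classified as cases at a chosen probability threshold, or the mean predicted probability. This excerpt does not say which approach the authors used, so the sketch below (with a hypothetical 0.5 threshold and hypothetical probabilities) simply shows that the two can differ, which is one reason different models with similar validation measures can still yield different prevalence estimates.

```python
from typing import Sequence, Tuple


def prevalence_estimates(probs: Sequence[float], threshold: float = 0.5) -> Tuple[float, float]:
    """Return (proportion classified as cases at the threshold, mean predicted probability)."""
    if not probs:
        raise ValueError("probs must be non-empty")
    classified = sum(1 for p in probs if p >= threshold) / len(probs)
    mean_prob = sum(probs) / len(probs)
    return classified, mean_prob


# Hypothetical predicted probabilities for four people in the population.
print(prevalence_estimates([0.9, 0.6, 0.2, 0.1]))
```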


Data sources

Population-based hospital, physician, and pharmacy administrative data as well as BMD testing data were from Manitoba, a centrally located province in Canada with a population of 1.2 million [28] and a system of universal health care. Data were from the Manitoba Centre for Health Policy Repository. Ethics approval was received from the University of Manitoba Health Research Ethics Board, and approval for data access was granted by the Manitoba Health Information Privacy Committee.
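The data-source description is truncated in this excerpt, so the linkage details are not available here. As a hedged illustration only, the sketch below shows how per-person features might be assembled from separate hospital, physician, and pharmacy extracts that share an anonymized person identifier. The record layouts, feature names, and `link_features` function are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def link_features(hospital_dx: Iterable[Tuple[str, str]],
                  physician_dx: Iterable[Tuple[str, str]],
                  pharmacy_rx: Iterable[Tuple[str, str]],
                  dx_prefixes: Tuple[str, ...],
                  drugs: frozenset) -> Dict[str, Dict[str, int]]:
    """Merge (person_id, code) records from three sources into one feature dict per person."""
    feats = defaultdict(lambda: {"hosp_dx": 0, "phys_dx_claims": 0, "osteo_rx": 0})
    for pid, code in hospital_dx:
        row = feats[pid]  # registers the person even if the code does not match
        if code.startswith(dx_prefixes):
            row["hosp_dx"] = 1
    for pid, code in physician_dx:
        row = feats[pid]
        if code.startswith(dx_prefixes):
            row["phys_dx_claims"] += 1  # count claims, since repeated claims may carry information
    for pid, drug in pharmacy_rx:
        row = feats[pid]
        if drug.lower() in drugs:
            row["osteo_rx"] = 1
    return dict(feats)


features = link_features(
    hospital_dx=[("p1", "733.00"), ("p2", "250.0")],
    physician_dx=[("p1", "733.0"), ("p1", "733.0"), ("p3", "733.0")],
    pharmacy_rx=[("p2", "Alendronate")],
    dx_prefixes=("733.0",),
    drugs=frozenset({"alendronate"}),
)
print(features)
```

People with no records in any source do not appear in the result; in practice a population registry would supply the full denominator.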

A hospital

Results

A total of 1,277 (31.8%) women aged 50 years or older were defined as osteoporosis cases in the training cohort based on the minimum BMD T-score for the hip or spine. All of these individuals had a single BMD scan in 2000–2001. Almost all were new cases; only 5.3% had a positive scan in the 5 years prior to cohort definition. Moreover, virtually all (99.4%) of the noncases remained noncases for the 3 years following their negative index scan.
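The reference-standard label described above uses the minimum of the hip and spine T-scores. The excerpt does not state the cut-off, so the sketch below assumes the WHO operational definition (T-score of −2.5 or lower). The `bmd_case` function and its handling of a missing site are hypothetical.

```python
from typing import Optional

# ASSUMPTION: WHO operational threshold; the cut-off is not stated in this excerpt.
WHO_OSTEOPOROSIS_T = -2.5


def bmd_case(hip_t: Optional[float], spine_t: Optional[float],
             threshold: float = WHO_OSTEOPOROSIS_T) -> int:
    """Return 1 if the minimum available hip/spine T-score is at or below the threshold, else 0."""
    sites = [t for t in (hip_t, spine_t) if t is not None]
    if not sites:
        raise ValueError("at least one T-score is required")
    return int(min(sites) <= threshold)


print(bmd_case(-1.0, -2.7), bmd_case(-2.4, -1.9))
```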

The demographic characteristics of cases and

Discussion

This study constructed algorithms to ascertain osteoporosis cases in administrative data and validated these algorithms using test results from a regional bone density testing program. Discriminative performance was lower and prediction error was higher when the classification models included only osteoporosis and fracture diagnosis variables than when prescriptions for osteoporosis treatment were also included. Age was identified as another variable that contributed to improved discriminative

Acknowledgments

This research was supported by a grant from the Canadian Institutes of Health Research to the first and sixth authors, and by a Canadian Institutes of Health Research New Investigator Award to the first author. The authors are indebted to Manitoba Health for the provision of data. The results and conclusions are those of the authors, and no official endorsement by Manitoba Health is intended or should be inferred.

References (53)

E. Chrischilles et al.
Costs and health effects of osteoporotic fractures
Bone
(1994)
K. Sakamoto et al.
Report on the Japanese Orthopaedic Association's 3-year project observing hip fractures at fixed-point hospitals
J Orthop Sci
(2006)
R.T. Burge et al.
Methodology for estimating current and future burden of osteoporosis in state populations: application to Florida in 2000 through 2025
Value Health
(2003)
N.P. Yang et al.
Estimated prevalence of osteoporosis from a Nationwide Health Insurance database in Taiwan
Health Policy
(2006)
S.H. Saydah et al.
Review of the performance of methods to identify diabetes cases among vital statistics, administrative, and survey data
Ann Epidemiol
(2004)
M. Wilchesky et al.
Validation of diagnostic codes within medical services claims
J Clin Epidemiol
(2004)
S. Datta et al.
Feature selection and machine learning with mass spectrometry data for distinguishing cancer and non-cancer samples
Stat Methodol
(2006)
R.F. Harrison et al.
Artificial neural network models for prediction of acute coronary syndromes using clinical data from the time of presentation
Ann Emerg Med
(2005)
W.D. Leslie et al.
Construction and validation of a population-based bone densitometry database
J Clin Densitom
(2005)
W.D. Leslie et al.
Application of the 1994 WHO classification to populations other than postmenopausal Caucasian women: the 2005 ISCD Official Positions
J Clin Densitom
(2006)

C.A. Mustard et al.

Assessing ecologic proxies for household income: a comparison of household and neighbourhood level income measures in the study of population health status

Health Place

(1999)

S. Hirsch et al.

Using a neural network to screen a population for asthma

Ann Epidemiol

(2001)

N. Terrin et al.

External validity of predictive models: a comparison of logistic regression, classification trees, and neural networks

J Clin Epidemiol

(2003)

D.A. Redelmeier et al.

Assessing predictive accuracy: how to compare Brier scores

J Clin Epidemiol

(1991)

K.M. Sanders et al.

Age- and gender-specific rate of fractures in Australia: a population-based study

Osteoporos Int

(1999)

S.B. Jaglal et al.

Patterns of use of the bone mineral density test in Ontario, 1992-1998

CMAJ

(2000)

A. Tenenhouse et al.

Estimation of the prevalence of low bone density in Canadian women and men using a population-specific DXA reference standard: the Canadian Multicentre Osteoporosis Study (CaMos)

Osteoporos Int

(2000)

Y. Lu et al.

Classification of osteoporosis based on bone mineral densities

J Bone Miner.Res

(2001)

P. Vestergaard et al.

Osteoporosis is markedly underdiagnosed: a nationwide study from Denmark

Osteoporos Int

(2005)

W.G. Goettsch et al.

Developments of the incidence of osteoporosis in The Netherlands: a PHARMO study

Pharmacoepidemiol Drug Saf

(2007)

K. Lippuner et al.

Epidemiology and direct medical costs of osteoporotic fractures in men and women in Switzerland

Osteoporos Int

(2005)

C.N. Bernstein et al.

Epidemiology of Crohn's disease and ulcerative colitis in a central Canadian province: a population-based study

Am J Epidemiol

(1999)

J.F. Blanchard et al.

Incidence and prevalence of diabetes in Manitoba, 1986–1991

Diabetes Care

(1996)

J.B. Fowles et al.

Validation of claims diagnoses and self-reported conditions compared with medical records for selected chronic diseases

J Ambul Care Manage

(1998)

J.E. Hux et al.

Diabetes in Ontario: determination of prevalence and incidence using a validated administrative data algorithm

Diabetes Care

(2002)

T.S. Rector et al.

Specificity and sensitivity of claims-based algorithms for identifying members of Medicare+Choice health plans that have chronic medical conditions

Health Serv Res

(2004)
