Original Article
Using multiple data features improved the validity of osteoporosis case ascertainment from administrative databases
Introduction
Osteoporosis is a disease characterized by decreased bone mass and increased fracture risk. It represents a significant population health issue because of its known negative effects on quality of life and other health outcomes, and also because of its increasing prevalence due to an aging population [1], [2], [3], [4]. The operational definition of osteoporosis developed by the World Health Organization (WHO) is based on the measurement of bone mineral density (BMD), which is not easily captured on a population-wide basis [5]. Cohort studies involving primary data collection have been used to estimate osteoporosis prevalence [6], [7], but such studies are expensive and time-consuming to conduct. Population-based administrative databases are increasingly being explored for their potential to provide epidemiological information about osteoporosis [3], [4], [8], [9], [10], [11]. The advantages of using administrative data include: (a) it is relatively inexpensive to establish and maintain a population-based surveillance system using these databases, (b) longitudinal studies can be conducted using data linkage techniques, and (c) disease cases and noncases can be compared on comorbid conditions.
Osteoporosis and fracture diagnosis codes have been used in previous studies to ascertain disease cases in administrative databases. Administrative data have been used to construct case ascertainment algorithms for a number of other chronic diseases including diabetes, asthma, and inflammatory bowel disease [12], [13], [14], [15], [16], [17], [18], [19]. However, osteoporosis is known to be underdiagnosed in administrative databases [8], [9]. When diagnosis codes are the sole source of case ascertainment, this results in an underestimation of disease prevalence. Underdiagnosis is not unique to osteoporosis; incomplete capture of diagnoses in administrative data has been documented for other chronic diseases, such as hypertension [20].
One solution is to develop a case ascertainment algorithm that does not rely exclusively on diagnosis codes. In many jurisdictions, population-based administrative data repositories are expanding in size and scope. As a result, there are multiple data features that could potentially be used for case ascertainment. For example, osteoporosis drug treatments have been proposed for case ascertainment in pharmacy databases [10], [21].
Statistical and machine-learning classification models have been used in fields such as psychology, marketing, and engineering, to identify the data features that distinguish among population subgroups and to predict the probability of group membership [22], [23], [24], [25], [26], [27]. Classification is a process whereby an algorithm, or rule, that assigns observations to groups is developed from multiple data features in a training data set. Once trained, the classifier can take new, unseen observations and predict their probability of group membership.
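The classification process described above can be sketched in a few lines of Python. This is a minimal illustration only, not the models used in the study: logistic regression stands in as one familiar classification model, the training data are simulated, and the three features (a diagnosis-code flag, an osteoporosis-prescription flag, and age in decades over 50) and all probabilities in `simulate` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=1000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of group membership (here: being a case)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def simulate(n, seed=0):
    """Simulated cohort; every rate below is an assumption for illustration."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        case = 1 if rng.random() < 0.3 else 0
        dx = 1 if rng.random() < (0.4 if case else 0.05) else 0  # diagnosis code
        rx = 1 if rng.random() < (0.5 if case else 0.10) else 0  # drug treatment
        age = rng.uniform(0, 4) + (0.5 if case else 0.0)         # decades over 50
        X.append([dx, rx, age])
        y.append(case)
    return X, y


X_train, y_train = simulate(400)
w, b = train_logistic(X_train, y_train)

# New, unseen observations of the same age, with and without the two flags.
p_flagged = predict_proba(w, b, [1, 1, 2.0])
p_unflagged = predict_proba(w, b, [0, 0, 2.0])
print(f"flagged: {p_flagged:.2f}, unflagged: {p_unflagged:.2f}")
```

The trained rule assigns a higher probability of being a case to the observation carrying both a diagnosis code and a prescription, which is the sense in which multiple data features contribute to group assignment.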
The purpose of this study is to compare the validity of algorithms for osteoporosis case ascertainment defined from various features of hospital, physician, and pharmacy administrative data. The algorithms are constructed using classification models, and validity is assessed using data from a regional BMD testing program. The algorithms are applied to administrative data to estimate the population prevalence of osteoporosis.
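As a rough sketch of how validity against a reference standard can be quantified, the Python below computes common accuracy measures for a binary case-ascertainment algorithm, plus the Brier score as one measure of prediction error. The choice of these particular measures is an assumption made for illustration, and the small input lists are invented, not study data.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def validity_measures(predicted, reference):
    """Compare algorithm output (1 = case) with the reference standard (1 = case)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true cases that were found
        "specificity": _ratio(tn, tn + fp),  # true noncases that were excluded
        "ppv": _ratio(tp, tp + fp),          # flagged individuals who are true cases
        "npv": _ratio(tn, tn + fn),          # unflagged individuals who are true noncases
    }


def brier_score(probabilities, reference):
    """Mean squared difference between predicted probability and observed status."""
    return sum((p - r) ** 2 for p, r in zip(probabilities, reference)) / len(reference)


# Invented example: eight individuals with a reference-standard result.
reference = [1, 0, 1, 0, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
probabilities = [0.9, 0.6, 0.4, 0.2, 0.8, 0.1, 0.3, 0.2]

measures = validity_measures(predicted, reference)
score = brier_score(probabilities, reference)
print(measures, round(score, 5))
```

Lower Brier scores indicate smaller prediction error, whereas the four accuracy measures are better when closer to 1.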
Data sources
Population-based hospital, physician, and pharmacy administrative data as well as BMD testing data were from Manitoba, a centrally located province in Canada with a population of 1.2 million [28] and a system of universal health care. Data were from the Manitoba Centre for Health Policy Repository. Ethics approval was received from the University of Manitoba Health Research Ethics Board, and approval for data access was granted by the Manitoba Health Information Privacy Committee.
A hospital
Results
A total of 1,277 (31.8%) females 50+ years of age were defined as osteoporosis cases in the training cohort based on the minimum BMD T-score for the hip or spine. All of these individuals had a single BMD scan in 2000–2001. Almost all were new cases; only 5.3% had a positive scan in the 5 years prior to cohort definition. Moreover, virtually all (99.4%) of the noncases remained as noncases for a 3-year period following their negative index scan.
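A case definition based on the minimum T-score across sites can be written as a short rule. The sketch below assumes the conventional WHO threshold of a T-score at or below -2.5; the snippet above does not state the exact cut-off used, so the threshold, the function name, and the example values are all illustrative.

```python
# Conventional WHO threshold for osteoporosis; assumed here, not quoted from the study.
WHO_T_SCORE_THRESHOLD = -2.5


def is_osteoporosis_case(hip_t_score, spine_t_score, threshold=WHO_T_SCORE_THRESHOLD):
    """Classify on the lower (worse) of the hip and spine T-scores."""
    return min(hip_t_score, spine_t_score) <= threshold


# Invented (hip, spine) T-score pairs.
scans = [(-1.0, -2.7), (-2.5, -1.9), (-1.2, -2.4), (0.3, -0.8)]
case_flags = [is_osteoporosis_case(hip, spine) for hip, spine in scans]
proportion_cases = sum(case_flags) / len(case_flags)
print(case_flags, proportion_cases)
```

Using the minimum means that a single site at or below the threshold is enough to classify an individual as a case.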
The demographic characteristics of cases and
Discussion
This study constructed algorithms to ascertain osteoporosis cases in administrative data and validated these algorithms using test results from a regional bone density testing program. Discriminative performance was lower and prediction error was higher when the classification models included only osteoporosis and fracture diagnosis variables than when prescriptions for osteoporosis treatment were also included. Age was identified as another variable that contributed to improved discriminative
Acknowledgments
This research was supported by a grant from the Canadian Institutes of Health Research to the first and sixth authors, and by a Canadian Institutes of Health Research New Investigator Award to the first author. The authors are indebted to Manitoba Health for the provision of data. The results and conclusions are those of the authors, and no official endorsement by Manitoba Health is intended or should be inferred.
References (53)
- et al. Costs and health effects of osteoporotic fractures. Bone (1994)
- et al. Report on the Japanese Orthopaedic Association's 3-year project observing hip fractures at fixed-point hospitals. J Orthop Sci (2006)
- et al. Methodology for estimating current and future burden of osteoporosis in state populations: application to Florida in 2000 through 2025. Value Health (2003)
- et al. Estimated prevalence of osteoporosis from a Nationwide Health Insurance database in Taiwan. Health Policy (2006)
- et al. Review of the performance of methods to identify diabetes cases among vital statistics, administrative, and survey data. Ann Epidemiol (2004)
- et al. Validation of diagnostic codes within medical services claims. J Clin Epidemiol (2004)
- et al. Feature selection and machine learning with mass spectrometry data for distinguishing cancer and non-cancer samples. Stat Methodol (2006)
- et al. Artificial neural network models for prediction of acute coronary syndromes using clinical data from the time of presentation. Ann Emerg Med (2005)
- et al. Construction and validation of a population-based bone densitometry database. J Clin Densitom (2005)
- et al. Application of the 1994 WHO classification to populations other than postmenopausal Caucasian women: the 2005 ISCD Official Positions. J Clin Densitom (2006)
- Assessing ecologic proxies for household income: a comparison of household and neighbourhood level income measures in the study of population health status. Health Place
- Using a neural network to screen a population for asthma. Ann Epidemiol
- External validity of predictive models: a comparison of logistic regression, classification trees, and neural networks. J Clin Epidemiol
- Assessing predictive accuracy: how to compare Brier scores. J Clin Epidemiol
- Age- and gender-specific rate of fractures in Australia: a population-based study. Osteoporos Int
- Patterns of use of the bone mineral density test in Ontario, 1992-1998. CMAJ
- Estimation of the prevalence of low bone density in Canadian women and men using a population-specific DXA reference standard: the Canadian Multicentre Osteoporosis Study (CaMos). Osteoporos Int
- Classification of osteoporosis based on bone mineral densities. J Bone Miner Res
- Osteoporosis is markedly underdiagnosed: a nationwide study from Denmark. Osteoporos Int
- Developments of the incidence of osteoporosis in The Netherlands: a PHARMO study. Pharmacoepidemiol Drug Saf
- Epidemiology and direct medical costs of osteoporotic fractures in men and women in Switzerland. Osteoporos Int
- Epidemiology of Crohn's disease and ulcerative colitis in a central Canadian province: a population-based study. Am J Epidemiol
- Incidence and prevalence of diabetes in Manitoba, 1986–1991. Diabetes Care
- Validation of claims diagnoses and self-reported conditions compared with medical records for selected chronic diseases. J Ambul Care Manage
- Diabetes in Ontario: determination of prevalence and incidence using a validated administrative data algorithm. Diabetes Care
- Specificity and sensitivity of claims-based algorithms for identifying members of Medicare+Choice health plans that have chronic medical conditions. Health Serv Res