Disaster and terrorism
Syndromic surveillance: the effects of syndrome grouping on model accuracy and outbreak detection☆,☆☆
Introduction
Syndromic surveillance systems are being developed and deployed at the local, state, and national levels in response to the threat of bioterrorism.1 Syndromic surveillance2 is an approach to public health monitoring that relies on detecting clinical case features that are discernable before confirmed diagnoses are made.3 In particular, before the laboratory confirmation of an infectious disease, infected individuals may exhibit behavioral patterns, symptoms, signs, or laboratory findings that can be detected through a variety of data sources.
There are 4 basic methodologic stages in processing emergency department (ED) syndromic data for outbreak detection. First, in the syndromic grouping stage, data are gathered from the sources that feed into the system and are organized according to a coding scheme that allows each patient to be assigned to a particular syndromic group. Next is the modeling stage, in which historical data, usually reaching back from 1 to several years, are analyzed to establish a model of the normal temporal pattern. Third, in the detection stage, the expected values from the model (eg, daily frequencies of patients presenting in each syndromic group) are compared against observed values collected in the field to determine whether abnormal activity is occurring. Finally, in the alarms stage, thresholds are set for evaluation of whether the unusual patterns warrant notification. Degradation of the information quality anywhere along this 4-stage process might dramatically hamper outbreak detection performance.
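The 4 stages can be illustrated with a minimal Python sketch. This is not the study's implementation: the keyword list, the day-of-week mean used as the baseline model, and the threshold of 3 standard deviations are all assumptions chosen only to make the flow from grouping to alarm concrete.

```python
from statistics import mean, stdev

# Hypothetical keyword list (an assumption); real systems use validated coding schemes.
RESPIRATORY_TERMS = ("cough", "wheez", "shortness of breath", "pneumonia")


def is_respiratory(chief_complaint):
    """Stage 1 (grouping): assign a visit to the respiratory syndromic group."""
    text = chief_complaint.lower()
    return any(term in text for term in RESPIRATORY_TERMS)


def daily_counts(visits, n_days):
    """Count respiratory visits per day; each visit is (day_index, chief_complaint)."""
    counts = [0] * n_days
    for day, complaint in visits:
        if is_respiratory(complaint):
            counts[day] += 1
    return counts


def fit_baseline(history):
    """Stage 2 (modeling): expected count and spread for each day of the week.

    A day-of-week mean is a stand-in for whatever temporal model a real
    system fits; it needs at least 2 weeks of history.
    """
    model = {}
    for dow in range(7):
        values = history[dow::7]
        model[dow] = (mean(values), stdev(values))
    return model


def detect(model, day, observed):
    """Stage 3 (detection): standardized gap between observed and expected counts."""
    expected, spread = model[day % 7]
    return (observed - expected) / spread if spread > 0 else 0.0


def alarm(score, threshold=3.0):
    """Stage 4 (alarms): notify only when the score exceeds a chosen threshold."""
    return score > threshold
```

Because each stage consumes the output of the one before it, a misclassified visit at stage 1 changes the counts that the model is fitted to and that the detector later scores, which is how degraded grouping quality can propagate to the alarm stage.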
Previous studies have shown that chief complaints and International Classification of Diseases, Ninth Revision (ICD-9)4 diagnostic codes can be used to classify ED encounters into syndromes5-9 and are valuable at the syndromic grouping stage. Chief complaints are usually recorded during triage, and all US EDs use ICD-9 coding for billing. Real-time data feeds from these encounters have been successfully established in a number of cities.1,10-12
The goal of this study was to measure the impact of 3 syndromic grouping methods on the modeling and detection stages. We compare forecasting accuracy at the modeling stage and detection sensitivity at the detection stage of 3 groupings according to chief complaints, ICD-9 codes, and an inclusive combination of chief complaints and ICD-9 codes.
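To make the 3 groupings concrete, the following Python sketch classifies visits by chief complaint (CC), by diagnostic code (DX), and by the inclusive combination (BOTH). The keywords and ICD-9 prefixes are illustrative assumptions, not the study's syndrome definitions, and mean absolute percentage error is shown only as one common way to score forecasting accuracy.

```python
# Illustrative respiratory definitions (assumptions, not the study's coding scheme).
RESP_KEYWORDS = ("cough", "wheez", "asthma")
RESP_ICD9_PREFIXES = ("466", "486", "493")


def in_cc(visit):
    """Grouping by chief complaint text only."""
    text = visit["chief_complaint"].lower()
    return any(keyword in text for keyword in RESP_KEYWORDS)


def in_dx(visit):
    """Grouping by ICD-9 diagnostic code only."""
    return any(visit["icd9"].startswith(prefix) for prefix in RESP_ICD9_PREFIXES)


def in_both(visit):
    """Inclusive combination: chief complaint OR diagnostic code qualifies."""
    return in_cc(visit) or in_dx(visit)


def count_by_grouping(visits):
    """Number of visits assigned to the syndrome under each grouping method."""
    return {
        "CC": sum(in_cc(v) for v in visits),
        "DX": sum(in_dx(v) for v in visits),
        "BOTH": sum(in_both(v) for v in visits),
    }


def mape(observed, forecast):
    """Mean absolute percentage error over days with a nonzero observed count."""
    pairs = [(o, f) for o, f in zip(observed, forecast) if o != 0]
    return 100 * sum(abs(o - f) / o for o, f in pairs) / len(pairs)
```

Because the combined grouping is a union, it can never include fewer visits than either single-source grouping.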
Setting
Data were extracted from the information systems of 2 major urban, academic, tertiary care hospitals sharing the same catchment area, each having an annual census of approximately 50,000 ED visits. Hospital 1 is a general hospital, and hospital 2 is a pediatric hospital. Eligible participants were all patients treated in the EDs of hospitals 1 or 2 between June 1, 1998, and January 5, 2003. This period included 1,680 consecutive days at each hospital. During this period, each hospital census
Results
Table 1 shows the average total number of daily visits and the SD for each of the 6 time series throughout the entire data sets. For both hospitals, fewer visits were considered respiratory related when only the chief complaints were examined; more visits were included if only the diagnostic codes were examined. As would be expected, the most visits were included when the chief complaint and the diagnostic code were considered together.
To examine these daily visit numbers in greater detail, a
Limitations
Because there is a paucity of data available on actual biologic warfare attacks, we relied, as have others,17 on simulated attacks for model validation. The characteristics of a true biologic attack, as reflected in ED syndromic surveillance data, would vary, depending on a wide range of characteristics. The strength of the approach we have taken lies in the use of semisynthetic data for simulation. Although the outbreak signal is simulated, the background noise is generated with authentic ED
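The semisynthetic approach described in this section can be sketched in Python as follows. The outbreak shape, the forecasts, and the fixed residual threshold are hypothetical inputs chosen for illustration; the sketch shows only the general idea of superimposing a simulated signal on real background counts and measuring how often it is flagged.

```python
def inject_outbreak(background, start, extra_per_day):
    """Return a copy of the real background counts with simulated cases added."""
    series = list(background)
    for offset, extra in enumerate(extra_per_day):
        if start + offset < len(series):
            series[start + offset] += extra
    return series


def sensitivity(background, forecasts, threshold, extra_per_day):
    """Fraction of simulated outbreaks flagged on at least 1 outbreak day.

    One outbreak is injected at every possible start day. A day is flagged
    when observed minus forecast exceeds the threshold (a simplifying assumption).
    """
    starts = range(len(background) - len(extra_per_day) + 1)
    detected = 0
    for start in starts:
        series = inject_outbreak(background, start, extra_per_day)
        outbreak_days = range(start, start + len(extra_per_day))
        if any(series[d] - forecasts[d] > threshold for d in outbreak_days):
            detected += 1
    return detected / len(starts)
```

For example, `sensitivity(counts, forecasts, 5, [3, 6, 3])` would estimate how often a 3-day outbreak peaking at 6 extra visits is caught for a given series of daily counts and forecasts.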
Discussion
This study examines the relationship between methods of syndromic grouping and outbreak detection performance. We demonstrate that dramatically better surveillance can be achieved when the proper syndromic grouping methods are applied to certain types of data. The ordering of the model accuracies (BOTH>DX>CC) is directly consistent with the ordering of detection sensitivities, which suggests that the noise effects of input data coding propagate through a syndromic surveillance system, having
Acknowledgements
We thank Karen Olson, PhD, Shlomit Feit, and Allison Beitel for their methodologic contributions, which enabled generation of the data sets used in this study.
References (18)
- et al. Implementing syndromic surveillance: a practical guide informed by the early experience. J Am Med Inform Assoc (2004)
- et al. Disease outbreak detection system using syndromic data in the greater Washington DC area. Am J Prev Med (2002)
- et al. Technical description of RODS: a real-time public health surveillance system. J Am Med Inform Assoc (2003)
- et al. Framework for evaluating public health surveillance systems for early detection of outbreaks: recommendations from the CDC Working Group. MMWR Recomm Rep (2004)
- Updated guidelines for evaluating public health surveillance systems: recommendations from the guidelines working group. MMWR (2001)
- ICD-9-CM 2002: International Classification of Diseases, 9th Revision (2002)
- et al. Use of emergency department chief complaint and diagnostic codes for identifying respiratory illness in a pediatric population. Pediatr Emerg Care (2004)
- et al. Accuracy of ICD-9–coded chief complaints and diagnoses for the detection of acute respiratory illness. Proc AMIA Symp (2001)
- et al. Accuracy of three classifiers of acute gastrointestinal syndrome for syndromic surveillance. Proc AMIA Symp (2002)
☆ Author contributions: BYR and KDM conceived the study and designed the experiments. KDM undertook recruitment of participating hospitals, developed the data sets, and obtained institutional review board approval. BYR developed the simulations and performed the analyses and integration of the results. BYR and KDM formulated the findings, and both had equal roles in drafting and revising the manuscript. BYR takes responsibility for the paper as a whole.
☆☆ Supported by the National Institutes of Health through a grant from the National Library of Medicine (R01LM07677-01), by contract 290-00-0020 from the Agency for Healthcare Research and Quality, and by the Alfred P. Sloan Foundation (grant 2002-12-1).
Reprints not available from the authors.