

Validation of healthcare administrative data for the diagnosis of epilepsy
C Franchi1, G Giussani2, P Messina2, M Montesano2, S Romi2, A Nobili1, I Fortino3, A Bortolotti3, L Merlino3, E Beghi2 and the EPIRES Group
1Laboratory for Quality Assessment of Geriatric Therapies and Services, IRCCS-Istituto di Ricerche Farmacologiche ‘Mario Negri’, Milan, Italy
2Laboratory of Neurological Disorders, Department of Neuroscience, IRCCS-Istituto di Ricerche Farmacologiche ‘Mario Negri’, Milan, Italy
3Regional Health Ministry, Lombardy Region, Milan, Italy
Correspondence to Dr Carlotta Franchi, Laboratory for Quality Assessment of Geriatric Therapies and Services, IRCCS-Istituto di Ricerche Farmacologiche ‘Mario Negri’, Via Giuseppe La Masa, 19, Milan 20156, Italy; carlotta.franchi{at}


Background Administrative databases have become an important tool to monitor diseases. Patients with epilepsy could be traced using disease-specific codes and prescriptions, but formal validation is required to obtain an accurate case definition. The aim of the study was to correlate administrative data on epilepsy with an independent source of patients with epilepsy in a district of Lombardy, Northern Italy, from 2000 to 2008.

Methods Data on approximately 320 600 inhabitants of the district of Lecco, collected from the Drug Administrative Database of the Lombardy Region, were analysed. Patients were included if they had the International Classification of Diseases 9 (ICD-9) codes and/or the disease-specific exemption code for epilepsy, or if they had at least one EEG record and took antiepileptic drugs (AEDs) as monotherapy or in variable combinations. To ascertain epilepsy cases, 11 general practitioners (GPs) with 15 728 affiliates were contacted. Multiple versions of the diagnostic algorithm were developed by fitting logistic regression models to all combinations of the four independent variables.

Results Among the GP affiliates, 71 (4.5/1000) had a gold standard diagnosis of epilepsy. The best and most conservative algorithm included EEG and selected treatment schedules; it correctly identified 61/71 patients with epilepsy (sensitivity 85.9%, 95% CI 76.0% to 92.2%) and 15 623/15 657 patients without epilepsy (specificity 99.8%, 95% CI 99.7% to 99.8%). The positive and negative predictive values were 64.2% and 99.9%, respectively. Sensitivity (86.7%) and the positive predictive value (68.4%) increased only slightly when patients with single seizures were included.

Conclusions A diagnostic algorithm including EEG and selected treatment schedules is only moderately sensitive for the detection of epilepsy and seizures. These findings apply only to the Northern Italian scenario.


Introduction


Administrative databases are a possible tool for monitoring the frequency and trends of diseases tracked by healthcare systems. In recent years, these databases have become a valuable source of data for disease surveillance, assessment of healthcare resource utilisation and evaluation of healthcare outcomes.1 Epilepsy lends itself to this approach because it may require hospital admission and selected diagnostic tests (eg, EEG) and, once the diagnosis is confirmed, chronic treatment with specific drugs. In addition, in several countries, including Italy, people with epilepsy can use ad hoc disease-specific exemption codes to obtain diagnostic and therapeutic aids free of charge. All these variables, if properly combined, can serve as tracers of epilepsy in administrative databases. However, for an accurate case definition, these combinations must undergo formal validation. Validating the algorithms used to identify patients with epilepsy (as with any other clinical condition) is essential to avoid misclassification bias, which may threaten the internal validity of the system and lead to incorrect estimates of disease frequency. A systematic review of studies validating identification algorithms based on administrative data has recently been published.2 Based on its results, a checklist was devised to support the development and use of reporting guidelines.

Hospital discharge diagnoses (HDDs) were introduced in Italy in 1992 as part of the public health surveillance system, to calculate the cost of hospital stay for each disease and to reimburse hospitals through governmental healthcare agencies.3,4 In addition to HDDs, data were later collected on specialist consultations, procedures and treatments. All these data have been available since 2000 in several administrative districts, including Lombardy, an area of Northern Italy with a population of 9 million (according to communications from the Lombardy Region, regional records matched ‘real data’ in >90% of cases). The large size of the local population and the long period of data collection make these records a valuable source for calculating the frequency and trends of epilepsy. The aim of this study was therefore to correlate clinical data and administrative data on epilepsy.

Methods

Data sources and variables selected

This study used data collected from the Drug Administrative Database of the Lombardy Region, Northern Italy, from 2000 to 2008. The structure of this database, routinely updated for administrative and reimbursement purposes, has been described in detail elsewhere.5–7 Briefly, it includes different sections containing (1) the patient's data, including sex and date of birth, (2) prescription records with information on the drugs dispensed by retail pharmacies in the territory, (3) hospital data, including the International Classification of Diseases-9 (ICD-9) codes for the discharge diagnosis (whether primary or secondary), (4) prescription records for diagnostic tests (including EEG) and (5) records of the disease-specific exemption codes. All these sections are linked by a unique identification code. Only drugs publicly subsidised by the Italian National Health System (NHS) are collected. All drugs prescribed were classified according to the international Anatomical Therapeutic Chemical (ATC) classification system.8

Epilepsy diagnosis in the administrative records

The records in the regional database from the administrative district of Lecco, Lombardy Region (population 320 609), collected during 2000–2008, were the data source to be validated. The best algorithm for detecting epilepsy cases (see table 1) was developed by testing sample frames that included different combinations of diagnostic codes, diagnostic procedures and drug treatments.

Table 1

Diagnostic models

The following variables were examined singly and then in different combinations: (1) ICD-9-CM code for epilepsy (345.x) OR 333.2 (myoclonus) OR 780.3 (convulsions, febrile or afebrile) OR 779.0 (neonatal seizures) OR 781.0 (spasms, other abnormal involuntary movements); (2) exemption code (EXE) 017 for epilepsy (based on the ICD-9-CM code for epilepsy); (3) at least one EEG; (4) one or more antiepileptic drugs (AEDs) (ATC codes) taken as monotherapy or in variable combinations. As several AEDs are commonly prescribed for clinical conditions other than epilepsy, drug treatment was also tested in a more restrictive form: (1) at least two of carbamazepine (CBZ), phenytoin (PHT), phenobarbital (PB), primidone (PRM), barbexaclone (BSC), clonazepam (CLZ), ethosuximide (ESM), valproate (VPA), valpromide (VPM), clobazam (CLB), vigabatrin (VGB), felbamate (FLB), tiagabine (TGB), pregabalin (PRG), oxcarbazepine (OXC), gabapentin (GBP), topiramate (TPM), levetiracetam (LEV), zonisamide (ZNS); (2) one of CBZ, VPA, PHT, PB, PRM, BSC, CLZ, ESM, CLB, VGB, FLB, TGB, OXC, LEV, ZNS.
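A minimal Python sketch of how the four indicator variables could be derived for one patient from the linked database sections. It assumes that ICD-9-CM codes are stored as dotted strings and matched by prefix, that drugs are recorded with the abbreviations used above, and that the AED flag follows the restrictive schedule (at least two drugs from the first list, or any drug from the second). `PatientRecord` and `indicator_flags` are hypothetical names, not part of the regional database.

```python
from dataclasses import dataclass, field

# ICD-9-CM codes listed in the Methods, matched by prefix (an assumption):
# "345." covers every 345.x subcode and "780.3" also covers its 5-digit subcodes.
ICD_PREFIXES = ("345.", "333.2", "780.3", "779.0", "781.0")
EPILEPSY_EXEMPTION = "017"

# List (1): any two of these count as an epilepsy treatment schedule.
COMBINATION_AEDS = {
    "CBZ", "PHT", "PB", "PRM", "BSC", "CLZ", "ESM", "VPA", "VPM", "CLB",
    "VGB", "FLB", "TGB", "PRG", "OXC", "GBP", "TPM", "LEV", "ZNS",
}
# List (2): any one of these counts on its own; VPM, PRG, GBP and TPM are
# absent, so monotherapy with them does not set the AED flag.
SINGLE_AEDS = {
    "CBZ", "VPA", "PHT", "PB", "PRM", "BSC", "CLZ", "ESM",
    "CLB", "VGB", "FLB", "TGB", "OXC", "LEV", "ZNS",
}


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary of the linked database sections."""
    icd_codes: list[str] = field(default_factory=list)
    exemption_codes: list[str] = field(default_factory=list)
    eeg_count: int = 0
    aeds: set[str] = field(default_factory=set)  # abbreviations of dispensed AEDs


def indicator_flags(rec: PatientRecord) -> dict[str, int]:
    """Return the 0/1 dummy variables ICD, EXE, EEG and AED for one patient."""
    icd = any(code.startswith(ICD_PREFIXES) for code in rec.icd_codes)
    exe = EPILEPSY_EXEMPTION in rec.exemption_codes
    eeg = rec.eeg_count >= 1
    aed = len(rec.aeds & COMBINATION_AEDS) >= 2 or bool(rec.aeds & SINGLE_AEDS)
    return {"ICD": int(icd), "EXE": int(exe), "EEG": int(eeg), "AED": int(aed)}


# Example: a patient on gabapentin alone is not flagged by the AED criterion.
print(indicator_flags(PatientRecord(eeg_count=1, aeds={"GBP"})))
# {'ICD': 0, 'EXE': 0, 'EEG': 1, 'AED': 0}
```

Encoding the two drug lists as sets makes the distinction in the text explicit: the four drugs present only in list (1) can contribute to a combination but never qualify as monotherapy.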

Age, sex and the patient's general practitioner (GP), as recorded in the database, were also collected.

Case ascertainment (gold standard)

Eleven GPs (including three paediatricians) working in the study area, with a total of 15 728 affiliates, took part, and their records served as the gold standard. Each GP kept electronic records of his or her patients, and these records were searched for patients with epilepsy. The diagnoses available to GPs can be considered reliable because, in Italy, epilepsy is confirmed only after a neurology (or child neurology) consultation. All the patients identified through their GPs had had one or more neurological consultations.

A junior investigator contacted each GP by telephone to collect information on all patients with epilepsy (as defined below) in his or her practice during the study period (2000–2008). For each patient, the data collected included age, sex, residence, seizure type(s), disease duration, epilepsy syndrome, number and type of AEDs (and drug schedules) in current use and duration of therapy.

All the data were managed according to the current Italian law on privacy and we obtained the Local Health Unit (LHU) authorisation to collect anonymous data from GPs. With regard to the use of administrative data, the study was also approved by the Ethical Committee of the Lombardy region.

In keeping with the requirements for epidemiological studies, epilepsy was defined as a condition characterised by two or more unprovoked epileptic seizures occurring at least 24 hours apart, with neurological confirmation.9,10 Under this definition, isolated seizures were classified separately.11
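The case definition can be expressed as a small rule. The sketch below assumes that the dates of unprovoked seizures are available as `datetime` values and treats seizures less than 24 hours apart as a single event, so a cluster within one day counts as an isolated seizure; `classify_seizure_history` is a hypothetical helper, not part of the study's workflow.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(hours=24)


def classify_seizure_history(unprovoked_seizures: list[datetime],
                             neurologically_confirmed: bool) -> str:
    """Apply the epidemiological case definition to one patient (sketch).

    Returns "epilepsy", "single seizure", "unconfirmed" or "no seizure".
    """
    if not unprovoked_seizures:
        return "no seizure"
    if not neurologically_confirmed:
        return "unconfirmed"
    # Some pair of seizures is at least 24 h apart exactly when the first and
    # last recorded seizures are, so the span of the dates is enough.
    span = max(unprovoked_seizures) - min(unprovoked_seizures)
    if span >= MIN_INTERVAL:
        return "epilepsy"
    # One seizure, or a cluster within 24 h, counts as an isolated seizure.
    return "single seizure"


t0 = datetime(2005, 3, 1, 8, 0)
print(classify_seizure_history([t0, t0 + timedelta(days=3)], True))
```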

Statistical analysis

Information acquired through case ascertainment was linked to the administrative records by date of birth, sex and general practitioner. Possible duplicates were removed after manual review of all duplicate entries generated by the linkage procedure. Five dummy variables were assigned to each record: (1) gold standard diagnosis of epilepsy (0=No; 1=Yes) (source: GPs) [Y]; (2) having an ICD code for epilepsy (0,1) (source: administrative records) [ICD]; (3) having a disease-specific EXE (0,1) (source: administrative records) [EXE]; (4) having at least one EEG (0,1) (source: administrative records) [EEG]; (5) taking at least one of the listed AEDs, singly or in combination, as indicated above (0,1) (source: administrative records) [AED]. Multivariable logistic regression models with [Y] as the outcome variable and [ICD, EXE, EEG, AED] as independent variables were used to identify the algorithm that best fitted our data. All combinations of the four independent variables gave 15 different models (see table 1).
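The 15 candidate models are the non-empty subsets of the four predictors (2^4 - 1 = 15). The Python sketch below enumerates them, smallest first. With this ordering, EEG + AED falls at position 10 and the full model at 15, which matches the two model numbers quoted in the Results, although the numbering of the rest of table 1 is an assumption here. Fitting each logistic model is not shown.

```python
from itertools import combinations

PREDICTORS = ("ICD", "EXE", "EEG", "AED")


def candidate_models(predictors=PREDICTORS):
    """All non-empty subsets of the predictors, ordered by size, then position."""
    return [subset
            for size in range(1, len(predictors) + 1)
            for subset in combinations(predictors, size)]


models = candidate_models()
for number, subset in enumerate(models, start=1):
    print(f"#{number:>2}: Y ~ {' + '.join(subset)}")
```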

The results of the logistic models are displayed as sensitivity (SE), specificity (SP), positive predictive value (PPV), negative predictive value (NPV) and area under the curve (AUC) with 95% CIs. The AUC reflects the discriminant power of a classifier according to the joint values of SE and SP and ranges from 0.5 (random classifier) to 1 (perfect classifier).12 SE is the proportion of gold standard epilepsy cases that were identified as epilepsy cases in the administrative records. SP is the proportion of gold standard non-epilepsy cases that were classified as non-epilepsy cases in the administrative records. PPV is the proportion of epilepsy cases in the administrative records that were confirmed as epilepsy cases by the gold standard. NPV is the proportion of non-epilepsy cases in the administrative records that were confirmed as non-epilepsy cases by the gold standard. Validation analyses were performed on the entire sample and separately for children and adults. Patients with epilepsy were also assessed separately from patients with single seizures. Each model assigned each record a probability of being a patient with epilepsy (score from 0 to 1) according to its independent variables. A cut-off for the predicted values was chosen automatically to maximise the AUC and the agreement between predicted and observed cases and non-cases. Statistical analyses were carried out using the Statistical Analysis System (SAS) (V.9.2, Cary, North Carolina, USA).
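The four accuracy measures can be computed directly from the 2x2 table of administrative classification against the gold standard. The paper does not name its CI method; the sketch below assumes the Wilson score interval, which, with the counts for the best model reported in the Results, gives 76.0% to 92.2% for SE and 99.7% to 99.8% for SP. The function names are illustrative.

```python
from math import sqrt


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def validation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """SE, SP, PPV and NPV as (estimate, lower, upper) against the gold standard."""
    ratios = {
        "SE": (tp, tp + fn),   # gold-standard cases flagged by the records
        "SP": (tn, tn + fp),   # gold-standard non-cases not flagged
        "PPV": (tp, tp + fp),  # flagged records that are true cases
        "NPV": (tn, tn + fn),  # unflagged records that are true non-cases
    }
    return {name: (k / n, *wilson_ci(k, n)) for name, (k, n) in ratios.items()}


# Counts for the best model (EEG + AED) as reported in the Results.
for name, (est, low, high) in validation_metrics(tp=61, fp=34, fn=10, tn=15623).items():
    print(f"{name}: {100 * est:.1f}% (95% CI {100 * low:.1f}% to {100 * high:.1f}%)")
```

Note that SE and SP share their denominators with the gold standard (71 cases, 15 657 non-cases), whereas PPV and NPV share theirs with the administrative classification (95 positives, 15 633 negatives).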

Results

Among 15 728 affiliates of the 11 GPs, 71 (4.5/1000) had a gold standard diagnosis of epilepsy (confirmed by GPs). The different algorithms used to assess the validity of administrative records are reported in table 1.

For each model, the numbers of true-positive (TP), false-positive (FP), true-negative (TN) and false-negative (FN) records were determined at the cut-off that maximised the AUC. The model with the highest AUC was #11 (AUC=0.9498, FP=54, FN=8), but it produced a very high number of FP records (ie, a low PPV). The full model (#15) (FP=40, FN=8) reduced the number of FP records. However, three different models (#10, #13 and #14) gave identical results (FP=34, FN=10) and performed better in terms of the total number of diagnostic errors (44 vs 48). The most conservative of the three was #10 (Y=EEG+AED), which included the fewest parameters. Using this model, the administrative records yielded 95 positives and 15 633 negatives. The characteristics of the epileptic population from the administrative database and from the GPs are shown in table 2.
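The model choice described above amounts to minimising FP + FN and breaking ties by the number of predictors. The sketch below uses the counts reported in this paragraph. The three tied models are read here as #10, #13 and #14, and the predictor counts for #11, #13 and #14 (three each) are assumptions, since the text states only that #10 uses EEG + AED and #15 uses all four variables.

```python
from typing import NamedTuple


class ModelResult(NamedTuple):
    number: int   # model number as used in the text
    n_vars: int   # number of predictors in the model
    fp: int
    fn: int

    @property
    def errors(self) -> int:
        """Total diagnostic errors, FP + FN."""
        return self.fp + self.fn


# FP/FN counts from the Results; n_vars for #11, #13 and #14 is assumed (3).
RESULTS = [
    ModelResult(11, 3, fp=54, fn=8),
    ModelResult(15, 4, fp=40, fn=8),
    ModelResult(10, 2, fp=34, fn=10),
    ModelResult(13, 3, fp=34, fn=10),
    ModelResult(14, 3, fp=34, fn=10),
]


def select_model(results: list[ModelResult]) -> ModelResult:
    """Fewest total errors; ties go to the model with fewer predictors."""
    return min(results, key=lambda r: (r.errors, r.n_vars))


best = select_model(RESULTS)
print(best.number, best.errors)  # 10 44
```

The tie-breaker encodes the "most conservative" criterion: among models with equal error totals, the one with the fewest parameters is preferred.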

Table 2

Comparison between administrative database and GPs’ epileptic population characteristics

The best algorithm correctly identified 61/71 patients with epilepsy (SE 85.9%, 95% CI 76.0% to 92.2%) and 15 623/15 657 participants without epilepsy (SP 99.8%, 95% CI 99.7% to 99.8%). There were 10 false-negative and 34 false-positive cases (figure 1); details of these cases are reported in tables 3 and 4. The PPV and NPV were 64.2% and 99.9%, respectively. These results were virtually unchanged when adults (13 522) and children (2206) were assessed separately: the administrative records correctly identified 54/63 adults and 7/8 children with epilepsy, and 13 428/13 459 adults and 2195/2198 children without epilepsy. SE (65/75=86.7%) and PPV (65/95=68.4%) increased slightly when the four patients with single seizures were included in the true-positive category (AUC 0.9507, 95% CI 0.9168 to 0.9846, vs 0.9478, 95% CI 0.9121 to 0.9836).
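The subgroup and single-seizure figures can be checked for internal consistency: the adult and paediatric 2x2 tables should sum to the overall table, and moving the four single-seizure patients from false to true positives should leave FN at 10, since those patients were among the 34 false positives. A sketch under those assumptions (the `Table2x2` class is hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    tp: int
    fp: int
    fn: int
    tn: int

    def __add__(self, other: "Table2x2") -> "Table2x2":
        """Pool two strata cell by cell."""
        return Table2x2(self.tp + other.tp, self.fp + other.fp,
                        self.fn + other.fn, self.tn + other.tn)

    @property
    def se(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def sp(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)


# Subgroup counts from the Results: 54/63 adults and 7/8 children detected;
# 13 428/13 459 adult and 2195/2198 paediatric non-cases correctly excluded.
adults = Table2x2(tp=54, fp=13459 - 13428, fn=63 - 54, tn=13428)
children = Table2x2(tp=7, fp=2198 - 2195, fn=8 - 7, tn=2195)
overall = adults + children

# Count the four single-seizure patients as true rather than false positives.
with_single = Table2x2(overall.tp + 4, overall.fp - 4, overall.fn, overall.tn)

for label, t in [("adults", adults), ("children", children),
                 ("overall", overall), ("incl. single seizures", with_single)]:
    print(f"{label}: SE {t.se:.1%}, SP {t.sp:.1%}, PPV {t.ppv:.1%}")
```

With these counts the pooled table is TP=61, FP=34, FN=10, TN=15 623, and the reclassified table gives SE 86.7% and PPV 68.4%, as reported.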

Table 3

List of diagnoses/indications in patients with clinical conditions other than epilepsy (false positives)

Table 4

Drug therapy in patients with epilepsy not present in the administrative records (false negatives)

Figure 1

Positive and negative records in the administrative database (Best Model). Details on true and false positive/negative rates for each separate code. Positive and negative records are displayed in the upper and lower parts of the figure, respectively. AED, antiepileptic drugs; EXE, exemption code; FN, false negatives; FP, false positives; ICD, International Classification of Diseases; TN, true negatives; TP, true positives.

Thirty-four patients identified by the algorithm did not have epilepsy (false positives). Fifteen of them took AEDs for seizure prophylaxis, 11 had another clinical condition requiring treatment with AEDs, and four had single unprovoked seizures (table 3).

Ten patients with epilepsy were not found in the administrative database (table 4).

Seven of them were not traced by any of the four diagnostic criteria (ICD, EXE, EEG and AED). One was traced through ICD and AED, one through EXE and AED, and one through ICD, EXE and AED. Six received monotherapy with CBZ, OXC, LEV or VPA. One was treated with PB until 2002, and two were not traced at all by the system despite fulfilling the criteria included in the diagnostic algorithm.

Discussion

This study shows that, when administrative records from a well-defined area are tested as tracers of epilepsy, a diagnostic algorithm including EEG and selected treatment schedules is only moderately sensitive in tracing prevalent epilepsy cases. The validity of the algorithm does not change materially when patients with single unprovoked seizures are included.

In our sample, ICD coding alone was a highly specific but fairly insensitive diagnostic measure. Only 36.6% of cases (26/71) were detected through ICD codes, because in our system these codes are recorded only at hospital discharge and hospitalisations in patients with epilepsy are uncommon. The only ICD-9 codes identified in this sample were 345.x and 780.3. When used, these codes were correct in all but one case: a patient receiving AED prophylaxis for subarachnoid haemorrhage. In a recent review of studies using all the ICD-9-CM codes,13 PPV ranged from 21% to 98%. However, no algorithm relied on ICD-9-CM codes alone, and PPV was generally highest in studies that included ICD codes. One such study was that of Jetté et al,14 who validated administrative data from the Calgary Health Region in Alberta, Canada, and found that epilepsy coding in emergency and hospital discharge data had high validity overall. For these reasons, Kee et al13 concluded that EEG or AED monitoring codes should not be used to identify seizures, convulsions or epilepsy without also including an appropriate diagnostic code. These conclusions are at variance with our results, which showed that algorithms using only ICD codes or disease-specific EXEs for epilepsy had the lowest SE (see table 1). The difference may be explained by the fact that, in our system, ICD codes are used only for hospital admissions and disease-specific EXEs are used only by patients who are less affluent or who do not wish to conceal the disease.

The inclusion of EEG in our diagnostic algorithm improved the PPV significantly. Our data are in keeping with those of Holden et al,15 who also found EEG to be significant in excluding non-epilepsy cases.

Four patients with a single unprovoked seizure were also traced by our administrative records. All of them received chronic treatment and, as confirmed in a post hoc analysis, they were traced several times by the system. Because the validity of our best algorithm was unchanged when these patients were counted as true positives, and because they had repeated contacts with the system, we can only speculate that our administrative system is unable to identify single seizures as a separate diagnostic category.

The drugs and drug combinations selected as treatments for epilepsy in our study are in line with the prescribing patterns for drug-resistant epilepsy in tertiary referral centres in Italy.16 In selecting them, we tried to exclude patients receiving monotherapy with drugs licensed mostly for clinical conditions other than epilepsy. Nevertheless, patients receiving only CBZ or VPA were retained so as not to exclude children treated with these two drugs in monotherapy, even though this increased the number of false-positive cases (see table 4).

The study has limitations. First, 35.8% of the patients identified by our best diagnostic algorithm were false positives (1 - PPV; PPV 64.2%), because 34 patients received AEDs for seizure prophylaxis or for clinical conditions other than epilepsy. We could not exclude these patients through ICD codes, as others have done,17 because our administrative records do not include treatment indications. For this reason, even the best algorithm tends to overestimate the frequency of epilepsy. However, an attempt to exclude these cases with more stringent therapeutic codes substantially reduced the SE of our diagnostic approach (see table 1). Second, the diagnostic algorithm was validated by comparing the administrative data with the records of GPs from a single administrative district. Thus, we do not know whether our results apply to other districts of the Lombardy Region. However, an audit of epilepsy management in Lombardy, where an integrated diagnostic and therapeutic assessment of the disease had been active for several years, showed that the diagnosis made in tertiary referral centres is fairly precise and leads to a correct classification of the epilepsies in up to 85% of cases.18 Third, although the administrative records include details of outpatient neurological consultations, specialists are not required to indicate the diagnostic code in the reimbursement claim. For this reason, our algorithm differs from several others.19–21 However, these studies do not show what the actual contribution of physicians' claims is to the overall validity of the diagnostic algorithms. Fourth, the algorithm includes only patients receiving treatment, so some untreated individuals may have escaped notice. However, the number of such losses should be negligible, because in Italy virtually all patients with a diagnosis of epilepsy receive treatment, and there were no untreated patients in our GPs' lists. Fifth, we decided not to include repeated contacts with the healthcare system, because the use of AEDs necessarily implies repeated drug claims. In addition, as GPs and specialists are not required to code the diagnosis during outpatient visits, including this variable was considered redundant. Sixth, we found no patients with ICD codes other than 345.x (25 patients) and 780.3 (1 patient). This may be explained by the small sample size. For this reason, we do not know to what extent the frequency of epilepsy is overestimated in our population. Seventh, four patients were not present in the GPs' files. One possible explanation is that these patients switched to other GPs and the GP codes in the administrative records were not updated; coding errors therefore cannot be excluded in these cases. Finally, these findings apply only to the Northern Italian scenario. Similar validation exercises therefore need to be carried out in other settings and with other administrative databases.
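The 35.8% figure in the first limitation is 1 - PPV, the share of flagged records that are not cases, which is different from the false-positive rate 1 - SP, the share of non-cases that get flagged. A quick arithmetic check with the counts reported in the Results:

```python
# Counts for the best model (EEG + AED) as reported in the Results.
tp, fp, tn = 61, 34, 15623

share_of_flagged_not_cases = fp / (tp + fp)  # 1 - PPV
false_positive_rate = fp / (fp + tn)         # 1 - SP

print(f"1 - PPV = {share_of_flagged_not_cases:.1%}")  # 1 - PPV = 35.8%
print(f"1 - SP  = {false_positive_rate:.1%}")         # 1 - SP  = 0.2%
```

The large gap between the two reflects the low prevalence in the sample (71/15 728): even a 0.2% false-positive rate among non-cases produces a sizeable share of non-cases among flagged records.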

This study shows that administrative records from a well-defined geographical region of Europe and containing data on current drug therapy and EEG records provide only moderate SE in identifying prevalent cases of epilepsy. The list produced from the records contained FPs that were due predominantly to patients taking anticonvulsants for prophylaxis after subarachnoid haemorrhage, head trauma or meningioma and FNs mostly due to omission of patients with diagnosed epilepsy not having EEG records. These results confirm that examination of administrative databases tends to provide fairly inaccurate lists of patients with epilepsy, even in jurisdictions where the records contain detailed information collected from a well-defined geographical area.

What is already known on this subject

  • Administrative databases are an important tool to monitor diseases. Patients with epilepsy could be traced using disease-specific codes and prescriptions, but formal validation is still missing.

What this study adds

  • Development of multiple versions of a diagnostic algorithm using International Classification of Diseases-9 (ICD-9)-CM codes for epilepsy, disease-specific exemption code, EEG and selected antiepileptic drug (AED) prescriptions.

  • Development of an algorithm to identify patients with epilepsy and/or with a single seizure.

  • An algorithm with EEG and selected AED prescriptions has a moderate sensitivity for the detection of epilepsy and seizures.




  • Collaborators EPIRES Group: Dr Elio Agostoni, Dr Francesco Basso, Dr Andrea Rigamonti, Dr Lorenzo Stanzani, Dr Ottaviano Martinelli, Dr Cristina Volpe, Dr Marialuisa Carpanelli, Dr Andrea Magnoni, Dr Larissa Airoldi, Dr Mariolina Di Stefano, Dr Claudio Zucca, Dr Nicoletta Zanotta, Dr Pietro Baccomo, Dr Giancarlo Balestra, Dr Massimo Bergamini, Dr Marcellino Bianchi, Dr Edoardo Giovanni Bolis, Dr Rosalia Cavenago, Dr Mario Crotta, Dr Marco Coduri, Dr Katerina Tinterova, Dr Alberto Palazzuolo, Dr Patrizia Daielli, Dr Valeria Mazzoleni, Dr Anna Villella.

  • Contributors CF, GG and EB contributed substantially to the study concept and design; they collected and analysed the data, participated in the interpretation of the data, wrote the paper, reviewed it critically for important intellectual content and gave final approval to the version to be published. PM performed statistical analyses, contributed to the study concept and design and gave final approval of the version to be published. MM and SR contributed to the collection of the data. AN, IF, AB and LM did a critical revision of the manuscript and gave final approval to the version to be published. All the members of the EPIRES Group contributed to the acquisition of the data and discussed the case histories of all their patients with the other investigators to provide valuable information for the verification of the quality of the diagnosis. They also helped with the preparation of the manuscript with useful comments on the first draft, and gave final approval to the version to be published.

  • Funding This study was supported by grants (n. 14979/RL) from the Regional Health Ministry of the Lombardy Region (Progetto ‘Epidemiologia dei Farmaci’, EPIFARM).

  • Competing interests EB serves on the editorial advisory boards of Epilepsia, Amyotrophic Lateral Sclerosis, Clinical Neurology & Neurosurgery, and Neuroepidemiology; he has received fees for board membership from VIROPHARMA and EISAI, funding for travel and speaker honoraria from UCB-Pharma, Sanofi-Aventis, GSK and for educational presentations from GSK.

  • Ethics approval All the data were managed according to the current Italian law on privacy and we obtained LHU authorisation to collect anonymous data from GPs. With regard to the use of administrative data, the study was also approved by the Ethical Committee of the Lombardy region.

  • Provenance and peer review Not commissioned; externally peer reviewed.
