Estimating the incidence of coeliac disease with capture-recapture methods within four geographic areas in Italy

Study objective - To estimate the incidence rate of newly diagnosed cases of coeliac disease in Italy. Design - This was a descriptive study of coeliac disease incidence in the period 1990-91. Setting - During 1990-91 newly diagnosed cases of coeliac disease were signalled by several sources including diagnostic records of departments of paediatrics, general medicine and gastroenterology, national health service records for the supply of gluten free diets and the archives of the Italian Coeliac Society.
Patients - Altogether 1475 cases were flagged throughout Italy, 478 of which were selected, corresponding to 270 individual patients from a target population resident in four areas: Provinces of Turin and Cuneo (Piedmont Region, northern Italy); Province of Brescia (Lombardia Region, northern Italy); Umbria Region (central Italy) and Sardinia Region (insular Italy). Only for these areas were patients flagged from several sources and the reference population was identifiable.
Main results - The overall crude incidence rates for all ages per 100 000 residents per year were 2.4, 2.7, 1.5, and 1.7 in the four areas, respectively. The childhood cumulative incidence rates (aged < 15 years) per 100 000 live births were 143, 141, 72, and 80 respectively. The mean ages at diagnosis were similar for both childhood and adult cases throughout the areas; these were around 4 and 34 years respectively. For each area, the incidence rate was consistently higher in the main city than elsewhere. Using the capture-recapture method, an estimated completeness of case archives of 0.84 was obtained, whereas this figure was only 0.47 for hospital sources.
Conclusions - This population based study on the incidence of coeliac disease shows that several information sources should be used to avoid underestimation. The incidence rate of coeliac disease in Italy was among the highest in Europe, and was widely variable, showing the highest figures in Piedmont and Lombardia and the lowest in Umbria and Sardinia. This trend was not due to different age at diagnosis, which suggests variable diagnostic awareness of the disease rather than different environmental patterns affecting the clinical presentation.
(J Epidemiol Community Health 1996;50:299-305)
Although coeliac disease (CD) is a health problem that carries an increased risk of malignancy, epidemiological data concerning its incidence are still incomplete. High frequencies of clinically manifest disease in children have been reported from western Ireland1 and Sweden,2 whereas the lowest reported childhood incidence rates come from Finland3 and Denmark.4 No information is available from the United States of America5 and few reports are available for Mediterranean countries.6-8 This epidemiological information is valuable as it helps to improve understanding of aetiological factors and the implementation of health programmes aiming towards better recognition and treatment of the disease.9 In the past two decades the timing of diagnosis has gradually moved towards adulthood10 because of an increasing recognition of atypical and subclinical cases.11 Nonetheless, the incidence of CD has been assessed, with a few exceptions,12-14 only in children.
Epidemiological evidence of different geographical and/or temporal variability in the incidence of CD is difficult to interpret. This is because of wide variability in clinical awareness,15 endoscopic duodenal biopsy,16 and implementation of serological screening.17 Moreover, in the absence of a national register of the disease, several information sources should be used to measure incidence, since the use of single sources such as death certificates, CD societies, or hospital records has been shown to yield a quarter of the actual cases.18 These considerations prompted us to constitute a working group within the Italian Association for the Study of Small Bowel Diseases ("Club del Tenue"), with the aim of assessing the incidence of newly diagnosed cases of coeliac disease in Italy. This paper reports the results of the study for four well defined geographic areas.

CATCHMENT AREAS
The overall number of CD cases flagged by all participating centres was 1475. We selected only those areas where flagging satisfied the criteria indicated below in "data collection".
Thus, the study includes cases of CD newly diagnosed between 1 January 1990 and 31 December 1991 in the resident population of four Italian areas: the provinces of Turin and Cuneo (Piedmont Region), the Province of Brescia (Lombardia Region), the Umbria Region, and the Sardinia Region. In these areas, according to the 1991 Italian population census, the resident population was 6 339 194 (11.0% of the whole Italian population) and 107 048 live births were recorded during the study period (9.4% of all live births in Italy).
DATA COLLECTION
CD patients were identified from four different information sources:
* Diagnostic lists of paediatric, general medicine and gastroenterology departments of hospitals in the study areas;
* Diagnostic lists of leading Italian hospitals likely to attract CD patients throughout the country;
* National health service records of patients for whom gluten free food was provided on the basis of histological evidence of CD;
* Archives of the local branches of the Italian Coeliac Society.
Data of patients entered for the first time during the study period were independently collected from each of the four sources considered. Details included surname, name, date of birth, date of first diagnosis, birthplace, and residence area. Flagging archives were thus obtained where individual patients were expected to be reported more than once, indicating flagging from multiple sources. Record linkage by surname, name, gender, and date of birth allowed us to build archives of newly diagnosed cases of CD during the study. We excluded from the study patients whose name reappeared in the records after interruption of their gluten free diet and patients who were already present in the national health service or Italian Coeliac Society records in the years preceding the study period (prevalent cases), in whom the diagnosis had therefore already been made. In order to calculate the incidence of new cases within the selected geographical areas, patients diagnosed within these areas but resident elsewhere were also excluded.
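The record linkage step described above can be sketched as follows. This is a minimal illustration, assuming exact matching on a normalised key; the field names, the normalisation rule, and the sample records are hypothetical and are not taken from the study, which does not describe its matching rules in detail.

```python
from collections import defaultdict


def linkage_key(record):
    """Build the matching key from surname, name, gender and date of birth.

    Normalisation (trim + lower-case) is an assumption made for this sketch.
    """
    return (
        record["surname"].strip().lower(),
        record["name"].strip().lower(),
        record["gender"].strip().lower(),
        record["date_of_birth"],
    )


def build_case_archive(flagged_records):
    """Collapse flagged records into one entry per patient.

    Returns a dict mapping each linkage key to the set of sources that
    flagged that patient.
    """
    archive = defaultdict(set)
    for record in flagged_records:
        archive[linkage_key(record)].add(record["source"])
    return dict(archive)


# Hypothetical flagged records, invented for illustration only.
flagged = [
    {"surname": "Rossi", "name": "Anna", "gender": "F",
     "date_of_birth": "1986-03-02", "source": "hospital"},
    {"surname": "ROSSI ", "name": "anna", "gender": "f",
     "date_of_birth": "1986-03-02", "source": "national_health_service"},
    {"surname": "Bianchi", "name": "Luca", "gender": "M",
     "date_of_birth": "1957-11-20", "source": "coeliac_society"},
]

archive = build_case_archive(flagged)
# Average number of flagged records per individual patient.
flagging_frequency = len(flagged) / len(archive)
print(len(archive), flagging_frequency)
```

Keeping the set of sources per patient is what later allows the overlap between primary and secondary sources to be counted for the capture-recapture estimate.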

EVALUATING DIAGNOSTIC APPROACH
The diagnosis of CD for all patients included in the study was verified by retrospective examination of data collected routinely in the hospitals where the diagnoses were made. Patients were classified into three categories according to the diagnostic approach as follows: diagnosis unsupported by duodenal or jejunal biopsy; diagnosis based on a single abnormal duodenojejunal histology finding; or diagnosis formulated according to the European Society of Paediatric Gastroenterology and Nutrition (ESPGAN) criteria. The latter entails abnormal duodenal or jejunal histology followed by clinical or histological return to normal on gluten free diet.19

ESTIMATING COMPLETENESS OF CASE ARCHIVES AND ITS SOURCES OF VARIABILITY
To estimate completeness of case ascertainment for the whole archives and each information source we used the Lincoln-Petersen capture-recapture method,20,21 which compares results for multiple independent sources ascertaining the same event. Completeness of the case archives was expressed as the ratio of the observed to the expected number of new diagnoses in the target population (N). To estimate N, we considered hospital flagging (sources 1 and 2 of the data collection section) as primary sources, while national health service and Italian Coeliac Society records were considered as secondary sources. N was calculated according to the Chapman estimator22:
$$N = \frac{(M + 1)(n + 1)}{m + 1} - 1 \qquad (1)$$
where M is the number of new diagnoses identified by the primary sources; n is the number of new diagnoses identified by at least one of the secondary sources; and m is the number of new diagnoses identified by both the primary and at least one of the secondary sources.
An approximately unbiased estimate of the variance of N was derived by Seber23 and is given as:
$$\mathrm{Var}(N) = \frac{(M + 1)(n + 1)(M - m)(n - m)}{(m + 1)^2 (m + 2)} \qquad (2)$$
The 95% confidence interval (CI) of N was calculated using the formula:
$$N \pm 1.96 \sqrt{\mathrm{Var}(N)} \qquad (3)$$
Equation (1) produces a valid estimate of the expected number of cases under the assumption of independence of data sources, that is, each subject must have an equal chance of being reported as a case in the secondary sources, regardless of whether or not he/she was identified as a case by the primary sources.
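As a worked illustration of the two-source capture-recapture calculation, the sketch below implements the standard Chapman estimator, Seber's variance, and a normal-approximation 95% interval. The counts M, n and m used here are hypothetical; the study does not report them in this section, so the output does not reproduce the paper's figures.

```python
import math


def chapman_estimate(M, n, m, z=1.96):
    """Two-source capture-recapture estimate of the total number of cases.

    M: cases found by the primary sources
    n: cases found by at least one secondary source
    m: cases found by both
    Returns (N, variance, (ci_low, ci_high)).
    """
    # Chapman's nearly unbiased estimator of the population size.
    N = (M + 1) * (n + 1) / (m + 1) - 1
    # Seber's approximately unbiased variance estimate.
    var = (M + 1) * (n + 1) * (M - m) * (n - m) / ((m + 1) ** 2 * (m + 2))
    # Normal-approximation confidence interval.
    half_width = z * math.sqrt(var)
    return N, var, (N - half_width, N + half_width)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
M, n, m = 60, 80, 40
observed = M + n - m  # distinct cases seen by any source

N, var, ci = chapman_estimate(M, n, m)
completeness = observed / N  # observed / expected cases

print(f"N = {N:.1f}, 95% CI {ci[0]:.1f} to {ci[1]:.1f}")
print(f"completeness = {completeness:.2f}")
```

The estimate is only as good as the independence assumption: if being flagged by a hospital makes a patient more likely to appear in the secondary sources, m is inflated and N is underestimated.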
We then constructed a matrix that considers the number of new diagnoses reported in the archives and the estimated number of unreported new diagnoses for each possible combination of gender, age (< 15 v ≥ 15 years), area of residence, and residence within or outside the area's main city. The estimated number of unreported diagnoses was calculated as the difference between expected cases (equation 1) and those reported in the archives. This structure corresponds to a case-control study, where cases and controls are the patients unreported and reported in the archives respectively. The risk of non-reporting associated with the considered variables was estimated by a logistic regression model and expressed as the maximum likelihood estimate of relative risks and their 95% confidence intervals.24 These calculations were performed using the LOGISTIC procedure of the SAS package.25
The relationship between the observed overall incidence rate and gender, age, and area of residence was assessed by a multiple Poisson regression model.26 This considers the incidence rate as the dependent variable, and gender, age, area of residence, and residence within or outside the area's main city as the independent variables. We then calculated the maximum likelihood estimate of the relative risk and the corresponding 95% confidence interval for each level of the independent variables. These calculations were performed using the EGRET package.27
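To make the notion of a relative risk between two groups concrete, the sketch below computes a crude incidence rate ratio with a Wald 95% confidence interval on the log scale. This is a deliberate simplification of the multiple Poisson regression described above (it handles a single two-level factor and no adjustment), and the case counts and person-years are hypothetical.

```python
import math


def incidence_rate(cases, person_years, per=100_000):
    """Crude incidence rate per `per` person-years."""
    return cases / person_years * per


def rate_ratio_ci(cases_a, py_a, cases_b, py_b, z=1.96):
    """Rate ratio of group A versus group B with a Wald interval.

    The standard error of log(RR) for two Poisson counts is
    sqrt(1/cases_a + 1/cases_b).
    """
    rr = (cases_a / py_a) / (cases_b / py_b)
    se_log = math.sqrt(1 / cases_a + 1 / cases_b)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


# Hypothetical data: two groups of equal size followed for two years.
cases_f, py_f = 60, 2_000_000
cases_m, py_m = 30, 2_000_000

rate_f = incidence_rate(cases_f, py_f)
rate_m = incidence_rate(cases_m, py_m)
rr, lower, upper = rate_ratio_ci(cases_f, py_f, cases_m, py_m)

print(f"rates per 100 000 per year: {rate_f:.1f} v {rate_m:.1f}")
print(f"rate ratio {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

For a model with a single binary covariate, a Poisson regression with log person-years as offset gives the same point estimate; the regression becomes necessary once several covariates are adjusted for simultaneously.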

ESTIMATING CHILDHOOD CUMULATIVE INCIDENCE
The childhood cumulative incidence was not directly computable since our observation period covered only two years. We therefore used the density method28 based on the functional relationship existing between incidence rate and childhood cumulative incidence:
$$CCI_j = 1 - \exp\left(-\sum_{i=1}^{j} IR_i\right) \qquad (4)$$
where $CCI_j$ is the cumulative incidence occurring between birth and age $j$, and $\sum_{i=1}^{j} IR_i$ sums the age specific annual incidence rates $IR_i$ occurring between the first and the $j$th year of age. Incidence rates used in equation (4) were obtained from the observed number of cases. The childhood cumulative incidence was computed for each of the four areas, and individually for each main city, between the 1st and the 15th year of age and expressed as the number of cases occurring from birth to the $j$th year of age in a hypothetical cohort of 100 000 live births. Equation (4) produces a valid estimate of the childhood cumulative incidence under the assumption that the incidence rate remains constant during the calendar period between the year of birth of the hypothetical cohort and the year of observation of cases.
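The density method can be illustrated with a short sketch: age specific annual rates are summed up to each age and transformed exponentially into a cumulative incidence. The rates below are hypothetical values per 100 000 person-years, invented for the example; they are not the study's data.

```python
import math


def cumulative_incidence_by_age(annual_rates):
    """Density-method cumulative incidence up to each year of age.

    annual_rates: age specific incidence rates per person-year, ordered
    from the 1st year of age onwards.
    Returns a list whose element j-1 is 1 - exp(-sum of the first j rates).
    """
    results = []
    running_sum = 0.0
    for rate in annual_rates:
        running_sum += rate
        results.append(1 - math.exp(-running_sum))
    return results


# Hypothetical age specific rates per 100 000 person-years for the first
# 15 years of age (a peak in early childhood, then a plateau).
rates_per_100k = [5, 25, 20, 12, 8] + [6] * 10
annual_rates = [r / 100_000 for r in rates_per_100k]

cci = cumulative_incidence_by_age(annual_rates)
cci_15_per_100k = cci[-1] * 100_000  # cases per 100 000 live births by age 15

print(f"cumulative incidence by age 15: {cci_15_per_100k:.1f} per 100 000")
```

Because the rates are small, the result is very close to the plain sum of the rates (130 per 100 000 here); the exponential transformation matters more for common diseases.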

EVALUATING SOURCES OF VARIABILITY OF AGE AT DIAGNOSIS
A hierarchical analysis of variance (ANOVA) model for unbalanced data29 was fitted to evaluate the sources of variability of age at diagnosis. In the model, the main effects of gender and residence area and the nested effect of city of residence (main city or other cities) within the residence area were considered. The model was applied in the two strata corresponding to age < 15 years and ≥ 15 years, arbitrarily considered as paediatric and adult ages respectively. These calculations were performed using the GLM procedure of the SAS package.30

FROM FLAGGING ARCHIVES TO CASE ARCHIVES
We received 478 flagged records of newly diagnosed CD patients during 1990-91 from the four areas included in the study. The hospital lists provided 181 records (151 from the study areas and 30 from other hospitals outside the areas); the national health service and the Italian Coeliac Society lists provided 220 and 77 records respectively. From the 478 flagged records we constructed a case archive of 270 patients, with an average per patient flagging frequency of 1.77.
Among the 270 patients considered in the study, 89 were males (male:female ratio 1:2.0). Altogether 139 patients were diagnosed at < 15 years and 131 at ≥ 15 years. The overall mean age at diagnosis was 21 years; it was 3.7 years for paediatric cases (< 15 years) and 34 years for adult cases (≥ 15 years).
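The summary figures in this section can be checked with simple arithmetic. The sketch below uses only counts stated in the text; the variable names are ours.

```python
# Counts as reported in the text.
records_by_source = {
    "hospital": 181,
    "national_health_service": 220,
    "coeliac_society": 77,
}
patients = 270
males = 89
paediatric, adult = 139, 131

flagged_records = sum(records_by_source.values())
# Average number of flagged records per individual patient.
flagging_frequency = flagged_records / patients
# Number of females per male.
females = patients - males
females_per_male = females / males
# Share of flagged records that did not come from hospital lists.
non_hospital_share = (flagged_records - records_by_source["hospital"]) / flagged_records

print(flagged_records, round(flagging_frequency, 2))
print(round(females_per_male, 1), round(non_hospital_share * 100))
```

The non-hospital share computed here is the proportion of flagged records that a hospital-only design would not have received.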

DIAGNOSTIC APPROACH
Of the 270 patients, 231 (85.6%) had medical records available. Their diagnosis was always supported by duodenal or jejunal biopsy: in 109 cases (47.2%) diagnosis was made according to a single abnormal duodenojejunal histology finding and in the remaining 122 cases according to ESPGAN criteria.
[...] 0.41 (Sardinia and Umbria) to 0.50 (Lombardia). Table 2 shows that the estimated risk of unreported new diagnosis was not associated with gender, age, or area and city of residence. Table 3 shows that the crude incidence rate was highest in the northern areas (Piedmont and Lombardia) and lowest in the other areas, irrespective of whether it was calculated from hospital sources alone or from observed or expected cases. Table 4 shows the estimated relative risks associated with gender, age, and area of residence. Females had a significantly higher relative risk. Despite a progressive reduction in the relative risk with increasing age, an appreciable number of cases were diagnosed above age 60 (n = 15). Taking cases resident in Sardinia as the reference category, relative risks were significantly higher in northern Italy. Cases resident in the main cities also showed a significantly higher relative risk.

CHILDHOOD CUMULATIVE INCIDENCE
The figure shows the observed childhood cumulative incidence values according to age in the four areas. Two clusters were present. For northern areas, the childhood cumulative incidence was consistently the highest, reaching twice the value of the other two areas at the age of 15 years. Table 5 shows the estimated childhood cumulative incidence at the age of 2, 5, 10, [...]
Table 6 shows the results of the ANOVA performed separately for the two age groups. In both paediatric (< 15 years) and adult (≥ 15 years) patients, the age at diagnosis did not differ significantly according to the area of residence or to the city of residence within each area. A significantly higher age at diagnosis was observed in women.

Discussion
Epidemiological studies of CD pose several problems, particularly when they are trying to estimate its incidence rate.31 Firstly, all diagnoses originating from well defined populations should be included to estimate the disease incidence correctly. The total flagged records in this study were 1475, but only 478 originated from well defined geographical areas. For most participating centres, flagging was obtained only from some of the hospital departments in the areas; several health districts were not able to provide the requested information and most Italian Coeliac Society branches had no authorisation to provide confidential patient data. A well defined originating population was found only for four areas, covering 10% of both the Italian population and the number of live births in the study period. We may have introduced bias by selecting areas in which the greatest diagnostic awareness existed. However, this should not have affected the validity of comparing results between these areas. Incomplete inclusion of all diagnoses might occur, as shown by the finding that 52% of our patients were diagnosed outside their residence area. To avoid this, we used multiple information sources which also included referral hospitals outside the study areas likely to attract patients throughout the country. These provided only 6.3% of all flagged records, however, whereas 62% of them were derived from specific national health service lists and branches of the Italian Coeliac Society and would have been missed using only hospital sources. In areas where hospital flagging was scarce, this was offset by high flagging rates from the other sources. Using the capture-recapture technique,20-23 we estimated that our approach recruited 84% of CD patients newly diagnosed in defined reference populations. This 16% underestimate of the incidence rate was uniformly distributed among the areas and was independent of gender, age and rural or urban residence.
Our comparisons are therefore valid and not biased by differential underreporting across types of population. Our calculated childhood cumulative incidence in Piedmont was twice the figure reported for that area by the paediatric multicentre ESPGAN study (1.4 per 1000 v 0.7 per 1000 respectively),8 which used only hospital sources.
If we consider data from our hospital source, we actually obtain the same figure as that reported by the ESPGAN study. Thus, we believe that our data are more likely to reflect the actual coeliac disease incidence.
The second problem is that standard diagnostic criteria should be used. We verified that for all our patients whose medical records could be retrieved, diagnoses were formulated according to histological criteria, although only in half the cases were the strictest ESPGAN criteria used.19 This should not have biased our results, since a diagnosis of CD formulated according to a single histological finding is confirmed according to the ESPGAN criteria in 95% of the cases.32
Thirdly, diagnostic criteria should be homogeneous throughout the country and the period considered. We observed no difference in diagnostic criteria between the four areas or according to urban or rural residence within each area. Moreover, our study covered an observation period of just two years, to avoid bias due to time-dependent changes in the diagnostic approach. Thus, we were not able to measure the childhood cumulative incidence directly, and based our estimates on the density method, which exponentially transforms age specific rates of a dynamic population observed cross-sectionally. The validity of our estimates depends on assuming a constant age specific incidence rate throughout the years. In the last 20 years, the frequency of diagnosing subclinical CD in Italy has increased,15 suggesting that our figures may be underestimates. However, this increased diagnostic frequency in a long lasting observational study may be affected by changes in diagnostic modalities and/or awareness rather than reflecting an actual increased disease incidence.
Fourthly, in an incidence study cases are included when the diagnosis is formulated, and the onset of disease cannot be dated. The calculation of the incidence rate is therefore of dubious validity,31 particularly for CD, whose onset is often gradual and escapes dating. Methods of measuring cumulative incidence are implemented to estimate the actual frequency of disease occurrence. A declining childhood cumulative incidence trend during the 1980s has been reported from Ireland and the United Kingdom33,34 and other European areas.35 Increased mean age at diagnosis may explain this finding,10 pointing to a need to include diagnoses formulated at all ages and not only during childhood. Possible temporal trends and/or between-population variability of diagnostic frequency can thus be appreciated. Cautious interpretation of incidence rates referring to adulthood is needed, as these reflect the diagnostic frequency of the disease rather than its actual incidence, particularly for our study, where it was not possible to assess symptoms leading to the diagnosis and their time of onset.
Lastly, in incidence studies only newly diagnosed cases are included and these do not necessarily coincide with newly occurring cases. Due to the wide clinical spectrum of CD, the observed variability in disease frequency between the areas may indicate variable rate of detection rather than variable incidence rate.
Our results are comparable with the highest disease frequency reported in northern European countries.36 Our estimated childhood cumulative incidence was the second highest for Europe after Sweden,2 and showed a wide geographical variation between the lowest figure of 0.7 per 1000 in Umbria (central Italy) and the highest of 1.4 per 1000 in Piedmont (northern Italy). Consistently higher figures were also found for each area in children living in the main cities compared with those living elsewhere. A role for environmental factors, such as (1) low rate of breast feeding,37,38 (2) early dietary introduction of gluten proteins and high amount of gluten in the weaning diet,39-41 and (3) low incidence of gastroenteritis,34 may be suggested. These factors should, however, determine an earlier onset of disease symptoms, whereas our cases had a similar mean age at diagnosis in the four areas studied. The highly variable childhood cumulative incidence observed may possibly be due to increased diagnostic awareness and availability of diagnostic facilities in the largest cities in the northern areas.
Our adult incidence rate followed the same distribution as the childhood cumulative incidence. This further supports the hypothesis of a different diagnostic awareness of the disease, since it is unlikely that environmental factors are kept constantly different throughout the areas for all birth cohorts of the patients included in our study.
In conclusion, we have shown that population based studies of the incidence rate of CD should use several information sources to avoid underestimation. By this approach, we showed that the childhood incidence rate in Italy is among the highest in Europe, and that wide variability is observed between geographical areas for both the paediatric and adulthood incidence rates. The highest rates were also observed in metropolitan areas within each area. The age at diagnosis was homogeneous throughout the areas, suggesting that geographical variability probably depends more on different disease awareness than on varying environmental factors affecting the clinical presentation of CD.
The working group of the Italian "Club del Tenue" comprises: