Estimation of the prevalence of diagnosed diabetes from primary care and secondary care source data: comparison of record linkage with capture-recapture analysis
J N Harvey, L Craney, D Kelly
University of Wales College of Medicine Wrexham Academic Unit, Maelor Hospital, Wrexham, UK
Correspondence to: Dr J N Harvey, Diabetes Unit, Gladstone Building, Maelor Hospital, Wrexham LL13 7TD, UK


Study objective: To compare multiple source linkage and capture-recapture analysis in determining the current age and gender specific prevalence of type 1 and type 2 diabetes in a UK white population. To assess whole population trends in diabetes prevalence and treatment by comparison with previous studies.

Design: Data were obtained from hospital sources and all 74 general practices in the study population. Analyses were carried out both by record linkage and by use of a two source capture-recapture model to correct for incomplete ascertainment.

Setting: County of Clwyd, North Wales: total population 418 200.

Main results: By record linkage the age adjusted prevalence of all diabetes was 2.04 (95% confidence interval 2.00 to 2.09)%. Using the capture-recapture method it was 2.29 (2.24 to 2.33)%. From capture-recapture data the age adjusted prevalence of type 1 diabetes was 0.40 (0.37 to 0.43)% in men and 0.28 (0.25 to 0.30)% in women; the prevalence of type 2 was 2.03 (1.97 to 2.09)% in men and 1.67 (1.62 to 1.72)% in women. These figures represent an increase compared with previous surveys. The age specific prevalence of type 2 diabetes was greater in men in a ratio of approximately 1.5:1, and more patients were treated by diet alone than in most previous surveys.

Conclusions: Record linkage using multiple sources underestimates the prevalence of diabetes compared with capture-recapture estimates. The results suggest the prevalence of known diabetes in the UK has approximately doubled in less than 20 years. There is an increasing preponderance of male patients and of patients treated currently with diet alone.

Assessment of changes in the prevalence of diabetes is difficult because many patients are never referred to hospital and there is a substantial pool of undiagnosed cases. The incidence of type 1 diabetes has increased substantially over the past 20 years.1,2 Trends in the incidence and prevalence of type 2 diabetes are much less clear. The prevalence of type 2 diabetes is thought to be increasing, and the condition is recognised to be a major user of health care resources.3 Methods of measuring the prevalence of this (and other) chronic diseases are therefore important. Most previous surveys of the population prevalence of diabetes have been based primarily on general practice records. However, use of a single source of data rarely provides complete ascertainment of cases. In addition, the larger the survey, the more likely it is to involve heterogeneous primary care data collection systems. Thus recent surveys have used multiple sources of data, linking these sources electronically.4,5 Although technologically impressive, this methodology will also miss cases, so there is a need to estimate the degree of completeness of the dataset. Capture-recapture analysis is a methodology borrowed from animal ecology, where complete ascertainment of individuals for population estimates is also difficult. The method permits the estimation of the total number of cases if two or more sources of data are available.6 The aim of this study was to compare data linkage with capture-recapture methodology in a survey of the prevalence of diabetes in a defined area of North Wales and to assess the time trend in the prevalence of diabetes and its treatment by comparison with previous studies.

Methods


We studied the population of the (former) county of Clwyd, defined by address postcode, resident in March 1998. This comprised 418 200 people (data from the Office for National Statistics, ONS) and was >99% white. Excluding tertiary referrals, health services to this population are provided by three district general hospitals (Wrexham Maelor, Glan Clwyd (near Rhyl) and Chester) and 74 general practices.

Each patient's diabetes was classified by type and treatment. Diabetes was diagnosed by WHO criteria.7 Type 1 diabetes was defined as diagnosis before age 40 and on insulin treatment within one year from diagnosis. Cases of intermediate glucose tolerance and gestational diabetes were excluded. Cases of secondary diabetes were included with type 2. The proportions on the various treatments were calculated using only those patients identified by name where treatment was determined (8877 patients). Patients were listed as attending hospital or primary care for their diabetes care.

Data sources

(1) Hospital Patient Administration Computer Systems (PAS): details of all patients coded as having diabetes (ICD9 code 250) were obtained electronically. (2) Hospital diabetes clinics: information from clinic attendances was obtained over one year. Type of diabetes and treatment were classified at this stage. (3) Diabetes nurses' records: records were available on the majority of patients who had attended the hospital diabetes clinics over the previous 15 years. (4) Primary care: a list of diabetic patients was supplied by all 74 practices in the catchment area. Regular updates were obtained, including an updated list within three months of the reference date, 31 March 1998. Information was supplied on each patient's treatment. Further enquiries were made of those on insulin to determine if they fell within the definition of type 1.

Validation of database

One year after initiating the study, further enquiries were made of all patients whose names did not appear in more than one source or whose type of diabetes and treatment were not listed. We used hospital or primary care records and the NHS Wales Administrative Register to eliminate any who did not have diabetes, had died or were not resident in the catchment area on the reference date.

Record linkage analysis

The databases were combined electronically using name, date of birth and hospital unit numbers for identification. Duplicates were eliminated by manual inspection.

Capture-recapture analysis

A two source capture-recapture model was used. Information from hospital PAS systems, hospital clinics and diabetes nurses' records was combined to produce a hospital source. The lists from general practitioners were combined to generate a primary care source. The predicted number of cases (N) was calculated from the two source capture-recapture formula as given by LaPorte.6 To give the overall unadjusted prevalence, N was calculated for all patients. A 95% confidence interval for the estimate of total number of cases of diabetes predicted by capture-recapture analysis was calculated as given by Robles et al.8 For the purposes of calculating age adjusted prevalence rates, N was calculated for each five year age cohort separately for each gender and each type of diabetes and the unclassified cases. Using the numbers predicted by capture-recapture analysis, age adjusted total prevalence and prevalence rates of type 1 and type 2 diabetes were calculated by direct standardisation.
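The text cites the two source formula without reproducing it. As a sketch only, the forms most commonly used for two sources are shown below: the simple Lincoln-Petersen estimate and the Chapman variant with its approximate variance. Which exact form and interval the authors applied is an assumption here, and the symbols M (cases in the hospital source), n (cases in the primary care source) and m (cases in both) are introduced for illustration.

```latex
\[
\hat{N} = \frac{M\,n}{m}
\qquad\text{or}\qquad
\hat{N}_{\mathrm{C}} = \frac{(M+1)(n+1)}{m+1} - 1
\]
\[
\operatorname{Var}\bigl(\hat{N}_{\mathrm{C}}\bigr) \approx
\frac{(M+1)(n+1)(M-m)(n-m)}{(m+1)^{2}(m+2)},
\qquad
95\%\ \mathrm{CI} \approx \hat{N}_{\mathrm{C}} \pm 1.96\sqrt{\operatorname{Var}\bigl(\hat{N}_{\mathrm{C}}\bigr)}
\]
```

With large overlapping samples the two estimates differ by less than one case, so the choice between them matters little at this scale.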

In order to investigate whether changes in the age structure of the population over time (that is, increasing numbers of elderly) could account for an increasing prevalence of diabetes we calculated prevalence rates of type 1 and type 2 diabetes age adjusted to the population of England and Wales in 1997, 1981 and 1962 (ONS mid-year estimates).
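Direct standardisation to different reference populations can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the three age bands and all numbers are hypothetical, chosen only to show how the same age specific rates give a higher adjusted prevalence when the standard population contains more elderly people.

```python
def directly_standardised_rate(cases, study_pop, standard_pop):
    """Age adjusted rate: age specific rates weighted by a standard population."""
    if not (len(cases) == len(study_pop) == len(standard_pop)):
        raise ValueError("all inputs need one entry per age band")
    total_standard = sum(standard_pop)
    return sum(
        (c / p) * (s / total_standard)
        for c, p, s in zip(cases, study_pop, standard_pop)
    )

# Hypothetical three-band example (NOT the study's data).
cases = [10, 50, 120]                     # estimated cases per age band
study_pop = [10_000, 10_000, 8_000]       # study population per age band
older_standard = [3_000, 3_000, 4_000]    # standard with more elderly
younger_standard = [4_000, 4_000, 2_000]  # standard with fewer elderly

rate_older = directly_standardised_rate(cases, study_pop, older_standard)
rate_younger = directly_standardised_rate(cases, study_pop, younger_standard)
print(f"{100 * rate_older:.2f}% vs {100 * rate_younger:.2f}%")  # 0.78% vs 0.54%
```

The difference between the two adjusted rates is the part of any change in prevalence that is attributable to age structure alone.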

Calculation of prevalence in specific age groups was undertaken to allow direct comparison with previously published surveys.

Taking patient populations of the size predicted by the capture-recapture analysis, 95% confidence intervals for the age and gender specific rates (figs 1 and 2) were calculated assuming a Poisson sampling distribution. Conclusions are drawn from data with 95% confidence intervals, as recommended by Gardner and Altman.9
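A confidence interval of this kind can be sketched as below. The normal approximation to the Poisson distribution is an assumption made for brevity (the text does not say whether exact limits were used), and the function name and the example counts are hypothetical.

```python
import math

def poisson_rate_ci(cases, population, z=1.96):
    """Rate with an approximate 95% CI, treating the case count as Poisson."""
    rate = cases / population
    # For a Poisson count the variance equals the mean, so SE(count) = sqrt(count).
    half_width = z * math.sqrt(cases) / population
    return rate, max(0.0, rate - half_width), rate + half_width

# Hypothetical age band: 400 estimated cases among 10 000 residents.
rate, low, high = poisson_rate_ci(400, 10_000)
print(f"{100 * rate:.2f}% ({100 * low:.2f} to {100 * high:.2f})")  # 4.00% (3.61 to 4.39)
```

For age bands with very few cases the approximation becomes poor and exact Poisson limits would be preferable.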

Figure 1

Age and gender specific prevalence of type 1 diabetes calculated by capture-recapture analysis. Error bars represent 95% confidence intervals.

Figure 2

Age and gender specific prevalence of type 2 diabetes calculated by capture-recapture analysis. Error bars represent 95% confidence intervals.

Results


Comparison of record linkage with capture-recapture analysis

Hospital sources identified 5466 patients and primary care lists 7910 patients; 4198 patients were present on both lists. The total by linkage was 9178. The total predicted by capture-recapture analysis was 10299 (95% CI 10198 to 10400). The hospital list comprised 22% type 1 and 74% type 2; the primary care source comprised 13% type 1 and 85% type 2. The data are given by source of identification, type of diabetes and gender in table 1. The capture-recapture method assumes each member of the population is equally identifiable by a given source. If one stratifies the data by type of diabetes or gender, as in table 1, calculates the capture-recapture estimate of population size separately for each subgroup and sums the results, the same figure should be obtained as when the overall figures are used, provided each subgroup is equally identifiable (catchable). For the purposes of this calculation those patients with type of diabetes unclassified (3.3% of the total) were included with type 2, as this is where most are likely to belong. It can be seen that variable catchability had very little effect on the population estimates.
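The headline totals can be checked from the three counts reported above. The sketch below uses the simplest two source estimate (product of the source totals divided by the overlap); that this is the exact form the authors used is an assumption, although it reproduces the reported figure. Variable names are illustrative.

```python
hospital = 5466      # identified from hospital sources
primary_care = 7910  # identified from primary care lists
both = 4198          # present on both lists

# Record linkage: union of the two lists, counting the overlap once.
linked_total = hospital + primary_care - both

# Two source capture-recapture estimate in its simplest form (assumed).
cr_estimate = hospital * primary_care / both

# Cases implied to be missing from both sources.
missed = cr_estimate - linked_total

print(linked_total)        # 9178
print(round(cr_estimate))  # 10299
print(round(missed))       # 1121
```

The stratified check described above applies the same calculation within each subgroup and sums the results; the subgroup counts come from table 1 and are not reproduced here.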

Table 1

Numbers of patients studied, classified according to type of diabetes and gender. The numbers of patients identified by each source (hospital, primary care or identified in both) are given with the capture-recapture estimate generated and the total in each category obtained by source linkage. As the number of patients whose type of diabetes was unclassified was small (3.3% of the total) these have been included with the type 2 patients

Of the patients identified by name (9178), 49% were under sole general practice (GP) care.

Prevalence of diabetes

The prevalence of all diabetes by record linkage was 2.19 (2.15 to 2.24)% unadjusted and 2.04 (2.00 to 2.09)% age adjusted. The capture-recapture method gave an unadjusted prevalence of 2.46 (2.44 to 2.49)% and age adjusted prevalence of 2.29 (2.24 to 2.33)%.
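The unadjusted figures follow from the totals and the population of 418 200. In the sketch below the linkage interval is computed with a normal approximation to the binomial, and the capture-recapture interval by dividing the reported limits of N by the population; both are assumptions about how the intervals were derived, although they reproduce the reported values.

```python
import math

population = 418_200
linked_total = 9178
cr_total, cr_low, cr_high = 10299, 10198, 10400  # estimate and reported 95% CI

def percent(cases):
    """Cases as a percentage of the study population."""
    return 100 * cases / population

# Record linkage: binomial interval by normal approximation (assumed method).
p = linked_total / population
half_width = 1.96 * math.sqrt(p * (1 - p) / population)
linkage = (100 * p, 100 * (p - half_width), 100 * (p + half_width))

# Capture-recapture: divide the estimate and its limits by the population.
capture = (percent(cr_total), percent(cr_low), percent(cr_high))

print("linkage: %.2f (%.2f to %.2f)%%" % linkage)            # 2.19 (2.15 to 2.24)%
print("capture-recapture: %.2f (%.2f to %.2f)%%" % capture)  # 2.46 (2.44 to 2.49)%
```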

The age specific prevalence (calculated from capture-recapture results) for each gender is given in fig 1 (type 1) and fig 2 (type 2 diabetes). The 95% confidence intervals indicate that in type 2 diabetes the age specific prevalence is greater in males than females at all ages where numbers are sufficient for it to be assessed: at least from age 45 onwards. In type 1 diabetes a greater male prevalence is apparent after age 35, but not in childhood.

Effect of altered age structure of the population

The prevalence of type 1 and type 2 diabetes adjusted to the age distribution of the population of England and Wales in 1997, 1981 and 1962 is given in table 2. Thus changes in age structure accounted for a 5.2% increase in the prevalence of type 2 in males since 1981 but only a 1.2% increase in females. The corresponding figures comparing 1997 with 1962 are a 16.6% increase in males and 12.8% increase in females. The observed increases when our results are compared with previous surveys (table 3) are much greater than this.

Table 2

The population prevalence (%) of type 1 and type 2 diabetes in 1998, adjusted to the age distribution of the population of England and Wales in 1997, 1981 and 1962 (95% confidence intervals in parentheses)

Table 3

Data from previously published surveys and the current survey that record the unadjusted or age adjusted prevalence of known diabetes (all types) or prevalence of previously undiagnosed diabetes in the UK white population. Prevalence figures (%) refer to all ages unless otherwise stated. Sources used to identify diabetic patients refer to (1) Population survey, (2) Hospital activity analysis, (3) Hospital clinic attendance, (4) General practice lists, (5) Prescription data, (6) Data from retinopathy screening programmes, (7) Laboratory systems. Where diabetes registers were used the sources used to compile the register are listed

Key points

  • Comparison of capture-recapture analysis and source linkage indicates that source linkage underestimates the prevalence of diabetes.

  • Comparison with previous surveys indicates that the prevalence of diagnosed diabetes continues to increase.

  • These and other recent data show a greater preponderance of males and a greater proportion of patients treated by diet alone than in older surveys.

Type of diabetes and treatment

This was determined in 8877 patients, with 301 (3.3% of those who were identified) remaining unclassified. Of the 8877 patients, 26.5 (25.6 to 27.4)% were treated with insulin (14.8% had type 1 diabetes and 11.7% had type 2 diabetes treated with insulin), 44.8 (43.8 to 45.8)% had type 2 diabetes and were receiving oral agents, and 28.6 (27.7 to 29.5)% were on diet alone.

Discussion


In a survey of any size all data sources will be incomplete. Merging sources (linkage analysis) is, therefore, likely to produce an incomplete result no matter how many sources are used. The central problem is that the investigator has no idea exactly how incomplete the result is. Capture-recapture analysis tackles this issue. The two source capture-recapture model assumes that (1) the two data sources (hospital and primary care) were obtained independently, and (2) for a given source, any member of the population is equally likely to be identified (absence of sampling bias). As with many statistical tests, the assumptions are imperfectly met. A number of points can be made in support of the validity of the present analysis. Dependence of the sources can be estimated by comparing the data they provide. Here, 23% of hospital listed cases did not appear in the primary care source. The hospital PAS databases were the result of coding of hospital inpatient episodes (all specialties) over 10 years. Patients were admitted for many reasons other than their diabetes but were coded as having diabetes in addition. Some cases were first diagnosed in hospital. The hospital source identified 797 patients who were under sole GP care for their diabetes. The hospital source data reflect attendance at hospital for any reason and may therefore identify any member of the diabetic population. Thus, the hospital source data are not necessarily dependent on that patient being listed in the primary care source or being referred for hospital diabetes care. Dependence of sources can be positive or negative. Positive dependence, meaning an increased tendency for different sources to identify the same patients, is more likely here and would result in an underestimate of the total number of cases of diabetes in the target population.28
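Why positive dependence leads to an underestimate can be seen from a small numerical sketch. All figures below are hypothetical and the function name is illustrative; the estimate shown is the simplest two source form.

```python
def two_source_estimate(source_a, source_b, overlap):
    """Simplest two source capture-recapture estimate of the total."""
    return source_a * source_b / overlap

true_total = 10_000                 # hypothetical true number of cases
source_a, source_b = 5_000, 8_000   # hypothetical source sizes

# Independent sources: the expected overlap is a * b / N.
independent_overlap = source_a * source_b // true_total  # 4000
# Positive dependence: the sources share more cases than chance predicts.
dependent_overlap = 4_400

print(two_source_estimate(source_a, source_b, independent_overlap))       # 10000.0
print(round(two_source_estimate(source_a, source_b, dependent_overlap)))  # 9091
```

Because the overlap sits in the denominator, any excess overlap pulls the estimate below the true total, so an estimate made under positive dependence is conservative.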

Sampling bias tends to reduce with large samples. Large samples were used here, with the hospital source being 53% of the estimated total and the primary care source 77% of the estimated total. There was some heterogeneity of the samples, as indicated by the greater proportion of type 2 patients in the primary care source. Stratifying the data before analysis suggests that the potential error in the population estimate induced by this variable capture of cases between samples is very small. Some investigators have used a greater number of sources than we did, but in many cases these sources could not capture every kind of patient. For example, prescription data do not identify patients treated by diet alone. Comparison between the estimated total and the total identified by merging sources gives an estimate of the completeness of the study group.28 In our study the total obtained by merging sources (linkage) was 89% of the capture-recapture estimate. Linkage using these sources therefore underestimated prevalence by about 11%. It seems appropriate to correct for this using capture-recapture methodology. There seems to be no major reason to reject the capture-recapture analysis here, which is likely to be reasonably robust in view of the large samples used. Therefore, the capture-recapture estimates are the ones used for further comparisons.
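The coverage percentages quoted in this discussion follow directly from the counts given in the results. The short check below uses those reported counts; the helper name is illustrative.

```python
hospital, primary_care, both = 5466, 7910, 4198
linked_total = 9178   # total identified by record linkage
cr_estimate = 10299   # total predicted by capture-recapture

def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)

print(share(hospital, cr_estimate))      # 53: hospital source coverage
print(share(primary_care, cr_estimate))  # 77: primary care source coverage
print(share(linked_total, cr_estimate))  # 89: completeness of linkage
print(share(hospital - both, hospital))  # 23: hospital cases absent from primary care lists
```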

We have found a higher total prevalence of diabetes than previous surveys (table 2). Although the increasing incidence of childhood onset type 1 diabetes over the past 30 years is well documented,1,2 the relative numbers of type 1 and type 2 cases are such that our data can only be explained by the identification of considerably more type 2 cases than in previous surveys. Surveys carried out in the early 1960s found a prevalence of known diabetes of 0.6–0.7%.10,11 These early studies precede the agreed WHO criteria for the diagnosis of diabetes, but comparison of these and later surveys suggests a progressive pattern of increase. Several explanations can be advanced for this apparent increase in diabetes prevalence: (1) better ascertainment of cases due to improved survey methods, (2) a higher proportion of the total number of cases of type 2 diabetes now being diagnosed, (3) changes in the age distribution of the population, with greater numbers of elderly resulting in more cases, (4) increased longevity of diagnosed cases increasing diabetes prevalence, (5) a real increase in the incidence of type 2 diabetes. These possible explanations for the measured increase are considered in turn:

(1) Better case ascertainment due to improved survey methods: Use of the capture-recapture method to correct for incomplete ascertainment contributed to the high prevalence we found. The only other study to use capture-recapture methodology was the Tayside survey.4 However, even without use of this technique the prevalence we obtained by source linkage exceeds most previous results. Some earlier surveys based on GP lists alone are likely to be incomplete. In our survey the combined primary care sources identified only 77% of the estimated total whereas the hospital and GP lists together identified 89%. The primary care lists tended not to identify children and nursing home patients and in some practices were probably centred on those who attended miniclinics. A number of surveys based on GP lists alone show considerable variation in the prevalence of diabetes in neighbouring practices suggesting differing ascertainment rates of already diagnosed cases.19,27 However, not all previous surveys used GP lists alone. The Southall survey involved door to door interviews (with an 89% response rate), the Trowbridge survey used GP and hospital information, the Oxford survey used a postal questionnaire, GP and hospital records, and in Poole, hospital and GP records were accessed. These surveys found prevalences (all standardised to the population of England and Wales in the 1981 census) of 0.97%–1.26%.14–16,21 Thus consistent results were obtained at that time (1982–1988) using a variety of methods in combination. It therefore seems unlikely that the increase since the 1980s reflects improved ascertainment of known cases. Under-ascertainment may well be a factor in the prevalences reported in the 1960s.10,11

(2) Are a higher proportion of cases now diagnosed? Surveys carried out in Islington, Melton Mowbray and Coventry used population screening to identify all cases of diabetes, those previously known and unknown.17,18,29 The largest of these was the Coventry survey of type 2 diabetes in people over age 20. In 3529 subjects of European origin either known diabetic or found by population screening, the age adjusted prevalence of type 2 diabetes in males was 1.4% known and 1.8% previously unknown making a total of 3.2%. In females it was 1.5% known and 3.1% undiagnosed making a total of 4.7%. Our figures for the age adjusted prevalence of type 2 diabetes after age 20 were 2.6% in males and 2.0% in females. Compared to these three surveys we found a higher prevalence of known diabetes but this does not exceed the total prevalence found in those surveys (table 3). Therefore some increase in the proportion diagnosed could have contributed to the apparent increase. However, when compared with the earlier RCGP and Whitehall studies, our prevalence exceeds their numbers of known plus undiagnosed cases suggesting an absolute increase has occurred.10,13

(3) Are changes in the age distribution of the population responsible for an apparent increase in diabetes prevalence? Over the period of the surveys quoted above, the age distribution of the population has changed with increasing numbers of elderly. As the prevalence of diabetes is related to age this will impact on the prevalence of diabetes. To investigate the contribution of changed age structure of the population to the apparent increase in diabetes prevalence we calculated the prevalence of diabetes in our population corrected to the age distribution of the population in 1981 and 1962. The results (table 2) show that the increase in the prevalence of type 2 diabetes resulting from change in population age structure does not nearly account for the differences in prevalence rates found comparing current data with the surveys of the 1980s and 1960s (table 3).

(4) Has increased survival of patients with diabetes contributed to the increase in diabetes prevalence? Modern treatments may well have had an effect on survival and hence prevalence but it is difficult to quantify this, particularly since it is influenced by the trend towards earlier diagnosis of type 2 diabetes.

(5) A real increase in the incidence of type 2 diabetes: None of the above alternatives provides a clear explanation; thus it seems likely that the increase in prevalence reported over the past 40 years (table 3) reflects a real increase in the incidence of type 2 diabetes.

Gender differences

A striking finding in this survey is the difference in prevalence of diabetes in men and women, which becomes clear when age specific prevalence rates are examined (figs 1 and 2). In type 2 diabetes the male:female ratio is relatively constant across the whole age range where it can be examined (beyond age 45). In type 1 diabetes (arbitrarily defined with onset before age 40) the situation is less clear. In a separate study (unpublished data) we found no gender difference in patients developing diabetes before age 15. The Barts-Oxford study found no gender difference in cases incident before age 5 but a male preponderance in older children.30 Here we find no difference in prevalence of type 1 diabetes before age 25. Beyond age 35 the prevalence in males clearly exceeds that in females (fig 1), suggesting relatively more male patients among those diagnosed after childhood. Thus patients with apparent type 1 diabetes and onset after the childhood years show this gender difference in common with type 2 diabetes rather than childhood onset type 1. There was a suggestion of a male preponderance in some previous surveys from the 1980s and 1990s.4,16,19,21,26,27 Both the Southall and Coventry studies found a clearer preponderance of males in Asians than Europeans.16,29 The early surveys showed no clear male preponderance.10,11 Thus the recent increase in type 2 diabetes may have affected men more than women.

Changes in diabetes treatment

Previous studies do not distinguish type 1 and type 2 diabetes, but compared with studies that record diabetes treatment in patients of all ages, we found a high proportion of patients treated by diet alone (28.6%) and a low proportion on insulin (26.5%). The Bristol survey reported similar figures,24 but with this exception we found the highest proportion on diet and lowest percentage on insulin.4,10,11,14,16,19–21,24,27,31 The Tayside survey showed 26% on diet alone and 29% on insulin.4 In the remainder the proportion on diet was 16%–24% and on insulin 30%–39%. Thus large survey size, recent date and higher determined prevalence of diabetes are associated with the identification of a higher proportion of diet alone treated cases. The natural history of type 2 diabetes is that newly diagnosed cases are often treated with dietary measures alone initially. Thus identification of more cases treated by diet alone in our and other recent surveys may reflect increasing numbers of (new) early cases of type 2 diabetes. Alternatively, more rigorous survey methods may have achieved better ascertainment of those treated with diet alone.


We are grateful for assistance from staff of the diabetes units and medical records departments of Glan Clwyd, Chester and Wrexham Maelor Hospitals and all Clwyd General Practices.

Funding: the North Wales Diabetes Register is supported by North Wales Health Authority.

Conflicts of interest: none.

