Development and validation of a predictive algorithm for risk of dementia in the community setting
  1. Stacey Fisher1,2,3,
  2. Douglas G Manuel1,2,3,4,5,
  3. Amy T Hsu1,2,3,5,
  4. Carol Bennett1,2,
  5. Meltem Tuna1,2,
  6. Anan Bader Eddeen1,2,
  7. Yulric Sequeira1,3,
  8. Mahsa Jessri1,2,4,
  9. Monica Taljaard1,3,
  10. Geoffrey M Anderson6,7,
  11. Peter Tanuseputro1,2,5,8
  1. 1 Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
  2. 2 Populations & Public Health, ICES, Ottawa, Ontario, Canada
  3. 3 School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
  4. 4 Health Analysis Division, Statistics Canada, Ottawa, Ontario, Canada
  5. 5 Centre for Individualized Health, Bruyere Research Institute, Ottawa, Ontario, Canada
  6. 6 Cardiovascular Research, ICES, Toronto, Ontario, Canada
  7. 7 Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
  8. 8 Department of Medicine, University of Ottawa, Ottawa, ON, Canada
  1. Correspondence to Dr Stacey Fisher, Dalla Lana School of Public Health, University of Toronto, Toronto, ON M5T 3M7, Canada; stacey.fisher{at}


Background Most dementia algorithms are unsuitable for population-level assessment and planning as they are designed for use in the clinical setting. A predictive risk algorithm to estimate 5-year dementia risk in the community setting was developed.

Methods The Dementia Population Risk Tool (DemPoRT) was derived using Ontario respondents to the Canadian Community Health Survey (survey years 2001 to 2012). Five-year incidence of physician-diagnosed dementia was ascertained by individual linkage to administrative healthcare databases and using a validated case ascertainment definition with follow-up to March 2017. Sex-specific proportional hazards regression models considering competing risk of death were developed using self-reported risk factors including information on socio-demographic characteristics, general and chronic health conditions, health behaviours and physical function.

Results Among 75 460 respondents included in the combined derivation and validation cohorts, there were 8448 cases of incident dementia in 348 677 person-years of follow-up (5-year cumulative incidence, men: 0.044, 95% CI: 0.042 to 0.047; women: 0.057, 95% CI: 0.055 to 0.060). The final full models each include 90 df (65 main effects and 25 interactions) and 28 predictors (8 continuous). The DemPoRT algorithm is discriminating (C-statistic in validation data: men 0.83 (95% CI: 0.81 to 0.85); women 0.83 (95% CI: 0.81 to 0.85)) and well-calibrated in a wide range of subgroups including behavioural risk exposure categories, socio-demographic groups and by diabetes and hypertension status.

Conclusions This algorithm will support the development and evaluation of population-level dementia prevention strategies, support decision-making for population health and can be used by individuals or their clinicians for individual risk assessment.

  • public health
  • epidemiology
  • dementia
  • disease modeling

Data availability statement

Data were linked using unique encoded identifiers and analysed at ICES. The data set from this study is held securely in coded form at ICES. While data sharing agreements prohibit ICES from making the data set publicly available, access may be granted to those who meet prespecified criteria for confidential access. The full data set creation plan and underlying analytical code are available from the authors upon request, understanding that the programmes may rely upon coding templates or macros that are unique to ICES and are therefore either inaccessible or may require modification.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


An estimated 50 million people worldwide have dementia, a number expected to grow to over 152 million by 2050,1 putting a tremendous strain on our healthcare systems, caregivers and families. As there is no cure or disease-modifying therapy and an estimated 30% of dementia may be attributable to potentially modifiable factors,2 primary prevention has become an important component of healthcare planning and policy development and has been identified as a primary objective of many international dementia strategies.3–6 Population risk prediction algorithms can be used to describe population dementia risk, project the number of new cases over time and inform the development of prevention strategies by identifying optimal target groups for intervention and estimating the potential population health benefit.7

Most existing dementia algorithms are designed for use in the clinical setting and are not suitable for population health planning purposes. For a population risk algorithm, input variables need to be available to population health planners and decision makers, representative of the population and regularly collected so estimates can be frequently updated. Many existing dementia algorithms include neuropsychological assessment,8–13 genetic testing,10–14 neuroimaging10 or other clinical variables (eg, blood pressure or cholesterol values)14–17 that do not usually fulfil these criteria. Additionally, population algorithms should be developed on large, representative data sets and include a wide range of socio-demographic variables to allow for risk assessment and equity evaluation across population health subgroups including ethnicity, education and immigrant status. Lastly, the inclusion of modifiable risk factors facilitates the evaluation of preventative strategies. Few existing dementia algorithms are suited for population health planning.

The objective of this study was to develop and validate a risk algorithm for dementia incidence in the community setting using population health survey data. The Dementia Population Risk Tool (DemPoRT) can be used to inform the development of dementia prevention strategies and support decision-making for population health. In addition to this population health planning purpose, DemPoRT can also be used by patients or their clinicians to assess individual dementia risk.


Study design and participants

This prospective cohort study used population health survey data linked to health administrative dementia data to develop and validate a population risk algorithm, DemPoRT, for predicting 5-year dementia incidence in the community setting. Dementia was ascertained using physician billing, hospitalisation and drug dispensing data with follow-up to March 2017. Model development can be summarised in four steps:

  1. Model derivation – creation of male and female DemPoRT risk algorithms using respondents to the 2001, 2003, 2005 and 2007/2008 Canadian Community Health Surveys (CCHS);

  2. Model validation – validation of the DemPoRT algorithms using the 2009/2010 and 2011/2012 CCHS;

  3. Final model generation – estimation of final, full DemPoRT models using the combined derivation and validation data and the same model specification as the derivation models;

  4. Derivation of the application model – creation of a parsimonious model with fewer predictors that attempts to maintain discriminatory ability, calibration and overall model performance.

The protocol for development and validation of DemPoRT was registered and published (NCT03155815).18 We adhered to the protocol with the following exceptions: the validated dementia definition was not supplemented with dementia information from home care and long-term care data, as these data are not available prior to 2008; individuals had to be age 55 or older at the time of survey administration; follow-up was extended to March 2017 with the availability of new data; ethnicity and functional measures were recategorised due to small sample sizes; variables for multilingualism, chronic obstructive pulmonary disorder and epilepsy were added; and only the first of the multiply imputed data sets was used for model analyses, informed by previous work with this data.19 This paper adheres to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) checklist for prediction model development.20

Survey respondents who agreed to share and link their survey interview information were eligible for study inclusion. Respondents were excluded if they were not eligible for Ontario’s universal health insurance programme, indicated a diagnosis of dementia in the CCHS or were younger than age 55 at the time of survey administration. For individuals with multiple CCHS interviews, only the earliest interview was included.
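The inclusion and exclusion rules above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the `Interview` record, its field names and the `build_cohort` function are hypothetical, and the order of operations (select the earliest interview first, then apply exclusions) is an assumption, since the text does not state it.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Interview:
    """One survey interview; all field names are hypothetical."""
    person_id: str          # encoded linkage identifier
    interview_date: date
    age: int                # age at survey administration
    insured: bool           # eligible for provincial health insurance
    reports_dementia: bool  # self-reported dementia diagnosis in the survey


def build_cohort(interviews: Iterable[Interview], min_age: int = 55) -> List[Interview]:
    """Keep each person's earliest interview, then apply the exclusions.

    Assumption: the earliest interview is chosen before exclusions are applied.
    """
    earliest: Dict[str, Interview] = {}
    for iv in interviews:
        kept = earliest.get(iv.person_id)
        if kept is None or iv.interview_date < kept.interview_date:
            earliest[iv.person_id] = iv
    return [
        iv for iv in earliest.values()
        if iv.insured and not iv.reports_dementia and iv.age >= min_age
    ]
```

Applying the exclusions before selecting the earliest interview would retain a later interview for someone first surveyed before age 55, so the choice of order affects cohort membership and would need to be confirmed against the data set creation plan.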

Data sources

The CCHS is a national, cross-sectional survey developed by Statistics Canada to collect data related to health determinants, health status and healthcare use. It employs a complex multistage sampling strategy to randomly select households in each region, with a target population of individuals aged 12 years and older. Over the study period, the surveys attained an average response rate of 79%. Individuals living on First Nation Reserves, institutionalised residents, full-time members of the Canadian Forces and residents of certain remote areas are excluded. Details of the survey methodology have been previously published.21 Model predictors were ascertained from self-reported responses to the CCHS. The derivation cohort comprised Ontario respondents to the CCHS conducted in 2001, 2003, 2005 and 2007/2008. The validation cohort consisted of Ontario CCHS respondents from 2009/2010 and 2011/2012. Temporal validation was used as it is a stronger validation approach than random creation of development and validation data sets, and because this tool will be used for prediction of future dementia risk.22

Dementia incidence was ascertained using population-based data sets housed at ICES (formerly known as the Institute for Clinical Evaluative Sciences), which have been linked at the individual level to CCHS respondents. These data sets include hospital-admission records from the Canadian Institute for Health Information Discharge Abstract Database, physician billing and diagnoses from the Ontario Health Insurance Plan physician claims database, hospital and community-based ambulatory care from the National Ambulatory Care Reporting System, and drug dispensing data from the Ontario Drug Benefit programme. Death was ascertained using Ontario Vital Statistics and the Registered Persons Database.


The primary outcome of interest was 5-year incidence of physician-diagnosed dementia, ascertained using a validated case ascertainment definition: one hospital record OR three physician claims records at least 30 days apart within a 2-year period OR a dispensing record for a cholinesterase inhibitor.23 This definition has a 79.3% sensitivity and a 99.1% specificity when validated against electronic medical record data. Survey respondents were followed from survey administration date until the earliest of dementia ascertainment, death, loss to follow-up (defined as loss of healthcare eligibility) or end of study (31 March 2017).
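The three-part case definition lends itself to a short sketch. The Python below is a minimal illustration under stated assumptions, not the validated implementation: the function names are hypothetical, a "2-year period" is taken as 730 days, "at least 30 days apart" is read as each consecutive qualifying claim falling at least 30 days after the previous one, and the ascertainment date under the claims rule is taken to be the date of the third qualifying claim.

```python
from datetime import date
from typing import Optional, Sequence


def third_claim_date(
    claims: Sequence[date],
    n_required: int = 3,
    min_gap_days: int = 30,
    window_days: int = 730,
) -> Optional[date]:
    """Earliest date on which n_required claims, each at least min_gap_days
    after the previous one, fall within window_days of the first claim."""
    dates = sorted(claims)
    best: Optional[date] = None
    for i, first in enumerate(dates):
        last, count = first, 1
        for d in dates[i + 1:]:
            if (d - first).days > window_days:
                break  # outside the 2-year window for this starting claim
            if (d - last).days >= min_gap_days:
                last, count = d, count + 1
                if count == n_required:
                    break
        if count == n_required and (best is None or last < best):
            best = last
    return best


def dementia_ascertainment_date(
    hospital_dates: Sequence[date],
    claim_dates: Sequence[date],
    dispensing_dates: Sequence[date],
) -> Optional[date]:
    """Earliest date on which any of the three criteria is met, else None."""
    candidates = []
    if hospital_dates:
        candidates.append(min(hospital_dates))    # one hospital record
    if dispensing_dates:
        candidates.append(min(dispensing_dates))  # one cholinesterase inhibitor dispensing
    claims_date = third_claim_date(claim_dates)
    if claims_date is not None:
        candidates.append(claims_date)            # three spaced physician claims
    return min(candidates) if candidates else None
```

Other readings of the claims rule are possible (for example, requiring only that the first and last claims be 30 days apart), so the exact interpretation should be taken from the validation study cited for the definition.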

Statistical methods and analyses

The analysis plan was developed following guidelines by Harrell24 and Steyerberg25 and informed by the development of other algorithms by the team.19 26

Predictor identification, data cleaning, missing data and model specification methods have previously been described in the study protocol.18 Predictor variables were selected from the CCHS informed by review of existing dementia prediction algorithms,27 subject-matter expertise, our previous work developing population risk algorithms using this data19 26 and variable availability across cycles. Predictor identification and specification as well as all data cleaning and coding occurred prior to examining exposure–outcome associations. Preliminary sex-specific main effects models were fit using the prespecified predictors and degree of freedom (df) allocation. Partial association χ2 statistics for each predictor minus their df were plotted to inform df reduction; however, all initial df were retained.

Table 1 presents the 33 prespecified predictor variables and the final 29 variables in the full model, after recategorisation of ethnicity and functional measures. The models include interaction between age and all variables except socio-demographic, general health and survey-year variables, with continuous variable interactions restricted to linear terms. Survey questions used to ascertain the model variables are available online.

Table 1

Predictor variables for the Dementia Population Risk Tool (DemPoRT) models

Models were estimated using Fine and Gray subdistribution hazard models,28 considering death as a competing risk. Sex-specific models were created as the effect of many dementia risk factors varies by sex, and as modelling with sex interaction terms would be difficult considering that age interaction is also being included for many of the predictor variables. All predictors were centred on their means for ease of recalibration in new populations and to allow for application in individuals or settings with missing values. Plots of raw and smoothed scaled Schoenfeld residuals versus time for each predictor were assessed. Overfitting was assessed in the full models using the heuristic shrinkage estimator,29 which indicated that shrinkage was unnecessary (men=0.97; women=0.98). Due to known challenges using survey weights in regression modelling,30 survey weights were not used for model derivation; however, we recommend their use for population application and reporting.26 We used a step-down procedure described by Ambler31 to create a smaller, reduced model from the full model. This procedure involves removing variables that result in the smallest decrease in model R2, one variable at a time, until the Akaike Information Criterion is minimised. Two sensitivity analyses were performed using the combined data to explore model performance: (1) with the number of daily activities for which help is needed and self-rated health variables excluded, and (2) with age as the only predictor.
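For readers unfamiliar with the heuristic shrinkage estimator, it is commonly written as (model likelihood ratio χ2 − df) / model likelihood ratio χ2, with values close to 1 suggesting little overfitting. The sketch below assumes that form; the function name is hypothetical and the χ2 value in the usage note is illustrative, not a figure reported in this study.

```python
def heuristic_shrinkage(model_lr_chi2: float, df: int) -> float:
    """Heuristic shrinkage factor: (LR chi-square - df) / LR chi-square.

    model_lr_chi2: likelihood ratio chi-square of the fitted model
    df: degrees of freedom spent on predictors (including interactions)
    """
    if model_lr_chi2 <= 0:
        raise ValueError("model_lr_chi2 must be positive")
    return (model_lr_chi2 - df) / model_lr_chi2
```

As an illustration, a model with 90 df and a hypothetical likelihood ratio χ2 of 3000 gives `heuristic_shrinkage(3000, 90)`, approximately 0.97, in the range where shrinkage of the coefficients would generally be considered unnecessary.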

Model performance was assessed using overall measures of predictive accuracy, discrimination (how well the model is able to separate those who experience the outcome from those who do not) and calibration (agreement between predicted and observed risk). Predictive accuracy was assessed with Nagelkerke’s R2 32 and the scaled Brier score.33 Discrimination was assessed using Harrell’s concordance statistic (c-statistic).24 Overall calibration, calibration within deciles of predicted risk and within subgroups of importance to clinicians and policymakers were assessed. Calibration within subgroups was evaluated using a predefined standard, defined as less than a 20% difference between the observed and predicted risk estimates in subgroups where at least 5% of individuals developed dementia.19 26
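The subgroup calibration standard can be expressed as a simple check. The following Python is a sketch under assumptions: the function names are hypothetical, and the percent difference is taken as (predicted − observed) / observed × 100, a sign convention the text does not specify.

```python
from typing import Optional


def percent_difference(observed: float, predicted: float) -> float:
    """Relative difference of predicted from observed risk, in percent.

    Assumed convention: positive values mean the model over-predicts.
    """
    return 100.0 * (predicted - observed) / observed


def meets_calibration_standard(
    observed: float,
    predicted: float,
    max_abs_pct: float = 20.0,
    min_observed: float = 0.05,
) -> Optional[bool]:
    """Apply the predefined standard to one subgroup.

    Returns None when the subgroup is not evaluated because fewer than
    min_observed (5%) of its members developed the outcome.
    """
    if observed < min_observed:
        return None
    return abs(percent_difference(observed, predicted)) < max_abs_pct
```

For example, a subgroup with an observed 5-year risk of 0.10 and a mean predicted risk of 0.085 differs by about −15% under this convention and would count as well-calibrated, whereas a predicted risk of 0.13 (+30%) would not.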

Analyses were conducted in R V.3.134 using the riskRegression35 and Hmisc36 packages.

Ethics approval

ICES is a not-for-profit research institute and a prescribed health information custodian under section 45 of Ontario’s Personal Health Information Protection Act. Projects conducted under section 45, by definition, do not require review by a Research Ethics Board. This project was conducted under section 45 and approved by ICES’ Privacy and Legal Office.



The 2001 to 2011/2012 CCHS include 253 189 Ontario respondents, of which 200 320 agreed to share their file, were successfully linked to administrative data at ICES, and were eligible for Ontario’s health insurance plan. Of these, 78 097 were at least 55 years of age at survey administration. After exclusion of those with prevalent dementia (n=2637), the derivation and validation cohorts included 47 739 and 27 721 respondents with 472 399 (men: 10.0 median years, IQR: 7.5 to 13.4; women: 10.2 median years, IQR 8.2 to 13.5) and 157 929 (men: 5.9 median years, IQR 4.7 to 7.0; women: 5.9 median years, IQR 4.8 to 7.0) total person-years of follow-up, respectively. In the derivation data, during the 5-year predicted time horizon of interest (220 972 person-years of follow-up), there were 6734 dementia events and 2521 deaths without dementia; in the validation data, there were 1714 incident cases of dementia and 1354 deaths without dementia over 127 705 person-years. Cumulative incidence curves for 5-year dementia ascertainment and death are in online supplemental digital content 1. The crude 5-year incidence rate of dementia was 2.0 per 1000 person-years among men and 2.7 per 1000 person-years among women, in the combined derivation and validation cohorts. Mean age at dementia ascertainment among men was 80.4 years (IQR: 75.6 to 85.6) and among women was 82.6 years (IQR 77.9 to 88.1).

Characteristics of the study populations are presented in table 2, with detailed information about missing data provided in online supplemental digital content 2. Mean age in the derivation cohorts was 66.0 (IQR 60.0 to 74.0) among men and 68.0 (IQR 61.0 to 76.0) among women and was similar in the validation cohorts. There was less than 1% missing data for most predictors. With the exception of variables that were not collected in all study years (multilingualism, former vs non-drinker, needing help with finances, mood disorder, epilepsy and chronic obstructive pulmonary disorder), smoking status had the most missing data (males: 11.2%; females: 11.4%), due to missing information about time since quitting among former smokers.

Table 2

Baseline study characteristics of male and female derivation and validation cohorts

Model specification, development and validation

Predictor variables and df for the full and reduced models are presented in table 1. Partial correlation plots are available in online supplemental digital content 3. The final full models for men and women each include a total of 90 df (65 main effects and 25 interactions), with 28 predictors (8 continuous) and 24 interaction terms (table 1). The reduced models both have 74 df (men: 51 main and 23 interactions; women: 50 main and 24 interactions). Online supplemental digital content 4 and 5 present subdistribution HRs from the full and reduced models for men and women, respectively; an interactive online visualisation tool is in development to facilitate understanding of how the risk factors contribute to dementia risk. Model formulas and beta coefficients are available in online supplemental digital content 6 and online, respectively.

Model performance

Table 3 presents summary indicators of model performance. Both the male and female models are discriminating, indicating good ability to separate those who develop dementia from those who do not (male c-statistic: 0.83, 95% CI: 0.81 to 0.85; female c-statistic: 0.83, 95% CI: 0.81 to 0.85 in validation data). Discrimination remained stable across derivation, validation and pooled data, and in the reduced model. Within the validation data, the predicted number of dementia events differed somewhat from the observed number of events (percent difference between the 5-year observed cumulative incidence and the predicted risk, men: 4.21%; women: −10.58%), while they were very similar within the reduced models (men: −0.61%; women: −0.78%).

Figure 1

Calibration plots for the full model in validation data; mean predicted 5-year risk of dementia versus observed dementia incidence for (A) men and (B) women, by decile of predicted risk. Histograms display the relative distribution of predicted risk in the population.

Table 3

DemPoRT (Dementia Population Risk Tool) goodness of fit summary statistics of the full model in the derivation, validation and combined data, and the reduced model in the combined data*

Calibration across deciles of predicted risk is presented in figure 1. Calibration slopes in the validation data were 0.7859 among men and 0.8666 among women. Among men, the model was well-calibrated in 68 of 88 predefined policy-relevant subgroups (online supplemental digital content 7), having no more than a 20% difference in predicted versus observed risk, evaluated among subgroups where at least 5% of individuals developed dementia. In women, the model was well-calibrated in 86 of 98 subgroups (online supplemental digital content 8). Both the male and female models underestimate dementia risk among those at older ages, those who need help with daily activities and who have a history of stroke. Well-calibrated subgroups include behavioural risk exposure categories, many socio-demographic groups, stress, self-rated health and by diabetes and hypertension status.
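Calibration by decile of predicted risk can be illustrated with a short sketch. The Python below is a simplification and not the analysis used here: the function name is hypothetical, and the observed risk is taken as the plain proportion of a binary outcome, whereas an analysis with censoring and competing death would use the observed cumulative incidence in each group.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def calibration_by_decile(
    predicted: Sequence[float],
    outcome: Sequence[int],
    n_groups: int = 10,
) -> List[Tuple[int, float, float]]:
    """Return (group, mean predicted risk, observed proportion) per risk group.

    Individuals are ranked by predicted risk and split into n_groups groups
    of near-equal size. Assumption: outcome is 0/1 with no censoring.
    """
    if len(predicted) != len(outcome):
        raise ValueError("predicted and outcome must have the same length")
    pairs = sorted(zip(predicted, outcome))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer individuals than groups
        rows.append((g + 1, mean(p for p, _ in chunk), mean(y for _, y in chunk)))
    return rows
```

Plotting the second column against the third gives a calibration plot of the kind shown in figure 1; points above the diagonal in the highest groups would correspond to the underestimation at older ages described above.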

In models where the number of activities needing help and self-rated health variables are excluded, discriminative performance is slightly reduced in men (c-statistic: 0.81) and remains consistent in women (c-statistic: 0.83). Calibration by risk deciles and in population subgroups is not notably affected. Discriminative performance in the age-alone models was slightly decreased when compared with the full models (male c-statistic: 0.80; female c-statistic: 0.81). Calibration across risk deciles was degraded in both the male and female age-alone models, especially among those at high risk. Calibration was also degraded across many subgroups of importance to policymakers, including ethnicity among men, current smokers, former drinkers and those with diabetes or a history of stroke.


The Dementia Population Risk Tool is a discriminating and well-calibrated algorithm for predicting 5-year incidence of dementia among community-dwelling individuals, developed using risk factor information relevant to population health and available to population health planners and decision makers. DemPoRT is a valuable tool for population health planning and policy development as it is discriminating and well-calibrated across many subgroups of importance to clinicians and policymakers. Inclusion of health behaviour variables facilitates the development and evaluation of primary prevention strategies, and inclusion of socio-demographic variables enables evaluation of dementia burden and prevention strategies with an equity perspective.26 As it was created using routinely collected population health survey data, DemPoRT can be easily applied to newer cycles of the CCHS or to other, similar national health surveys to produce up-to-date population-level estimates of dementia incidence. All variables were centred on their means, facilitating application of the models to new settings where some of the predictor variables are not available and by allowing for re-calibration to populations with different risk factor distributions.

Population-level evaluation of disease including the development of disease projections is best performed using multivariable predictive risk algorithms;7 the DemPoRT algorithm is uniquely suited for this application. A recent review identified 39 studies describing risk algorithms to predict dementia among those in late life.27 Sixteen studies described models with good discrimination (c-statistic of 0.80 to 0.89), while two studies described models with excellent discrimination (c-statistic: 0.91 in one13 and 0.93 in the other8). Of these, all but one37 require neuropsychological (n=16) or genetic (n=4) testing, measures that are not available at the population level. Furthermore, most were developed in highly defined populations with small sample sizes and few dementia events, limiting their generalisability. Seven studies included any socio-demographic or lifestyle variables, most including only education (n=5). The only well performing algorithm identified in this review that does not require neuropsychological or genetic testing and includes socio-demographic and lifestyle variables is the Dementia Risk Score (DRS), which predicts 5-year dementia risk among individuals 60 to 79 years of age using primary care data (c-statistic: 0.84).37 It includes measures for smoking (ascertained as current vs non-current smoker), heavy alcohol use (yes vs no), depression (yes vs no), social deprivation (quintiles) and aspirin use (yes vs no) in addition to various disease states. Linear age, age squared, linear body mass index (BMI) and BMI squared were included as continuous variables. The DRS may be useful for population health planning in the UK, as it was developed using over 800 000 patients from a nationally representative primary care database. However, development on clinical data limits generalisability and it lacks some of the socio-demographic and health behaviour variables (eg, education, ethnicity, diet, physical activity) that were found to be predictive of dementia in the present study. Potential usefulness for evaluating dementia prevention strategies is also limited by categorisation of continuous health behaviour variables, and calibration across subgroups relevant to population health planning has not been assessed. Existing dementia algorithms developed specifically for use in the general population have similar limitations.38

Box 1

Example of DemPoRT (Dementia Population Risk Tool) for individual use

A 75-year-old woman

Socio-demographic factors

  • Secondary school graduate

  • Married

Health behaviours

  • Former smoker, quit 10 years ago with 20 pack-years smoking history

  • Two drinks per week

  • Five fruit and vegetables per day; one potato per day; one juice per day

  • Physical activity unknown

General health

  • Considers life a bit stressful

Functional measures

  • Needs help managing finances

Chronic conditions

  • High blood pressure

  • Diabetic

  • No heart disease, history of stroke, mood disorders, chronic obstructive pulmonary disease or epilepsy

  • Body mass index=30 kg/m2

5-year dementia risk=9.7%

DemPoRT can also be used by individual patients or their clinicians to assess dementia risk and inform decisions about lifestyle modification. See box 1 for an example of DemPoRT for individual use. As all variables are self-reported, dementia risk can be evaluated both within and outside of the clinical setting; an online calculator to facilitate this use is available. Variable centring enables risk assessment in individuals who provide only partial responses and allows for dynamic risk calculation as an individual completes the questionnaire, providing a more interactive and engaging experience. Our team has developed online health calculators as knowledge translation tools for other risk algorithms, including for cardiovascular disease, which was developed using the same health survey data as DemPoRT.19 The cardiovascular disease risk tool is integrated into the Heart and Stroke Foundation of Canada website to provide individualised risk calculations for their eHealth Risk Assessment programme. Knowledge translation of DemPoRT is facilitated by this online tool, in addition to the numerous online files.
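The role of mean centring in handling partial responses can be shown with a small sketch. The Python below is illustrative only: the function name, the predictor names and the coefficient and mean values in the usage note are hypothetical and are not DemPoRT's published values, and the sketch ignores the interaction and non-linear terms that the full models contain.

```python
from typing import Mapping, Optional


def linear_predictor(
    coefs: Mapping[str, float],
    means: Mapping[str, float],
    responses: Mapping[str, Optional[float]],
) -> float:
    """Sum of beta * (value - mean) over the predictors that were answered.

    An unanswered predictor contributes zero, which is the same as assuming
    the population mean for that predictor.
    """
    lp = 0.0
    for name, beta in coefs.items():
        value = responses.get(name)
        if value is None:
            continue  # centred value is 0 at the mean
        lp += beta * (value - means[name])
    return lp
```

With hypothetical values `coefs = {"age": 0.1, "bmi": 0.02}` and `means = {"age": 65, "bmi": 27}`, a respondent who has so far entered only an age of 75 has a linear predictor of about 1.0; once a BMI of 30 is added the value updates to about 1.06. This is how a questionnaire can display a running risk estimate as each answer is supplied.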

Most predictive algorithms for dementia that include only socio-demographic and health behaviour variables perform poorly.27 We attribute the favourable performance of DemPoRT to greater model complexity including the use of more variables, interaction terms and flexible functions for continuous variables. Predictive model development generally prioritises simplicity and parsimony with the goal of producing algorithms that are easier to interpret and use. More complex algorithms do not have to be more burdensome in their application, however, particularly if they are implemented as reflexive online tools and if unbiased calculations can be performed with partial responses. Overfitting is also often associated with increased model complexity. Our approach to model prespecification limited this risk in the development of the DemPoRT models, which performed well in validation. While increasing model complexity may be considered unnecessary due to the generally marginal increase in overall discrimination with the addition of risk factors beyond the most predictive, added complexity has the potential to improve model discrimination for individuals and population subgroups. Complex algorithms like DemPoRT therefore have potential to support both clinical and population-based precision medicine.19

Favourable performance of DemPoRT may also be due in part to the choice of modelling technique. Given the late-life onset of dementia, it is important to consider the competing risk of death when developing a dementia risk prediction model, as failure to do so can result in risk overestimation.39 Although Cox proportional hazard modelling can be used, it has been suggested that Fine and Gray subdistribution hazard models, which model the subdistribution hazard function rather than the hazard function, are better suited to prediction purposes.28 We are only aware of one other dementia model that has used Fine and Gray regression.15
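Under a Fine and Gray model, an individual's predicted cumulative incidence at a fixed horizon follows from the baseline cumulative incidence and the linear predictor as 1 − (1 − CIF0)^exp(lp). The sketch below shows that calculation; the function name is hypothetical and the baseline value in the usage note is illustrative, not the DemPoRT baseline.

```python
import math


def predicted_risk(baseline_cif: float, lp: float) -> float:
    """Cumulative incidence at the prediction horizon under a Fine and Gray model.

    baseline_cif: cumulative incidence at the horizon when the linear predictor
        is zero (with mean-centred predictors, an individual at the means)
    lp: linear predictor, the sum of coefficients times centred predictor values
    """
    if not 0.0 <= baseline_cif < 1.0:
        raise ValueError("baseline_cif must be in [0, 1)")
    return 1.0 - (1.0 - baseline_cif) ** math.exp(lp)
```

With an illustrative 5-year baseline of 0.05, a linear predictor of 0 returns 0.05, and a linear predictor of ln 2 (a subdistribution hazard ratio of 2) returns 1 − 0.95², or 0.0975. Because death is treated as a competing event in the subdistribution hazard, this quantity is a direct estimate of the probability of dementia diagnosis within the horizon.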


One concern with complex prediction models is an increased risk of overfitting, which can lead to the algorithm performing poorly in external populations despite performing well in internal or temporal validation. The DemPoRT model is unlikely to be overfitted for several reasons. First, the model was fully prespecified,18 which limits the potential for overfitting by avoiding bias introduced by data-driven variable selection procedures.24 DemPoRT was also developed on a very large data set with a more than sufficient sample size for the prespecified df, and there was no evidence of overfitting in the full model. Lastly, other algorithms developed in Ontario using similar data and variable specifications have been successfully validated in external populations. A diabetes risk algorithm was validated in Manitoba, another Canadian province; discrimination was slightly improved and predicted risk closely approximated observed risk after recalibration.40 An algorithm for all-cause mortality developed using the Ontario CCHS was also successfully validated using national CCHS data.26

As the case ascertainment algorithm is imperfect, and only ascertains physician-diagnosed dementia, some individuals with dementia are not being identified. Dementia is known to be generally underdiagnosed;41 42 individuals with less severe dementia symptoms, those who have significant home supports, or those who have poor access to healthcare may be missed or identified later in the disease trajectory. DemPoRT’s performance may therefore be overestimated, although overall performance and performance across population subgroups remains acceptable. Like the previously mentioned DRS model developed using routinely available primary care data,37 DemPoRT also underestimated dementia risk at the oldest ages, likely due to the unavailability of variables with additional predictive ability among these adults. Other models have had success developing algorithms for older adults using neuropsychological testing.9 11 That said, DemPoRT’s underestimation can be corrected for in dementia projections using age- and sex-specific recalibration techniques.43

Other population dementia models have included variables for traumatic brain injury, cholesterol, cognitively stimulating activities and fish consumption,38 which may improve model performance. Additionally, some surveys, like the Scottish Health Survey, ascertain health behaviour information using more detailed and standardised measures than the CCHS;44 however, the current specification is much more common in health surveys worldwide. As DemPoRT uses self-reported predictor information ascertained at baseline, model performance may also be improved with more accurate, longitudinal predictor assessment. Model performance was nevertheless favourable, and the use of self-report data enhances application potential. As long as variables are ascertained similarly and reporting patterns do not change, model performance in application is unlikely to be affected by the use of self-reported data.

Conclusion

DemPoRT is the first multivariable predictive risk algorithm for dementia designed specifically for use by population health planners, with favourable performance despite using only self-reported population-level data and without the use of neuropsychological or genetic testing. It is discriminating and able to predict dementia risk across a range of health profiles. DemPoRT will be used to answer key policy questions with respect to the future burden of dementia in Canada and will support the development and evaluation of population-level dementia prevention strategies.

What is already known on this subject

  • Most predictive algorithms for dementia risk have been designed for use in the clinical context, and none have been developed for population health planning purposes. Additionally, most algorithms for dementia that include only socio-demographic and health behaviour variables perform poorly.

What this study adds

  • The Dementia Population Risk Tool (DemPoRT) is discriminating and well-calibrated across a wide range of population subgroups despite using only self-reported risk factors. Favourable performance is attributed to modelling technique and greater model complexity, including the use of more variables, interaction terms and flexible functions for continuous variables. DemPoRT is the first multivariable risk prediction algorithm for dementia designed for population use. This algorithm will be used to produce improved estimates of future dementia burden, identify high-risk population subgroups and inform the development of dementia prevention strategies. It can also be used by patients and their clinicians to assess individual dementia risk.

Data availability statement

Data were linked using unique encoded identifiers and analysed at ICES. The data set from this study is held securely in coded form at ICES. While data sharing agreements prohibit ICES from making the data set publicly available, access may be granted to those who meet prespecified criteria for confidential access, available at The full data set creation plan and underlying analytical code are available from the authors upon request, understanding that the programmes may rely upon coding templates or macros that are unique to ICES and are therefore either inaccessible or may require modification.

Ethics statements


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Twitter @StaceyFisher_, @amytmhsu

  • Contributors SF was responsible for the study design, protocol development, data analysis, interpretation of the results and drafting and revision of the manuscript. DGM and PT were responsible for conception of the project, grant application and contributed to the study design, protocol development and result interpretation. MTu, ABE and MTa contributed to the study design, protocol development and result interpretation and provided data/statistical support. CB, MJ and ATH contributed to the design of the study, protocol development and result interpretation. YS provided statistical support and is primarily responsible for the online web calculator, visualisation tool and application programming interface. GA contributed to result interpretation. All authors provided critical reviews of the manuscript.

  • Funding The results reported herein correspond to specific aims of grant MOP 142237 to Douglas G Manuel from the Canadian Institutes of Health Research (CIHR). This study was supported by ICES, formerly known as the Institute of Clinical Evaluative Sciences, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). The opinions, results and conclusions reported in this paper are those of the authors and are independent from the funding sources. No endorsement by CIHR, ICES or the Ontario MOHLTC is intended or should be inferred.

  • Competing interests None declared.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.