The cognitive deficits observed in mild cognitive impairment (MCI) signal abnormal changes in neural structure and function representative of possible prodromal markers of Alzheimer’s disease (AD) or another significant neurodegenerative disorder (Albert et al. 2011; Petersen et al. 2001, 2009; Saunders and Summers 2011). These deficits exceed the age-related changes in cognitive efficiency, attention, memory, and executive functions anticipated at the fifth to sixth decade of life and may progressively accelerate into more significant cognitive declines by the seventh to eighth decades (Cabeza et al. 2017; Rog and Fink 2013; Salthouse 1996, 2011; Schaie and Willis 2010). As there are no effective medical or pharmacological interventions for the treatment of MCI or AD, other interventions such as compensatory cognitive strategies, “brain-games”, and other lifestyle changes (e.g. nutrition, exercise, etc.) are aggressively being sought to mitigate or slow illness progression (Cai and Abrahamson 2016; Curlik and Shors 2013; Kivipelto et al. 2013; Lehert et al. 2015; Simons et al. 2016; Smith et al. 2010). This may be especially important as cognitive training in MCI has been associated with increased activation in the hippocampus (Hampstead et al. 2012b; Rosen et al. 2011), right inferior parietal lobe (Belleville et al. 2011), frontoparietal network (Hampstead et al. 2011), and occipito-temporal areas (Onur et al. 2016), and has been implicated in other processes (Barban et al. 2017; Ciarmiello et al. 2015; Maffei et al. 2017). As such, identifying the cognitive interventions effective in MCI may, in turn, aid in targeting the neural networks with greater specificity at prodromal stages to alter illness trajectory away from more serious cognitive decline (Kim and Kim 2014; Shatenstein and Barberger-Gateau 2015; Sitzer et al. 2006).

In an effort to distill the mechanisms associated with age-related changes and pathognomonic processes across the lifespan, the Scaffolding Theory of Aging and Cognition – Revised (STAC-r) suggests cognitive training may activate compensatory neural processes or “scaffolding” to provide computational support to primary networks or newly established task networks when new skills are acquired (Reuter-Lorenz and Lustig 2017; Reuter-Lorenz and Park 2014). It has also been suggested that training may activate pre-existing cognitive reserves (Stern 2012; Wirth et al. 2014) or prompt hemispheric recruitment to meet processing demands (Hemispheric Asymmetry Reduction in Older Age [HAROLD]; Cabeza 2002). The manner in which neural mechanisms are utilized may shift across the lifespan as older adults demonstrate age-related decreases in occipito-temporal activity coupled with increases in prefrontal cortex processing (Posterior-Anterior-Shift with Aging [PASA]; Davis et al. 2008). Compensatory mechanisms may, however, reach a ceiling and become ineffective under high load or high demand circumstances (Compensation Related Utilization of Neural Circuits Hypothesis [CRUNCH]; Reuter-Lorenz and Cappell 2008) or become compromised by neuropathological processes. Cognitive training may stimulate pre-existing neural reserves or recruit neural circuitry as “compensatory scaffolding”, prompting neuroplastic reorganization to meet task demands in the context of adaptive factors and divergent trajectories of decline (Hong et al. 2015; van Paasschen et al. 2009).

To this end, multiple types of interventions have been employed in MCI, including restorative training, compensatory-based strategies (Bahar-Fuchs et al. 2013; Gates and Valenzuela 2010; Kinsella et al. 2009; Martin et al. 2011; Simon et al. 2012), cognitive stimulation, and multicomponent or multimodal forms of intervention. Please see Table S1 in the supplementary material for additional information regarding terms and definitions. However, existing reviews and prior meta-analyses have reported varying findings concerning the benefits of cognitive training. Several reviews report a benefit from cognitive strategies (Coyle et al. 2015; Faucounau et al. 2010; Hill et al. 2017; Jean et al. 2010b; Li et al. 2011; Reijnders et al. 2013; Simon et al. 2012), whereas other analyses have found little or no advantage (Belleville 2008; Gates et al. 2011; Huckans et al. 2013; Kurz et al. 2011; Martin et al. 2011; Stott and Spector 2011; Vidovich and Almeida 2011; Zehnder et al. 2009).

While we acknowledge these prior reviews, to the best of our knowledge no meta-analysis has examined cognitive interventions in randomized clinical trials (RCTs) across neuropsychological domains exclusively in the MCI population. An updated examination of cognitive interventions is needed given improvements in the interventions and strategies applied, increased use of neuropsychological measures pre- and post-intervention, as well as identification of moderating variables which may influence intervention outcomes (e.g. MCI diagnosis, duration of training, etc.). There has been an increased use of cognitive interventions, generally (Craik et al. 2007; Mahncke et al. 2006; Purath et al. 2014; Stuss et al. 2007; Tardif and Simard 2011; Uchida and Kawashima 2008; Willis and Caskie 2013), as well as a proliferation of evidence connecting training to neural substrates and neuroplastic processes (Cabeza et al. 2017; Park and Festini 2017; Reuter-Lorenz and Lustig 2017; Toepper 2017). Of particular significance has been the refinement of diagnostic criteria used to define MCI to recruit subjects and delineate treatment groups (Albert et al. 2011; Petersen et al. 2001, 2009; Winblad et al. 2004). This refinement has resulted in a better selection process and, secondarily, led to an increase in the quality of outcomes observed. However, it has taken time for these changes to find their way into published clinical trials.

A review of current works is also needed due to the broadened use of neuropsychological instruments to obtain pre-intervention baseline scores and post-treatment outcomes. Improved technical precision through the use of more robust neuropsychological outcome measures has improved our understanding of the effectiveness of the interventions applied (Ellis et al. 2009, 2010; Ibanez et al. 2014; Mitchell 2009) and aided the control of potential confounds due to test-retest and repeated-measures experimental designs. In addition, while multiple neural centers may be involved when completing test instruments (Matias-Guiu et al. 2017), objective neuropsychological measures have been shown to be sensitive to different stages of illness in neurodegenerative disorders, increasing their potential role and discriminability in study execution (Han et al. 2017).

Given these changes and advancements, the present meta-analysis was conducted to examine the efficacy of cognitive interventions on neuropsychological test performance in individuals diagnosed with MCI versus MCI controls in RCTs. We explored the strategies used, in both general intervention approaches and specific forms of cognitive training, in an effort to distill the cognitive tasks effective in MCI. We sought to determine (i) what changes in cognition occurred from baseline to outcome after the intervention was applied, (ii) what characteristics were common to interventions found to be effective across studies, (iii) which specific interventions may benefit individuals diagnosed with MCI in the clinical setting, (iv) what key structural factors are needed to set up an effective MCI intervention program (e.g. duration, frequency, homework, etc.), and (v) what inferences may be made regarding the interventions applied and the neural processes involved in MCI.

Provisionally, based on STAC-r and neurocognitive models, we anticipated that three possible data-patterns associated with cognitive training could emerge: (i) primary network engagement, (ii) compensatory scaffolding activation, or (iii) loss of measurable response. More specifically, evidence of primary network engagement would be suggested by moderate to large effect sizes on domain-specific outcomes after domain-targeted training (direct effects). In this scenario, training would facilitate primary network engagement by creating new primary network paths (scaffolding) to meet task demands without recruiting alternate networks. We would expect interventions which are challenging, novel, and deeply engaging to result in a more ‘youthful’ performance due to greater efficiency and less reliance on compensatory processes (Reuter-Lorenz and Park 2014; Vermeij et al. 2017). Restorative strategies which target specific cognitive functions may facilitate this process as restorative interventions aim to return deficits to premorbid levels (e.g. errorless learning), although other forms of training may also improve efficiency to a lesser degree (e.g. compensatory strategies).

Primary network engagement would also be suggested by moderate to large effect sizes with multicomponent or multidomain interventions. In this instance, training may act on primary networks and complementary processes simultaneously requiring integration of complex skillsets, including lifestyle changes, mitigating structural and functional declines by enhancing processing in specific centers and decreasing the neural burden on other areas (Barban et al. 2017; Hosseini et al. 2014; Suo et al. 2016). Multidomain approaches may also target multiple neural regions for a more enriched and complex neural challenge (Ballesteros et al. 2015; Li et al. 2017). Moreover, multidomain approaches may offer greater utility as most cognitive skills are not unitary, single-domain processes but involve interrelated cognitive functions across several areas which, after training, prompt inclusion of additional networks (Belleville et al. 2011). As such, small to moderate effects may also be demonstrated in non-targeted domains (transfer effects). In addition, cognitively stimulating activities may aid in engaging cognitive reserves to slow decline or reduce the risk of greater pathology (Ciarmiello et al. 2015; De Marco et al. 2016; Herholz et al. 2013; Stern 2012). There may be protective factors or moderator effects on outcome measures such as level of education, occupation, and intelligence as well (Hall et al. 2009).

A second data-pattern, compensatory activation, would be suggested by small effect sizes associated with training in targeted domains, no effects on non-targeted domains (absence of transfer effects), and small to moderate effect sizes associated with multicomponent or multidomain forms of intervention. In this instance, training would recruit alternate networks to meet task demands. However, smaller effect sizes would be anticipated as primary networks would be unable to manage the load due to loss of functionality, decreased efficiency, and an increase in dedifferentiation (decrease in specialization). As such, multimodal forms of intervention may be more efficacious as several strategies can support primary networks and recruited networks simultaneously.

Finally, as a third data-pattern, the absence of any training effect would suggest individuals with MCI have lost sufficient neurocognitive plasticity to engage primary networks and are unable to recruit alternate compensatory mechanisms to meet task demands.

Methods

The search process and meta-analyses performed followed guidelines outlined by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Moher et al. 2009) using a PICOS approach (Participants, Interventions, Controls, Outcomes, and Study Design). Pursuant to the recommendation by Gates and March (2016), the PRISMA checklist items are addressed in the sections below. Please see Table S2 in the supplemental materials for the PRISMA items cross-referenced by page.

Protocol Registration

The research methodology and protocol for this meta-analysis were not registered prior to conducting the review. All methods and procedures regarding the search and analysis are described in the paper with additional information provided in the supplemental materials.

Study Eligibility – Inclusion & Exclusion Criteria

This review focused on studies published from January 1995 to June 2017 which (i) selected subjects based on established MCI criteria (Albert et al. 2011; Petersen et al. 2009; Winblad et al. 2004), (ii) performed an RCT in an outpatient setting, (iii) compared cognitive training versus controls (active or passive), and (iv) reported outcomes based on objective neuropsychological measures. January 1995 was chosen as the starting point because a cursory search for earlier studies returned no relevant works and it was believed studies published prior to 1995 would not have recruited subjects according to current diagnostic criteria. The definition used for cognitive intervention was any strategy or skill which sought to improve mental processes of attention and concentration, speed of information processing, memory, or executive functions, similar to the guidelines offered by Gates and Valenzuela (2010). We did not define cognitive intervention solely in classical forms such as restorative, compensatory, etcetera, in an effort to obtain a broadly inclusive dataset, although we also recorded the strategy type used according to the cognitive training categorization suggested by Simon et al. (2012) to examine specific methods of training. For the purposes of this review, we have adopted the following terminology: (i) intervention as a broad term referring, generally, to any effort employed, (ii) cognitive stimulation to mean nonspecific and leisure forms of activities, (iii) cognitive training to denote either compensatory or restorative forms of training, and (iv) multicomponent forms of intervention to mean the combination of several approaches used together. Please see Table S1 for additional information regarding terms and definitions. In addition, we selected only those studies which were RCTs with a clearly defined patient sample population according to MCI criteria (Albert et al. 2011; Petersen et al. 2001, 2009; Petersen 2004; Winblad et al. 2004) or an analogous definition using an algorithm of 1.5 standard deviations below the mean (Vidovich et al. 2015) on established neuropsychological instruments, that is, the Consortium to Establish a Registry for Alzheimer’s Disease (CERAD; Fillenbaum et al. 2008). Studies performed in a day-treatment, institutional, or group-residential setting were not considered due to concerns regarding the influence of non-controlled effects and other potential confounds.
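The 1.5 standard deviation criterion mentioned above can be illustrated with a minimal sketch. The function name, the normative mean and standard deviation, and the choice to treat a score exactly 1.5 SD below the mean as meeting the cutoff are all assumptions made for illustration; they are not taken from the included studies or from the CERAD norms.

```python
def meets_sd_cutoff(raw_score, norm_mean, norm_sd, cutoff_sd=1.5):
    """Return True when a score falls at least `cutoff_sd` SDs below the normative mean."""
    z = (raw_score - norm_mean) / norm_sd  # standardize against the reference norms
    return z <= -cutoff_sd                 # inclusive boundary is an assumption

# Hypothetical norms: mean 25, SD 5 on some memory measure
print(meets_sd_cutoff(17, 25, 5))  # z = -1.6
print(meets_sd_cutoff(20, 25, 5))  # z = -1.0
```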

Information Sources – Databases Searched

Our review was conducted through the OVID-MEDLINE search engine using a collective of the source databases MEDLINE-R, PubMed, Healthstar, Global Health, PSYCH-INFO, and Health and Psychological Instruments. There were no other data sources, information used from informal contact, or data obtained through other methods of communication (e.g., email, conference, etc.).

Literature Search Parameters

The primary search parameters included (i) terms representative of the MCI diagnostic category (mild cognitive impairment, MCI, pre-Alzheimer’s disease, early cognitive decline, early onset Alzheimer’s disease, and preclinical Alzheimer’s disease), (ii) a descriptor of the intervention or training conducted (intervention, training, stimulation, rehabilitation, or treatment), (iii) RCT, (iv) limit to “1995-Current”, and (v) limit to human. The specific Boolean search strategy statements and result counts are provided in the supplemental materials as Table S3.

Study Screening & Selection

The process by which studies were identified, screened, considered for eligibility and included is illustrated in Fig. 1. The first two authors (DSS & JM) conducted the literature search and screened potential studies following the search criteria described above. This was done independently in two search waves with periodic confirmation of the eligibility criteria and progress made in study selection.

Fig. 1 Literature review flow diagram

Data Collection Process and Data Items

The effects of training were evaluated through subjects’ performance on outcome measures, defined as scores obtained on neuropsychological test instruments. Cognitive domains were delineated into generally accepted pre-established categories of mental status and general cognition, working memory or attention, speed of information processing, language, visual-spatial ability, memory (verbal and non-verbal), and executive functions. Multiple test instruments and test versions were administered, specific to study location and sensitive to the language and culture of origin. We defined mental status and general cognition in broad terms and adopted scores from abbreviated measures as well as summary scores from larger instruments. Measures assessing learning and memory were grouped into (i) general memory overall, then further subdivided into (ii) verbal and (iii) non-verbal categories. Verbal memory was further split into (i) list learning and (ii) story recall, while non-verbal memory measures were not subdivided. A summary of neuropsychological instruments listed by cognitive domain is presented in the supplemental materials as Table S5. Variables of IQ, age, education, and treatment duration (hours) were extracted and coded as continuous variables. In addition, mode of training, type of training, domain targeted (intervention content), type of control, MCI type, period to follow-up assessment, and control for repeat administration were delineated as categorical variables and dummy coded.

Risk of Bias – Individual Studies

The risk of bias in individual studies and the quality of each study were independently rated by two of the authors (DSS & JM) using the NIH Quality Assessment of Controlled Intervention Studies scale (2017). The NIH instrument is a 14-item scale with which published works may be rated in terms of randomization, blinded status of the participants, drop-out rates, etc. (NIH; www.nhlbi.nih.gov). Studies were not examined for bias secondary to funding sources or for other forms of bias such as institution or affiliation.

Summary Measures

To quantify the possible benefit of training, we sought to determine the difference in post-training scores between MCI intervention groups versus MCI controls (waitlisted, non-trained or active control subjects). As such, means and standard deviations from test instruments were extracted from each study as primary summary measures to calculate summary statistics, effect sizes (weighted and un-weighted), and 95% confidence intervals (CI). When means and standard deviations were not reported directly, p-values, t-values, F-values, or confidence interval data were extracted to calculate the mean and standard deviation statistics for intervention and control groups. If more than one wave of data collection was conducted post-training, data from outcome measures were extracted from the time-point closest to the conclusion of training. Differences between means were calculated according to the Hedges’ g metric. The overall summary effect size, forest plots, and individual effect sizes within specific cognitive domains were examined. The values used to interpret effect size were in keeping with established guidelines such that small = 0.20 (range 0–0.20); moderate = 0.50 (range 0.30–0.70); and large = 0.80 (range ≥ 0.80; Cohen 1988; Durlak 2009). Given the likely heterogeneity resulting from variability of training approaches, range of outcome measures, and differing methodological procedures across studies, a random-effects model was assumed for all analyses (DerSimonian and Laird 1986). A prediction interval was calculated at 95% confidence to approximate the range of effect which might be anticipated under similar intervention conditions with the same outcome measure(s) reported, based on the procedure recommended by Borenstein et al. (2016) and performed with a Microsoft Excel spreadsheet graciously provided by the authors.
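As a rough illustration of the summary measures described above, the following sketch computes Hedges’ g from group means and standard deviations and pools effects with the DerSimonian and Laird estimator. It uses the standard textbook formulas rather than the Comprehensive Meta-Analysis software used in this review, and every input value is hypothetical.

```python
import math

def hedges_g(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Return (g, variance of g) for a treatment-versus-control contrast."""
    df = n_t + n_c - 2
    s_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (m_t - m_c) / s_pooled                       # standardized mean difference
    j = 1 - 3 / (4 * df - 1)                         # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d

def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooling: returns (pooled mean, standard error, tau^2)."""
    w = [1 / v for v in variances]                   # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)    # between-study variance
    w_star = [1 / (v + tau2) for v in variances]     # random-effects weights
    mean = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    return mean, math.sqrt(1 / sum(w_star)), tau2

# Hypothetical study: trained group 25 (SD 4, n 30) versus controls 23 (SD 4, n 30)
g, var_g = hedges_g(25.0, 4.0, 30, 23.0, 4.0, 30)
ci = (g - 1.96 * math.sqrt(var_g), g + 1.96 * math.sqrt(var_g))
print(f"g = {g:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")

# Hypothetical set of three study-level effects and their variances
mean, se, tau2 = pool_random_effects([0.2, 0.5, 0.9], [0.04, 0.05, 0.06])
print(f"pooled g = {mean:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```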

Synthesis of Results & Measures of Inconsistency

Means and standard deviations from neuropsychological measures for each study were entered for analysis. For studies with more than one outcome measure, a combined outcome, or ‘synthetic variable’, was computed by combining all test results reported from the study to produce a single mean difference, in accordance with the procedure recommended by Borenstein et al. (2009). As such, each study was represented by one score and contributed only one effect size in the meta-analysis regardless of the number of outcomes administered in the study. This approach was taken to restrict artificial inflation, limit potential interdependence of observations, and avoid error due to redundancy. To check this, we examined summary effects and measures of dispersion by conducting sensitivity analyses (M. Borenstein, personal communication, September 2017) for select meta-analyses assuming various levels of correlation between outcome measures (0.0, 0.20, 0.40, 0.60, 0.80, and 1.0). Meta-analyses were conducted with the Comprehensive Meta-Analysis 3.3.070 software (CMA) with some adjunct exploratory analysis performed with the Meta-Easy Add-In for Microsoft Excel (Kontopantelis and Reeves 2009, 2010).
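The combined outcome and its sensitivity to the assumed correlation can be sketched as follows. This is a simplified illustration of the composite formula for the mean of correlated outcomes, assuming one common correlation r between every pair of outcomes within a study; the effect sizes and variances are hypothetical, and the sketch is not the CMA implementation.

```python
import math

def composite_effect(effects, variances, r):
    """Mean of a study's outcome effect sizes and the variance of that mean,
    assuming a common correlation r between every pair of outcomes."""
    m = len(effects)
    mean = sum(effects) / m
    total = sum(variances)
    for i in range(m):
        for j in range(m):
            if i != j:                               # covariance terms between outcomes
                total += r * math.sqrt(variances[i] * variances[j])
    return mean, total / m**2

# Hypothetical study reporting two outcome measures
effects = [0.30, 0.50]
variances = [0.04, 0.04]
for r in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    mean, var = composite_effect(effects, variances, r)
    print(f"r = {r:.1f}  composite g = {mean:.3f}  variance = {var:.4f}")
```

The composite mean does not depend on r, while its variance grows with r, so assuming r = 1.0 gives the widest confidence interval and each multi-outcome study the least weight.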

Meta-analyses were also run by cognitive domain as this was viewed to be more consistent with the construct validity and scope of outcome instruments administered (Demakis 2006). As such, we examined summary effects by (i) cognitive domain (irrespective of intervention type), (ii) type of cognitive training, as well as (iii) the focus of interventions on specific outcomes associated with the targeted domain (e.g. effects of memory training specifically on memory measures). A minimum of five studies was used as the criterion for analysis due to concerns about over-interpretation. While a minimum of two studies may be used as an inclusion criterion and may be appropriate for some meta-analyses, particularly when outcome data are homogeneous or very specific (Valentine et al. 2010), we limited our meta-analyses to those with at least five studies given the range of outcome measures administered and the cautions against performing meta-analyses with fewer than five studies (Borenstein et al. 2009). We also tested significance and calculated confidence intervals for the Hedges’ g overall effect for meta-analyses of twenty or fewer studies (≤ 20) using the Hartung-Knapp-Sidik-Jonkman (HKSJ) method as this has been demonstrated to yield more adequate error rates, especially when the number of studies is small (IntHout et al. 2014).
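A minimal sketch of the HKSJ adjustment is given below. It assumes the between-study variance (tau²) has already been estimated and that the critical t value for k − 1 degrees of freedom is supplied by the caller (for instance from a t table or `scipy.stats.t.ppf(0.975, k - 1)`); all numeric inputs are hypothetical.

```python
import math

def hksj_interval(effects, variances, tau2, t_crit):
    """Random-effects mean with a Hartung-Knapp-Sidik-Jonkman confidence interval."""
    k = len(effects)
    w = [1 / (v + tau2) for v in variances]          # random-effects weights
    mean = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Weighted residual variance replaces the usual 1 / sum(w) variance of the mean
    q = sum(wi * (y - mean) ** 2 for wi, y in zip(w, effects)) / (k - 1)
    se = math.sqrt(q / sum(w))
    return mean, se, (mean - t_crit * se, mean + t_crit * se)

# Hypothetical domain with five studies; 2.776 is the 97.5th percentile of t with 4 df
mean, se, ci = hksj_interval(
    effects=[0.10, 0.35, 0.40, 0.60, 0.85],
    variances=[0.04, 0.05, 0.03, 0.06, 0.05],
    tau2=0.05,                                       # assumed, not estimated here
    t_crit=2.776,
)
print(f"g = {mean:.3f}, SE = {se:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
```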

With respect to verbal and non-verbal memory, we limited data to the delay aspect of outcome measures, omitting learning trials, trial totals, immediate recall indices, and measures associated with prospective memory. Because several studies reported data for verbal and non-verbal recall as indices or composite scores, we ran meta-analyses for (i) all memory measures combined, as well as (ii) verbal memory and (iii) non-verbal memory subdomains separately. Regarding verbal memory in particular, scores from list recall and story recall measures were examined together to maximize data analysis. With respect to mental status and general cognition, we examined the effect of interventions on summary measures as subjects and patients are routinely evaluated with these screening tests and general instruments (e.g. MMSE, MOCA, DRS, ADAS-Cog, RBANS total, etc.). When examining the effects of interventions directed at a specific domain, we categorized studies based on the intervention and content described and limited the meta-analysis to outcomes associated with that domain.

Risk of Bias - Publication Bias

To examine the potential for publication bias, we performed several analyses including (i) visual inspection of funnel plot symmetry, plotting the standard difference in means against standard error (Rosenthal 1979; Sterne et al. 2000, 2011), (ii) calculation of Egger’s regression intercept (Egger et al. 1997), and (iii) Duval and Tweedie’s trim-and-fill method for imputed estimates and adjusted values (Duval and Tweedie 2000). The test for heterogeneity was based on Cochran’s Q and the τ² statistic. The I² value was also calculated, although this was reported for descriptive purposes only as it is not regarded as an adequate measure of inconsistency and has limited generalizability (Borenstein et al. 2016). Values of I² were characterized as small when I² = 25% (≤ 25%), moderate for I² = 50% (26–74%), and large as I² = 75% (≥ 75%; Higgins et al. 2003). No study was removed when examining potential contributions to heterogeneity, nor did we exclude studies and re-run a meta-analysis, as this was regarded as altering the study eligibility criteria previously established for our search (Higgins 2008).
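Two of the quantities named above can be sketched briefly. The first function derives I² from Cochran’s Q and its degrees of freedom and is checked here against the Q and df reported for the overall analysis in the Results; the second computes Egger’s regression intercept as the ordinary least squares intercept from regressing each standardized effect (effect / SE) on its precision (1 / SE). Function names are illustrative, and the significance test for the intercept is omitted.

```python
def i_squared(q, df):
    """I^2 as a percentage of total variation attributed to heterogeneity, floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

def egger_intercept(effects, std_errors):
    """OLS intercept from regressing effect/SE on 1/SE (Egger's regression)."""
    x = [1 / se for se in std_errors]                # precision
    y = [g / se for g, se in zip(effects, std_errors)]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - slope * mx

# Q and df as reported for the overall analysis
print(f"I^2 = {i_squared(205.409, 25):.3f}%")

# Hypothetical effects that do not vary with study precision give an intercept near zero
print(f"intercept = {egger_intercept([0.4, 0.4, 0.4], [0.1, 0.2, 0.3]):.6f}")
```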

Additional Analyses – Moderator Variables & Meta-Regression

A subgroup meta-analysis and hierarchical meta-regression were conducted to examine the possible influence of moderator variables on outcome measures as well as contributions to heterogeneity (Borenstein et al. 2009; Borenstein and Higgins 2013; Rutter and Gatsonis 2001; Sedgewick 2013; Thompson and Higgins 2002). This was performed with MCI diagnosis, type of training, mode of treatment (group, individual, computer), primary focus of the intervention (memory, multicomponent, etc.), type of control group (passive or active), time of post-intervention follow-up assessment, and adjustment for repeat administration. These were evaluated using categorical variables dummy coded in the manner outlined above. All studies and outcome measures were included in the analysis, with the exception of outcomes which would have introduced duplicate data (e.g. MOCA – List Recall, MMSE – List Recall, MOCA – Clock Drawing, etc.).

A hierarchical (incremental) meta-regression was also used to explore the incremental contribution of moderators in a systematic manner. Given concerns regarding small number of studies in meta-regression, we limited the number of moderators used to a ratio of one covariate for every ten studies (Borenstein et al. 2009). With this criterion, the number of covariates was limited to two and entered in the order of duration of training and type of cognitive training. We also conducted an additional post hoc meta-regression in accordance with the recommendation of Fu et al. (2010) in which six to ten studies for continuous variables and four studies for categorical variables may be acceptable when effect sizes are moderate or large.
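The covariate limit described above is simple arithmetic, shown here as a short sketch. The function name is hypothetical; the ratio of one covariate per ten studies and the count of 26 included studies come from the text.

```python
def max_covariates(n_studies, studies_per_covariate=10):
    """Largest number of moderators allowed under a one-per-ten-studies rule."""
    return n_studies // studies_per_covariate

# With the 26 included studies, the rule permits two covariates
print(max_covariates(26))
```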

Results

Studies Selected

The inclusion and exclusion criteria, search strategy, and paper selection process we followed resulted in a set of 1199 studies for consideration. From this, 1102 were removed for various reasons, including not meeting MCI criteria, containing mixed diagnostic categories (e.g. MCI and mild AD), not being an RCT, lacking adequate controls, having an improper design or lacking the requisite study arms, not containing data or not reporting data in a format suitable for extraction, not describing the key elements necessary for consideration in a sufficiently clear manner for inclusion (e.g. not explicitly stating the MCI definition used for subject selection), or a similar complication. An additional 71 studies were excluded as the interventions used would not be considered cognitive in nature (e.g. pharmaceutical, physical exercise, dietary, etc.). Please see Fig. 1 for a flow diagram of the paper selection process and Table S3 for the Boolean search strategy, terms and results of each step taken.

Study Characteristics

In all, 26 articles met inclusion criteria for meta-analyses. Please see Table 1 for a detailed overview of all studies examined. The total number of studies meeting criteria was considered acceptable and sufficient to proceed (k = 26) as this was well above the median number of six studies per meta-analysis reported in a well-known repository of meta-analytic reviews, the Cochrane Database of Systematic Reviews, and exceeded 90% of the analyses (k = 10) conducted in the mental health and behavioral condition category (Davey et al. 2011). The majority of studies included in our cohort were published recently: 92% (24/26) were published within the past 7 years (2010–2017) and 8% (2/26) were published prior to 2010. The total number of participants who received training was 876 (pooled) with a mean of 33.69 subjects per study (SD = 35.07; Range 6–145). The diagnoses of study participants comprised amnestic MCI – single domain (19.23%), amnestic MCI – multiple domain (26.92%), and all MCI subtypes combined (53.85%). Samples were drawn from one multicenter group of nations (Italy, Greece, Norway, & Spain) and eleven individual countries: Argentina (n = 1), Australia (n = 5), Brazil (n = 1), Canada (n = 2), France (n = 1), Germany (n = 2), Greece (n = 1), Hong Kong (n = 1), Italy (n = 4), Korea (n = 1), and the United States (n = 6). A summary of the age, education, mental status, number of participants, treatment duration, and span of involvement is presented in Table S6.

Table 1 Summary of findings: cognitive intervention impact on participants with mild cognitive impairment (MCI)

In general terms, the types of interventions employed were multidomain including lifestyle elements such as exercise and social support (7/26; 26.92%), general cognitive interventions (7/26; 26.92%), specific mnemonic memory techniques (3/26; 11.54%), computer-based interventions (7/26; 26.92%), and highly developed specialized tasks (2/26; 7.69%). Categorizing interventions with more structured definitions, studies employed cognitive stimulation (0/26; 0%), restorative training (8/26; 30.77%), compensatory training (3/26; 11.54%), and multicomponent approaches (15/26; 57.69%). The primary content of interventions or domains targeted by treatment generally focused on working memory (7.69%), speed of information processing (7.69%), memory (34.62%), and multidomain training (including components associated with lifestyle and socialization, 50.00%), although there was some overlap in intervention content, strategies applied, and targeted domains (e.g. Jeong et al. 2016). The modes of training conducted were group format (46.15%), individual plus dyad training (15.38%), and computer-based programs (38.46%). The majority of studies completed post-training evaluations within two weeks or less (80.77%; 21/26). Five studies performed follow-up assessments ranging from four to twenty-six weeks (Jean et al. 2010a; Hampstead et al. 2012a; Valdes et al. 2012; Rojas et al. 2013, and Tsolaki et al. 2011, respectively). Treatment duration was divided roughly evenly between short (8 weeks or less, 46.15%) and long duration (greater than 8 weeks, 53.85%). The majority of control groups were passive (received no training), waitlisted, or provided standard of care (57.69%) while the remainder engaged in an active, non-trained program (42.31%). Thirty-one percent (30.77%; 8/26) attempted to control for the potential confound of practice effects through the use of alternate or parallel versions of test instruments. The remaining studies (69.23%; 18/26) did not appear to account for practice effects or did not report this in their study. Other variables such as IQ, time since diagnosis, date of onset, etc. could not be considered for extraction due to lack of data across studies.

Risk of Bias – Individual Studies

While point estimate effect sizes were ultimately used to determine training effect, an NIH derived total score and standard score (z) were calculated for each study. This is presented in Table 1.

Synthesis of Results

A series of meta-analyses were conducted to investigate the effects of training overall (all measures combined) as well as by cognitive domain, training type, and domain targeted (intervention content). Please see Table 2 for an overview of point estimates and summary statistics for each element of interventions examined. Meta-regression and post hoc analyses were also performed to explore possible influence of moderator variables.

Table 2 Summary of meta-analyses: By domain, type of training and intervention content
  1.

    Intervention Effects – Overall: The summary effect of interventions overall demonstrated a significant, moderate effect on cognition (Hedges’ g observed = 0.454; 95% CI [0.156, 0.753]; Z = 2.983; p = 0.003). The HKSJ calculation to adjust for a small number of studies was not conducted as the number of studies exceeded the upper limit of twenty (k = 26). Heterogeneity was significant and large, an anticipated finding in keeping with the diverse range of training employed and number of outcomes administered (Q = 205.409; df = 25; p < 0.001; I² = 87.829%; τ² = 0.484). On visual inspection, the funnel plot was somewhat asymmetrical, with three outliers observed. Duval and Tweedie’s trim-and-fill method made no adjustments for possible publication bias. Egger’s regression intercept was suggestive of small-study effects (Intercept = 3.036; t = 2.60; two-tailed p = 0.016). Please see Fig. 2 for a forest plot of the effects and overall summary effect and Fig. S2 for a funnel plot of the standard error (SE) by Hedges’ g. Please see Table S7 in the supplemental materials for a list of effect sizes, confidence intervals, and p-values for outcome measures used in each study (Note: The effect sizes reported in this table assume independence and are for general information only). Sensitivity analysis examining effect sizes and measures of dispersion at various levels of correlation between test instruments demonstrated adequate mean effects and weights consistent with the values used as a conservative estimate of combined outcome (please see Table S8).

    Fig. 2

    Effects of cognitive interventions on all outcome measures. Test for heterogeneity Q = 205.409; df = 25; p < 0.001; I² = 87.829%; τ² = 0.484

  2.

    Intervention Effects – By Cognitive Domain: Please see Table 3 for a breakdown by domain including the study, outcome measures, effect sizes, indices of heterogeneity, and prediction intervals.

    Table 3 Effect sizes by cognitive domain: Effects of outcome measures, confidence intervals and prediction intervals
    a.

      Mental Status & General Cognition: A total of sixteen studies reported scores for mental status and general cognition. The effect of training on MCI compared to controls was small and significant (Hedges’ g observed = 0.216; 95% CI [0.076, 0.356]; Z = 3.015; p = 0.003). The HKSJ calculation was also significant (HKSJ point estimate adjustment SMD = 0.218; 95% CI [0.070, 0.366]; t = 3.146; df = 15; p = 0.007). The test for heterogeneity was not significant (Q = 19.462; df = 15; p = 0.194; I² = 22.928%; τ² = 0.017). On visual inspection, the funnel plot was asymmetrical, and Duval and Tweedie’s trim-and-fill method (to the left) trimmed seven studies, generating a very small adjusted point estimate reflective of probable publication bias (Adjusted point estimate = 0.073; 95% CI [−0.087, 0.233]; Q = 44.231). Egger’s regression intercept was not significant for small-study effects (Intercept = 1.265; t = 2.073; p = 0.057; two-tailed). The summary effects and forest plot for mental status and general cognition are presented in Fig. 3a and the funnel plot is provided in the supplemental materials as Fig. S3a.

      Fig. 3

      a Meta-analysis of interventions on mental status and general cognition. HKSJ point estimate adjustment SMD = 0.218; t = 3.146; p = 0.007. Test for heterogeneity Q = 19.462; df = 15; p = 0.194; I² = 22.928%; τ² = 0.017.
      b Meta-analysis of interventions on working memory/attention. HKSJ point estimate adjustment SMD = 0.627; t = 3.062; p = 0.011. Test for heterogeneity Q = 43.068; df = 11; p < 0.001; I² = 74.459%; τ² = 0.227.
      c Meta-analysis of interventions on speed of information processing. HKSJ point estimate adjustment SMD = −0.441; t = −1.759; p = 0.139. Test for heterogeneity Q = 58.656; df = 5; p < 0.001; I² = 91.476%; τ² = 0.701.
      d Meta-analysis of interventions on language. HKSJ point estimate adjustment SMD = 0.519; t = 3.730; p = 0.010. Test for heterogeneity Q = 13.457; df = 6; p < 0.001; I² = 55.141%; τ² = 0.072.
      e Meta-analysis of interventions on memory: verbal + non-verbal combined. HKSJ point estimate adjustment SMD = 0.675; t = 3.823; p = 0.001. Test for heterogeneity Q = 90.898; df = 19; p < 0.001; I² = 79.098%; τ² = 0.277.
      f Meta-analysis of interventions on memory: verbal. HKSJ point estimate adjustment SMD = 0.775; t = 2.833; p = 0.013. Test for heterogeneity Q = 95.811; df = 14; p < 0.001; I² = 85.388%; τ² = 0.421.
      g Meta-analysis of interventions on memory: non-verbal. HKSJ point estimate adjustment SMD = 0.593; t = 2.705; p = 0.054. Test for heterogeneity Q = 5.082; df = 4; p = 0.279; I² = 21.292%; τ² = 0.047.
      h Meta-analysis of interventions on executive functions. HKSJ point estimate adjustment SMD = 0.585; t = 1.505; p = 0.158. Test for heterogeneity Q = 126.404; df = 12; p < 0.001; I² = 90.507%; τ² = 0.669

    b.

      Working Memory: A total of twelve studies reported outcome measures associated with working memory and attention. The summary effect was moderate and significant (Hedges’ g observed = 0.614; 95% CI [0.285, 0.943]; Z = 3.658; p < 0.001). The HKSJ result adjusting for the small number of studies was also significant (HKSJ point estimate adjustment SMD = 0.627; 95% CI [0.176, 1.078]; t = 3.062; df = 11; p = 0.011). Heterogeneity across studies was significant (Q = 43.068; df = 11; p < 0.001; I² = 74.459%; τ² = 0.227). On visual inspection, the funnel plot was somewhat asymmetrical, with two outliers (Balietti et al. 2016; Herrera et al. 2012). Duval and Tweedie’s trim-and-fill method made no adjustments. Egger’s regression intercept was significant, suggestive of small-study effects (Intercept = 3.113; t = 2.533; p = 0.030; two-tailed). Effect sizes and a forest plot are presented in Fig. 3b and a funnel plot is provided in the supplemental materials as Fig. S3b.

    c.

      Speed of Information Processing: Six studies reported scores associated with speed of information processing. The summary effect size was moderate and non-significant (Hedges’ g observed = −0.434; 95% CI [−1.150, 0.282]; Z = −1.187; p = 0.235). The HKSJ adjustment was also not significant (HKSJ point estimate adjustment SMD = −0.441; 95% CI [−1.085, 0.203]; t = −1.759; df = 5; p = 0.139). Heterogeneity values were significant (Q = 58.656; df = 5; p < 0.001; I² = 91.476%; τ² = 0.701). On visual inspection, the funnel plot was asymmetrical, and Duval and Tweedie’s trim-and-fill method trimmed two studies, generating a moderate adjusted point estimate (Adjusted point estimate = −0.650; 95% CI [−1.232, −0.068]; Q = 80.117). Egger’s regression intercept was not significant (Intercept = 1.861; t = 0.480; p = 0.656; two-tailed). The mean effect size and forest plot are presented in Fig. 3c and a funnel plot is provided in the supplemental materials as Fig. S3c.

    d.

      Language: Seven studies reported outcome data related to language functioning. The summary effect size was moderate and significant (Hedges’ g observed = 0.511; 95% CI [0.231, 0.790]; Z = 3.576; p < 0.001). The HKSJ adjustment was also significant (HKSJ point estimate adjustment SMD = 0.519; 95% CI [0.179, 0.859]; t = 3.730; df = 6; p = 0.010). Heterogeneity indicators were significant (Q = 13.457; df = 6; p < 0.001; I² = 55.141%; τ² = 0.072). On visual inspection, the funnel plot was asymmetrical, with one outlier (Balietti et al. 2016). Duval and Tweedie’s trim-and-fill method trimmed three studies, generating a medium adjusted point estimate (Adjusted point estimate = 0.282; 95% CI [−0.022, 0.587]; Q = 30.609). Egger’s regression intercept was not significant (Intercept = 2.362; t = 2.158; p = 0.083; two-tailed). The mean effect size and forest plot are presented in Fig. 3d and a funnel plot is provided in the supplemental materials as Fig. S3d.

    e.

      Visual-Spatial: Two studies reported outcomes regarding visual-spatial ability. Because the number of studies fell below the minimum cutoff of five, an analysis was not conducted for this domain.

    f.

      Memory: A total of twenty studies reported outcome scores for memory: verbal memory, non-verbal memory, or both combined. The overall effect was moderate and significant (Hedges’ g observed = 0.659; 95% CI [0.383, 0.936]; Z = 4.6675; p < 0.001). The HKSJ adjustment was also significant (HKSJ point estimate adjustment SMD = 0.675; 95% CI [0.305, 1.045]; t = 3.823; df = 19; p = 0.001). However, heterogeneity values were significant (Q = 90.898; df = 19; p < 0.001; I² = 79.098%; τ² = 0.277). On visual inspection, the funnel plot was largely asymmetrical, with two outliers (Balietti et al. 2016, g = 2.884; Herrera et al. 2012, g = 3.158). Duval and Tweedie’s trim-and-fill method made no adjustments. However, Egger’s regression intercept was significant (Intercept = 2.565; t = 2.682; p = 0.015; two-tailed). The mean effect size and forest plot are presented in Fig. 3e and a funnel plot is provided in the supplemental materials as Fig. S3e.

      i.

        Verbal Memory: Fifteen studies reported outcome data for verbal learning and memory (word list recall and story recall). The summary effects were large and significant (Hedges’ g observed = 0.758; 95% CI [0.382, 1.133]; Z = 3.956; p < 0.001). The HKSJ adjustment was also significant (HKSJ point estimate adjustment SMD = 0.775; 95% CI [0.188, 1.362]; t = 2.833; df = 14; p = 0.013). Heterogeneity between studies was significant (Q = 95.811; df = 14; p < 0.001; I² = 85.388%; τ² = 0.421). On visual inspection, the funnel plot was asymmetrical, with two outliers (Balietti et al. 2016; Herrera et al. 2012). Egger’s regression intercept was significant, suggestive of small-study effects (Intercept = 2.894; t = 2.256; p = 0.042; two-tailed). The overall effect size and forest plot for verbal memory are presented in Fig. 3f and a funnel plot is provided in the supplemental materials as Fig. S3f.

      ii.

        Non-Verbal Memory: Five studies reported outcome data regarding non-verbal memory. The summary effect size was moderate and significant (Hedges’ g observed = 0.570; 95% CI [0.160, 0.980]; Z = 2.726; p = 0.006). However, the HKSJ adjustment failed to reach significance (HKSJ point estimate adjustment SMD = 0.593; 95% CI [−0.016, 1.202]; t = 2.705; df = 4; p = 0.054). Heterogeneity was not significant (Q = 5.082; df = 4; p = 0.279; I² = 21.292%; τ² = 0.047). On visual inspection, the funnel plot was asymmetrical, and Duval and Tweedie’s trim-and-fill method trimmed two studies, generating a moderate-sized adjusted point estimate (Adjusted point estimate = 0.317; 95% CI [−0.158, 0.792]; Q = 12.965). Egger’s regression intercept was not significant (Intercept = 3.272; t = 1.701; p = 0.187; two-tailed). The effect sizes and forest plot for non-verbal memory are presented in Fig. 3g and a funnel plot is provided in the supplemental materials as Fig. S3g.

    g.

      Executive Functions: A total of thirteen studies reported outcome data regarding the effects of cognitive training on executive functions. The summary effect size was moderate and significant (Hedges’ g observed = 0.575; 95% CI [0.093, 1.056]; Z = 2.339; p = 0.019). However, the HKSJ calculation failed to demonstrate significance (HKSJ point estimate adjustment SMD = 0.585; 95% CI [−0.262, 1.432]; t = 1.505; df = 12; p = 0.158). Values representative of heterogeneity were significant (Q = 126.404; df = 12; p < 0.001; I² = 90.507%; τ² = 0.669). On visual inspection, the funnel plot was asymmetrical, and Duval and Tweedie’s trim-and-fill method made no adjustments. Egger’s regression intercept was not significant (Intercept = 2.718; t = 1.242; p = 0.240; two-tailed). The effect sizes and forest plot for executive functioning measures are presented in Fig. 3h and a funnel plot is provided in the supplemental materials as Fig. S3h.

  3.

    Intervention Effects – By Training Type: Please see Table 4 for a breakdown of effects by training including neuropsychological instruments, individual effect sizes, and overall effects.

    Table 4 Effect sizes by intervention type: Effects, confidence intervals and prediction intervals
    a.

      Cognitive Stimulation: There were no studies which exclusively employed cognitive stimulation as a method of intervention in the works examined.

    b.

      Restorative Cognitive Training: Eight studies reported outcome data after using restorative forms of training. The summary effect size was moderate and not significant (Hedges’ g observed = 0.541; 95% CI [−0.456, 1.539]; Z = 1.064; p = 0.288). The HKSJ adjustment to account for the small number of studies was also non-significant (HKSJ point estimate adjustment SMD = 0.568; 95% CI [−0.555, 1.691]; t = 1.196; df = 7; p = 0.271). Indicators of heterogeneity were significant (Q = 111.092; df = 7; p < 0.001; I² = 93.699%; τ² = 1.886). On visual inspection, the funnel plot was asymmetrical. Duval and Tweedie’s trim-and-fill method made no adjustments. Egger’s regression intercept was significant (Intercept = 8.356; t = 8.501; p < 0.001; two-tailed). As such, this would reflect the presence of heterogeneity, publication bias, and concerns due to small-study effects. The effect sizes and forest plot for restorative training are presented in Fig. 4a and a funnel plot is provided in the supplemental materials as Fig. S4a.

      Fig. 4

      a Meta-analysis of restorative training on cognition (all outcomes). HKSJ point estimate adjustment SMD = 0.568; t = 1.196; p = 0.271. Test for heterogeneity Q = 111.092; df = 7; p < 0.001; I² = 93.699%; τ² = 1.886.
      b Meta-analysis of multicomponent training on cognition (all outcomes). HKSJ point estimate adjustment SMD = 0.404; t = 2.810; p = 0.013. Test for heterogeneity Q = 55.511; df = 15; p < 0.001; I² = 72.978%; τ² = 0.146

    c.

      Compensatory Cognitive Training: Two studies reported using compensatory cognitive training as an intervention. An analysis was not conducted for this form of training because the number of studies fell below the minimum cutoff of five.

    d.

      Multicomponent Training: Sixteen studies reported outcome data after using multicomponent training techniques. The summary effect size was moderate and significant (Hedges’ g observed = 0.398; 95% CI [0.164, 0.631]; Z = 3.337; p = 0.001). The HKSJ adjustment to account for the small number of studies was also significant (HKSJ point estimate adjustment SMD = 0.404; 95% CI [0.098, 0.710]; t = 2.810; df = 15; p = 0.013). Significant heterogeneity was observed (Q = 55.511; df = 15; p < 0.001; I² = 72.978%; τ² = 0.146). On visual inspection, the funnel plot was asymmetrical, with one prominent outlier (Balietti et al. 2016). Duval and Tweedie’s trim-and-fill method made no adjustments. In addition, Egger’s regression intercept was not significant (Intercept = 1.819; t = 1.616; p = 0.128; two-tailed). While there was an indication of heterogeneity, the significant point estimates in the context of relatively low values on bias indicators favor a positive benefit from multicomponent training. The effect sizes and forest plot for multicomponent training are presented in Fig. 4b and a funnel plot is provided in the supplemental materials as Fig. S4b. Sensitivity analysis examining effect sizes and measures of dispersion at various levels of correlation between test instruments demonstrated adequate mean effects and weights consistent with the values used as a conservative estimate of combined outcome (please see Table S8 in the supplementary materials).

  4.

    Intervention Effects – By Targeted Domain: We also conducted meta-analyses by intervention method and its associated domain to examine the effects of the interventions on the cognitive outcome that each intervention method was designed to target. Please see Table 5 for a breakdown of summary effects. Several domains were not examined due to the very small number or lack of studies which specifically targeted that domain (working memory n = 2, speed of information processing n = 2, language n = 0, visual-spatial ability n = 0, or executive functions n = 0). Additionally, while one study reported focusing primarily on memory, additional cognitive domains were included in its intervention methods and a composite score consisting of immediate and recognition paradigms was used to measure memory (Jeong et al. 2016). As such, this study was used only in the multidomain analysis.

    Table 5 Intervention Effects by Content (cognitive domain targeted): Effect sizes, confidence intervals and prediction intervals
    a.

      Memory: Seven studies used mnemonic-focused forms of intervention and reported outcome data for verbal or non-verbal recall measures. While there were ten studies using memory-focused interventions in total, one study did not administer outcomes specifically assessing memory (Greenaway et al. 2012), another study did not have sufficient data from primary outcomes to enter for analysis (Jean et al. 2010a), and one study used a memory composite score which would introduce unrelated non-delay data into the analysis, namely, immediate recall and recognition (Jeong et al. 2016). These studies were excluded from the analysis, leaving a total of seven studies to examine. The summary effect of memory training on memory outcomes was large and significant (Hedges’ g observed = 1.219; 95% CI [0.338, 2.100]; Z = 2.713; p = 0.007). The HKSJ adjustment for the small number of studies was also significant (HKSJ point estimate adjustment SMD = 1.099; 95% CI [0.008, 2.190]; t = 2.465; df = 6; p = 0.049). However, heterogeneity across studies was significant (Q = 51.777; df = 6; p < 0.001; I² = 88.412%; τ² = 1.219). On visual inspection, the funnel plot was asymmetrical, with two outliers (Balietti et al. 2016 and Herrera et al. 2012), although Duval and Tweedie’s trim-and-fill method made no adjustments. Egger’s regression intercept was not indicative of small-study effects (Intercept = 3.397; t = 0.709; p = 0.510; two-tailed). A forest plot of the effects and overall summary of memory strategies on memory outcomes is presented in Fig. 5a. A funnel plot is provided in the supplemental materials as Fig. S5a.

      Fig. 5

      a Meta-analysis of interventions by targeted domain: memory (verbal + non-verbal combined). HKSJ point estimate adjustment SMD = 1.099; t = 2.465; p = 0.049. Test for heterogeneity Q = 51.777; df = 6; p < 0.001; I² = 88.412%; τ² = 1.219.
      b Meta-analysis of interventions by targeted domain: multidomain. HKSJ point estimate adjustment SMD = 0.232; t = 3.667; p = 0.003. Test for heterogeneity Q = 12.713; df = 12; p = 0.390; I² = 5.612%; τ² = 0.003

    b.

      Multiple Domain and Lifestyle: A total of thirteen studies applied interventions which targeted multiple cognitive domains and other facets including lifestyle changes (e.g. education regarding meta-cognition). The summary effect was small and significant (Hedges’ g = 0.230; 95% CI [0.108, 0.352]; Z = 3.692; p < 0.001). The HKSJ adjustment was also significant (HKSJ point estimate adjustment SMD = 0.232; 95% CI [0.094, 0.370]; t = 3.667; df = 12; p = 0.003). Heterogeneity indicators across studies were not significant (Q = 12.713; df = 12; p = 0.390; I² = 5.612%; τ² = 0.003). With respect to possible publication bias, the funnel plot was slightly asymmetrical on visual inspection, and Duval and Tweedie’s trim-and-fill method trimmed two studies, generating a small adjusted point estimate (Adjusted point estimate = 0.205; 95% CI [0.056, 0.350]; Q = 19.100). Egger’s regression intercept was not significant, suggesting a low likelihood of small-study effects (Intercept = 1.144; t = 1.795; p = 0.100; two-tailed). The significant point estimates and summary statistics imply that interventions with content covering multiple cognitive domains may improve performance on outcome measures (all included). As such, this finding would suggest individuals with MCI who receive interventions targeting multiple domains (including lifestyle changes) are apt to demonstrate a slight benefit on cognitive instruments. However, caution in interpreting this finding is warranted due to the indication of some bias and the inherently large number of intervention methods used and outcomes examined. While these may be statistically favorable findings, there may be other unknown factors unaccounted for by these analyses. Effect sizes, the overall effect, and a forest plot for multidomain interventions on all outcomes (combined) are presented in Fig. 5b and a funnel plot is provided in the supplemental materials as Fig. S5b.

  5.

    Influence of Moderator Variables:

    a.

      Categorical Covariates: A subgroup analysis was conducted with moderator variables of MCI diagnosis, mechanism of intervention (mode), type of training, intervention content, type of control, period of follow-up assessment, and control for repeat administration. The details of the findings for each covariate examined are provided in Table 6.

      Table 6 Subgroup analysis: Examination of moderator variables and effect sizes on outcome measures across studies
      i.

        MCI Diagnosis: With respect to effect sizes for individual diagnostic categories, there was a significant effect for aMCI multiple domain (Hedges’ g = 0.647; p = 0.037) and a non-significant effect for aMCI single domain (Hedges’ g = 0.585; p = 0.116) and MCI-all (Hedges’ g = 0.329; p = 0.107). However, the overall test comparing diagnostic types indicated no difference in effects by MCI diagnostic category (Total Between Q = 0.887; df = 2; p = 0.642).

      ii.

        Mechanism of Intervention (Mode): While the point estimate for an individual approach was significant (Hedges’ g = 1.008; p = 0.006), the overall test examining mode of intervention did not reach significance (Total Between Q = 2.918; df = 2; p = 0.232).

      iii.

        Training Type: The overall test comparing training type found no difference in the effectiveness of interventions based on type of training applied (Total Between Q = 0.211; df = 2; p = 0.900). Individual point estimates for multicomponent interventions were significant (Hedges’ g = 0.438; p = 0.019) while restorative and compensatory types of training were unremarkable (Hedges’ g = 0.389; p = 0.156 and Hedges’ g = 0.623; p = 0.150, respectively). There were no studies which exclusively used cognitive stimulation in the group of studies examined.

      iv.

        Intervention Content: The overall test for domains targeted by the intervention was significant (Total Between Q = 7.942; df = 3; p = 0.047). Individual point estimates were also significant for memory (Hedges’ g = 0.671; p = 0.005) and multidomain (Hedges’ g = 0.449; p = 0.005) forms of content. This would suggest that interventions which focused on memory and multiple domains had a significant influence on outcomes (all cognitive measures combined). The larger effect size observed for memory would further suggest that interventions with memory content may be more effective than interventions with multidomain content. Of note, the lack of significance in other areas is likely due to the very low number of studies (or no studies) and should not be regarded as an indicator of ineffectiveness.

      v.

        Type of Control (Passive v. Active): The overall test comparing passive versus active forms of control groups was not significant (Total Between Q = 0.799; df = 1; p = 0.371). This would suggest the type of control group did not have an impact on the outcomes observed.

      vi.

        Period of Post-Intervention Assessment: The overall test comparing period of follow-up assessment post-intervention was not significant (Total Between Q = 0.838; df = 1; p = 0.360). This would suggest the period of post-assessment, within 2 weeks versus more than 2 weeks, was not a significant factor in the outcomes reported in the studies examined.

      vii.

        Control for Repeat Administration: Examining control for repeat test administration was not significant (Total Between Q = 1.160; df = 1; p = 0.281). This would imply that methods employed to counter test-retest effects such as alternate or parallel test versions were unlikely to have had an influence on measurement outcome for the instruments used.

    b.

      Covariates Combined: Twenty-five studies were included in the meta-regression analysis; one study could not be included as the duration of training (in hours) was not reported (Finn and MacDonald 2015). A meta-regression testing the model of duration of training and type of training regressed on Hedges’ g was not significant, suggesting that duration of training and type of training were not related to effect size (k = 25; Q = 0.15; df = 3; p = 0.9848). The test for goodness of fit was significant, further indicating that the model was not predictive of the effect (τ² = 0.4862; τ = 0.6973; I² = 87.38%; Q = 166.44; df = 21; p < 0.0001). The test for between-study variance was significant, with the covariates explaining 1% of the variance (τ² = 0.4903; τ = 0.7002; I² = 88.21%; Q = 203.59; df = 24; p < 0.0001; R² analog = 0.01). Examining increments at each step of the model, none of the covariates entered were significant. Please see Table 7 for a printout of the main results and Fig. 6 for a scatterplot of the regression of Hedges’ g on training duration.

      Table 7 Meta-regression of duration of intervention and type of cognitive training: main results and increments
      Fig. 6

      Scatterplot of regression of Hedges’ g on training duration

  6.

    Post Hoc Analyses: An additional hierarchical regression was conducted using three moderators (control for repeat administration, duration of training [total length of time], and type of cognitive training) to explore how their combination may be associated with the effect of intervention outcomes. The model was not significant (k = 25; Q = 1.37; df = 4; p = 0.8487). The test for goodness of fit was significant, indicating the model was not fully predictive of the entire effect (τ² = 0.4651; τ = 0.6820; I² = 86.55%; Q = 148.73; df = 20; p < 0.0001). The test for between-study variance was significant, with the covariates explaining 5% of the variance (τ² = 0.4903; τ = 0.7002; I² = 88.21%; Q = 203.59; df = 24; p < 0.0001; R² analog = 0.05). The increment and test of change at each step of the model were not significant for control for repeat administration (Step 1 Test of Model Q = 1.04; df = 1; p = 0.3068), duration of intervention (Step 2 Q = 0.01; df = 1; p = 0.9123), or type of training (Step 3 Q = 0.27; df = 2; p = 0.8750). Please see Table 8 for a printout of the main results and details for incremental steps in the analysis.

    Table 8 Post hoc analysis: meta-regression of control for repeat administration, duration of intervention and type of cognitive training: main results and increments
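
The heterogeneity indices reported in the analyses above are linked by a simple identity: I² is the proportion of Q in excess of its degrees of freedom, I² = 100 × (Q − df)/Q, floored at zero. A minimal sketch of that consistency check follows, using Q, df, and I² values taken from the results above. The helper name `i_squared` is hypothetical, and agreement is expected only to within rounding of the reported Q.

```python
def i_squared(q, df):
    """Return I-squared as a percentage: 100 * (Q - df) / Q, floored at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# (Q, df, reported I-squared in percent) as stated in the Results text.
reported = {
    "all outcomes": (205.409, 25, 87.829),
    "mental status and general cognition": (19.462, 15, 22.928),
    "working memory": (43.068, 11, 74.459),
    "multiple domain and lifestyle content": (12.713, 12, 5.612),
}

for label, (q, df, i2_reported) in reported.items():
    print(f"{label}: computed {i_squared(q, df):.3f}%, reported {i2_reported}%")
```

Differences in the third decimal place arise because Q is itself reported to three decimals.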

Discussion

Results from these meta-analyses provide new information regarding the efficacy of cognitive interventions and training in MCI and expand existing literature in several important ways. First, we found significant, small to moderate effects for multicomponent training (Hedges’ g observed = 0.398; 95% CI [0.164, 0.631]; Z = 3.337; p = 0.001; Q = 55.511; df = 15; p < 0.001; I² = 72.978%; τ² = 0.146) as well as multidomain-focused strategies (Hedges’ g = 0.230; 95% CI [0.108, 0.352]; Z = 3.692; p < 0.001; Q = 12.713; df = 12; p = 0.390; I² = 5.612%; τ² = 0.003). These results suggest that individuals with MCI who received multicomponent forms of training or used interventions which targeted multiple cognitive domains (including lifestyle changes) demonstrated small to moderate improvements on measures of cognition post-intervention. This is consistent with other reviews which found support for multidomain forms of intervention (Ballesteros et al. 2015; Bamidis et al. 2014; Li et al. 2011, 2014, 2017; Maffei et al. 2017; Suo et al. 2016; Yin et al. 2014). Regarding the other interventions examined, there was insufficient evidence to offer greater clarification of effects by cognitive domain or type of training. This is also consistent with prior reports which found modest improvements after training and interpretive complications associated with methodological and technical factors (Belleville 2008; Gates et al. 2011; Huckans et al. 2013; Kurz et al. 2011; Stott and Spector 2011; Vidovich and Almeida 2011). Caution in interpreting some findings is warranted and, in some instances, interpretation is precluded by heterogeneity arising from the number of strategies employed, measures used to quantify outcome, other differences in methodology, and diversity of settings.

Second, subgroup examination of the moderator effects associated with intervention content revealed a significant overall effect (Total Between Q = 7.942; df = 3; p = 0.047) with significant point estimates for memory-based (Hedges’ g = 0.671; p = 0.005) and multidomain forms of content (Hedges’ g = 0.449; p = 0.005). This would suggest that, in the MCI populations within the studies examined, interventions which were memory- or multidomain-based had sufficient influence on training to have made an appreciable difference on test performance post-intervention, with memory-based content possibly being more effective than multidomain methods. Of note, however, there was an insufficient number of studies in other domains to provide a meaningful investigation of other forms of content. In addition, results regarding the influence of moderator variables should be regarded as observational only and not the equivalent of conducting direct comparisons of each approach in well-designed RCTs. As such, caution in interpreting these findings is warranted.
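
The p-values attached to the Total Between Q statistics in the subgroup analyses follow from referring Q to a chi-square distribution with the stated degrees of freedom. The sketch below is a minimal illustration of that conversion, assuming this standard reference distribution; `chi2_sf` is a hypothetical helper written with closed-form expressions for integer degrees of freedom, and the Q, df, and p values are those reported in the subgroup results.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate for integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2/pi) * exp(-x/2) * sum_j x^(j - 1/2) / (1*3*...*(2j - 1))
    total = 0.0
    term = math.sqrt(x)  # j = 1 term
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    tail = math.sqrt(2.0 / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


# (Total Between Q, df, reported p) from the subgroup analyses.
subgroup_tests = {
    "MCI diagnosis": (0.887, 2, 0.642),
    "mode of intervention": (2.918, 2, 0.232),
    "training type": (0.211, 2, 0.900),
    "intervention content": (7.942, 3, 0.047),
    "type of control": (0.799, 1, 0.371),
    "post-intervention assessment period": (0.838, 1, 0.360),
    "control for repeat administration": (1.160, 1, 0.281),
}

for label, (q, df, p_reported) in subgroup_tests.items():
    print(f"{label}: computed p = {chi2_sf(q, df):.3f}, reported p = {p_reported}")
```

Only intervention content falls below the conventional 0.05 threshold, in line with the interpretation given in the text.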

Third, apart from intervention content, the subgroup analyses and meta-regressions performed were not significant for any of the other categorical moderators examined. The lack of significance of these covariates would suggest that MCI diagnostic type, training type, mode of intervention, type of control, post-intervention assessment period, and control for repeat administration did not have an appreciable influence on subjects’ performance on outcome measures. This would imply that the structural elements of intervention programs and the mechanism of instruction (e.g. group vs. computer) may be less of a factor than the content of the interventions applied. The finding that duration of intervention (number of hours) had little influence on magnitude of outcomes in both meta-regression models is consistent with other reports which did not find a dose-response relationship between length of training and outcome benefit. As such, the duration of training may have little or no effect on intervention outcome and shorter sessions may be associated with larger treatment gains in older adults (Verhaeghen et al. 1992). In addition, there were no moderator effects observed for repeat test administration; the use of MCI controls under RCT conditions is likely to mitigate concerns for practice effects. This is further underscored by reports which found nominal consequences of test-retest administration in older adults (Mitrushina and Satz 1991).

Taken together, the moderate summary effects observed after multicomponent and multidomain interventions would be consistent with activation of compensatory mechanisms described by the STAC-r model such that multifaceted cognitive interventions may recruit alternate networks as neurocomputational support to help primary functional networks process load demands (Barban et al. 2017; Ciarmiello et al. 2015; Onur et al. 2016). As such, strategies which target both primary functional networks and alternate pathways simultaneously may be the most efficacious form of intervention for individuals with MCI. These types of interventions may challenge primary and alternate networks together, prompting neuroplastic reorganization. However, training results in small to moderate effects due to decreased functionality and efficiency of primary networks and limitations of alternate networks to fully compensate for processing needs. The summary effect observed with multicomponent and multidomain forms of intervention may reflect this process. Individuals with MCI may, by definition, demonstrate less of a benefit from domain-targeted interventions due to loss of neural structure and function and better outcome with multidomain forms of intervention. While there are a limited number of RCT studies which specifically outline the neurocognitive pathways active after cognitive training with neuroimaging tools, this interpretation would be consistent with increased cerebral blood flow in parahippocampal areas and maintenance of neural efficiency after multicomponent training in MCI (Maffei et al. 2017).

Moreover, the positive benefit of strategies with memory-based content may represent some transfer effects to other domains. Although the creation of new primary network paths appears limited in MCI, interventions with memory-based content may facilitate partial activation of neuroplastic reorganization and compensatory processes in several areas (Hampstead et al. 2011; Hampstead et al. 2012a; Rosen et al. 2011). The positive benefit of memory-based strategies may also be indicative of the complex interrelated pathways involved in memory (Belleville et al. 2011). While inconclusive due to study limitations, the relative absence of significant effects of training by cognitive domain or domain-specific interventions in our data may reflect the loss of primary network efficiency and limited activation of compensatory mechanisms. However, this question could not be explored further due to the limited number of studies, the presence of heterogeneity, and lack of data interpretability regarding other interventions (e.g. restorative training). Similarly, mechanisms reflective of cognitive reserve could not be fully examined due to lack of data (e.g. premorbid IQ, level of education). The neurocognitive conclusions drawn from our analyses and offered here should be regarded as speculative only and require comprehensive examination of intervention methods with the appropriate neuroimaging tools in well-designed RCTs.

With respect to the specific questions we sought to answer:

  1. What were the changes in cognition from baseline to outcome after the intervention was applied? Improvements in cognition were, generally, observed with multicomponent types of training. Benefits were also observed with multidomain and lifestyle approaches, although this finding is qualified by the possibility of bias, the inherently large number of intervention methods applied, and the large number of outcomes examined.

  2. What were the common characteristics found to be effective across studies? There was a positive influence associated with interventions using memory and multidomain-lifestyle approaches, as examined through moderator variable and subgroup analyses.

  3. What are specific interventions that may be applied to individuals diagnosed with MCI in the clinical setting? Multicomponent types of cognitive training appear to improve performance on cognitive outcomes for individuals with MCI. Significant point estimates in the context of relatively low values on bias indicators suggest there was a positive benefit from multicomponent training, although moderate heterogeneity due to the range of strategies, number of outcomes, and various settings was present. The effects of cognitive stimulation, restorative training, and compensatory training were indeterminate due to heterogeneity, publication bias, and the limited number of studies.

  4. What are the key structural factors needed to set up an effective MCI intervention program (e.g. duration, group format, etc.)? With the exception of intervention content noted above, we found no significant influence for MCI diagnosis, training type, mode of intervention, type of control, post-intervention follow-up assessment period, or control for repeat administration on outcome measures. However, these findings should be interpreted with caution and regarded as observational only, as this would not be the equivalent of conducting specific RCT studies which directly compare each characteristic of training.

  5. What inferences may be made regarding interventions applied and the neural processes involved in MCI? Multicomponent and multidomain forms of intervention may facilitate recruitment of alternate neural processes as well as supporting primary networks to meet task demands simultaneously, resulting in small to moderate training effects. Interventions with memory-based or multidomain forms of content may, specifically, facilitate some activation of compensatory scaffolding and neuroplastic reorganization, although the creation of new primary network paths may be limited due to the neuropathological processes associated with MCI.

Strengths & Limitations

Strengths of this review include the restriction of study inclusion to RCTs with individuals who met strict MCI diagnostic guidelines and the measurement of cognitive performance with well-established standard neuropsychological instruments. Improvements in establishing clearer diagnostic criteria for the types of MCI and implementing these in RCTs have contributed significantly to understanding the effects of cognitive training in MCI. In addition, the increased number of RCTs in MCI conducted over the past several years lends credence to the conclusions drawn. To our knowledge, this is the largest review of RCTs in MCI with MCI controls to date (k = 26); 92% of studies examined were published within the past seven years. The number of countries represented (n = 11), including a multicenter group of nations, was also regarded as a strength, although the technical factors associated with this level of diversity also represent a challenge.

The limitations of this study are similar to those of other meta-analyses in MCI. There were generally small sample sizes, a range of interventions employed, diversity of sites, heterogeneity, publication bias and, while greatly improved in our view, a diverse set of instruments used to measure outcome. As such, there were insufficient data to conduct a complete evaluation of all interventions applied. Of greatest concern were the limited number of studies in individual cells, various interventions (some overlapping), types of training, heterogeneous number of outcomes (low precision), and modest number of active controls. A substantial amount of heterogeneity may have occurred secondary to the number of settings and variability in measurement precision. The studies examined were conducted across multiple countries with instruments which appeared to be similar, prima facie, but may have substantially differed in their psychometric properties. Evidence of this may be seen in the effects of interventions on language: some heterogeneity was observed even with an almost exclusive use of one measure of semantic fluency across studies.

Other limitations include the small number of studies in each diagnostic category, absence of measurement of premorbid IQ, unknown date of illness onset or character of course, minimal use of measures of adherence, and design variation between studies (Hampstead et al. 2014). The limited number of studies in each diagnostic category is problematic as MCI subtypes may demonstrate distinct connectivity patterns (Jacquemont et al. 2017). Improved characterization of various factors associated with baseline status and other covariates would also have aided comparability across studies. This has implications for our findings and represents a potential confound: the shifting deficits across the illness and the progressive nature of MCI may require ‘evolving strategies’ which complement and keep pace with patient cognitive decline to demonstrate a clinical benefit. Additionally, the lack of a unified method of reporting data and of a clear structure to communicate this information hinders accessibility to the results provided in published works.

In summary, findings from these meta-analyses suggest individuals with MCI who received multicomponent forms of training or used interventions which targeted multiple cognitive domains (including lifestyle changes) demonstrated small to moderate improvements in cognition post-intervention. As such, multicomponent and multidomain forms of intervention may prompt recruitment of alternate neural processes as well as support primary networks to meet task demands simultaneously. In addition, interventions with memory and multidomain forms of content appear to be particularly helpful, with memory-based approaches possibly being more effective than multidomain methods. Otherwise, the effects of interventions by cognitive domain, training type, or other targeted domains (content) did not achieve statistical significance. Given this pattern of results, although the creation of new primary network paths may be limited in MCI, interventions with memory or multidomain forms of content may facilitate partial activation of compensatory scaffolding and neuroplastic reorganization. The positive benefit of memory-based strategies may also reflect transfer effects indicative of compensatory network activation and the multiple pathways involved in memory processes. Caution in interpreting these findings is warranted, however, due to heterogeneity, probability of publication bias, and the small number of studies across cognitive domains. Significant amounts of variability and bias were present, precluding a more definitive interpretation of the outcomes observed.