After the RCT: who comes to a family-based intervention for childhood overweight or obesity when it is implemented at scale in the community?

Background: When implemented at scale, the impact of public health interventions on health and health inequalities depends on who receives them, in addition to intervention effectiveness.

Methods: The MEND 7–13 (Mind, Exercise, Nutrition…Do it!) programme is a family-based weight management intervention for childhood overweight and obesity implemented at scale in the community. We compare the characteristics of children referred to the MEND programme (N=18,289 referred to 1,940 programmes) with those of the population eligible for the intervention, and assess what predicts completion of the intervention.

Results: Compared with the MEND-eligible population, proportionally more children who started MEND were: obese rather than overweight (excluding obese); girls; Asian; from families with a lone parent; living in less favourable socioeconomic circumstances; and living in urban rather than rural or suburban areas. Having started the programme, children were relatively less likely to complete it if they: reported 'abnormal' compared with 'normal' levels of psychological distress; were boys; were from lone parent families; lived in less favourable socioeconomic circumstances; participated in a relatively large MEND programme group; or attended a programme whose manager had run more programmes.

Conclusions: The provision and/or uptake of MEND did not appear to compromise, and if anything promoted, participation of those from disadvantaged circumstances and ethnic minority groups. However, this tendency was diminished because programme completion was less likely for those living in less favourable socioeconomic circumstances. Further research should explore how completion rates of this intervention could be improved for particular groups.

The supplement refers to analyses of MEND starters and completers only (analyses of those referred to MEND were based on completely observed data).

S2. Background and theory
Missing data are an issue in most research settings and arise when researchers (or, in this case, services) intend to collect information but do not. Three broad processes, described by Rubin,1 can lead to missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). All methods for analysing data with missing values assume that the missing values were introduced into the dataset by one or more of these processes. These assumptions are necessary because there are few applied situations where the reasons for data being missing can be determined.
Complete case analysis, a common analytical approach, excludes from the analysis individuals who have missing data on any variable (incomplete data). This approach assumes that the complete cases are a random subsample; that is, that missing values were introduced completely at random (MCAR). It can be adequate in situations where differences between individuals with complete and incomplete data are minimal and where incomplete data are not extensive. Where individuals with complete and incomplete observations do differ, population parameters estimated in the analysis (such as means, proportions or regression coefficients) may be biased in ways which are difficult to predict. Further, if missing data are extensive then the precision of estimates may be reduced, often inflating standard errors, with the implication that a difference may not be declared when in fact it exists (i.e. a type II error).
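A small simulation can illustrate this bias. The variable names (a 'deprivation' covariate driving missingness in an outcome) are purely illustrative and not from the MEND dataset; the point is that when missingness depends on an observed covariate that also predicts the outcome, the complete-case mean is biased.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Simulate a covariate (e.g. a deprivation score) and an outcome that
# increases with it, then delete outcome values more often at high
# deprivation: a missing-at-random (MAR) mechanism, since missingness
# depends only on the fully observed covariate.
deprivation = rng.normal(0.0, 1.0, n)
outcome = 2.0 + 0.5 * deprivation + rng.normal(0.0, 1.0, n)

p_missing = 1.0 / (1.0 + np.exp(-(deprivation - 0.5)))  # higher at high deprivation
observed = rng.random(n) >= p_missing

full_mean = outcome.mean()
complete_case_mean = outcome[observed].mean()

print(f"full-data mean:     {full_mean:.3f}")
print(f"complete-case mean: {complete_case_mean:.3f}")  # biased downward
```

Because high-deprivation families are under-represented among the complete cases, the complete-case mean systematically underestimates the full-data mean.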
A number of approaches exist to analyse data where those with complete and incomplete values differ systematically. All standard approaches assume that differences between the missing values and observed values can be related to information collected elsewhere in the dataset (the MAR assumption). Multiple imputation (MI) is a general approach to producing valid inferences when analysing partially observed epidemiological data. 2 The theoretical basis for the approach was developed by Rubin 1,3 and it has been recommended for use in clinical and epidemiological analyses. 2 MI typically assumes that data are MAR although where theoretically justified it can also assume MNAR. 2 The two potential advantages of MI in our analysis were that MI theoretically produces unbiased population parameters, and is statistically more efficient because all individuals contribute data to the analysis and therefore the statistical power to estimate parameters precisely is retained.
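As an illustration of the mechanics only (not of the multilevel REALCOM-IMPUTE procedure used in the study), the sketch below performs 'proper' multiple imputation of a single variable under MAR and pools the estimated mean with Rubin's rules. All data and the linear imputation model are simulated for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2_000, 10  # m = number of imputed datasets

# Two correlated variables; make y missing-at-random given fully observed x.
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 0.8 * x + rng.normal(0.0, 0.6, n)
miss = rng.random(n) < 1.0 / (1.0 + np.exp(-x))  # more missing at high x

# Fit the imputation model y ~ x on the observed cases.
obs = ~miss
X_obs = np.column_stack([np.ones(obs.sum()), x[obs]])
beta_hat, *_ = np.linalg.lstsq(X_obs, y[obs], rcond=None)
resid = y[obs] - X_obs @ beta_hat
sigma = resid.std(ddof=2)
cov = sigma**2 * np.linalg.inv(X_obs.T @ X_obs)  # sampling covariance of beta_hat

estimates, variances = [], []
for _ in range(m):
    # 'Proper' imputation: draw fresh coefficients each time, then add
    # residual noise, so imputations reflect parameter uncertainty.
    beta_draw = rng.multivariate_normal(beta_hat, cov)
    y_imp = y.copy()
    y_imp[miss] = (beta_draw[0] + beta_draw[1] * x[miss]
                   + rng.normal(0.0, sigma, miss.sum()))
    estimates.append(y_imp.mean())
    variances.append(y_imp.var(ddof=1) / n)  # within-imputation variance of the mean

# Pool with Rubin's rules.
q_bar = np.mean(estimates)              # pooled estimate of the mean of y
w = np.mean(variances)                  # average within-imputation variance
b = np.var(estimates, ddof=1)           # between-imputation variance
total_var = w + (1 + 1 / m) * b         # Rubin's total variance

print(f"pooled mean of y: {q_bar:.3f} (SE {np.sqrt(total_var):.3f})")
```

All individuals contribute to each imputed dataset, which is the efficiency gain described above, and the between-imputation term propagates the uncertainty added by imputing.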
It is important to note that it is impossible to verify why data are missing (i.e. how far the MCAR, MAR or MNAR assumptions are correct) in any given analysis, although the data can be used to assess whether the MCAR assumption is plausible. Sterne et al.2 have therefore developed guidelines for the use and reporting of MI in clinical and epidemiological analysis (Table 1). This supplement is structured with reference to these guidelines to provide: a comprehensive account of how missingness was addressed using MI in our study; why we chose this approach; and analyses of data which support this approach.

Table 1: Guidelines for the use and reporting of MI, from Sterne et al.2
1. Report the number of missing values for each variable of interest.
2. If possible, give reasons for missing values, in terms of other variables.
3. Indicate how many individuals were excluded because of missing data when reporting the flow of participants through the study.
4. Clarify whether there are important differences between individuals with complete and incomplete data.
5. Describe the type of analysis used to account for missing data (e.g. MI), and the assumptions that were made (e.g. missing at random).
6. Report details of the software used and of key settings for the imputation modelling.
7. Report the number of imputed datasets that were created (five imputed datasets have been suggested to be sufficient on theoretical grounds, but a larger number may be preferable to reduce sampling variability from the imputation process).
8. What variables were included in the imputation procedure?
9. How were non-normally distributed and binary/categorical variables dealt with?
10. If statistical interactions were included in the final analyses, were they also included in imputation models?
11. If a large fraction of the data is imputed, compare observed and imputed values.
12. Where possible, provide results from analyses restricted to complete cases, for comparison with results based on MI. If there are important differences between the results, suggest explanations, bearing in mind that analyses of complete cases may suffer more chance variation, and that under the missing at random assumption MI should correct biases that may arise in complete case analyses.
13. Discuss whether the variables included in the imputation model make the missing at random assumption plausible.
14. It is also desirable to investigate the robustness of key inferences to possible departures from the missing at random assumption, by assuming a range of missing not at random mechanisms in sensitivity analyses. This is an area of ongoing research.

S3. Results relating to the Sterne et al. guidelines

G1. Report the number of missing values for each variable of interest.
Our analyses directly used six variables (the 'variables of interest') from the MEND 7–13 service dataset, each with a varying proportion of missingness. While other variables (outcome variables) also had missing data, these were included in the model for reasons of good practice (described in more depth in section G8). Table 2 shows the proportion of each variable of interest that was missing. It is of note that a relatively large proportion of data was missing for the employment status variable. To decide whether this amount of data could reasonably be imputed we referred to peer-reviewed work, including work published in the British Medical Journal in which the authors imputed outcomes with 70–75% of observations missing.4
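As a minimal illustration of how a table of per-variable missingness proportions like Table 2 can be produced, the snippet below uses pandas on a toy dataset; the column names are hypothetical stand-ins, not the MEND variables.

```python
import pandas as pd

# Toy stand-in for a service dataset; column names are illustrative only.
df = pd.DataFrame({
    "ethnicity":  ["White", None, "Asian", "Black", None, "White"],
    "tenure":     ["Owned", "Rented", None, "Owned", "Rented", "Owned"],
    "employment": [None, None, "Employed", None, "Unemployed", None],
})

# Proportion missing per variable, one row per variable of interest.
missing = df.isna().mean().round(3)
print(missing)
```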

G2. If possible, give reasons for missing values, in terms of other variables
It is not possible to know exactly why data were missing, i.e. whether missingness was introduced by MCAR, MAR or MNAR. We hypothesised that data were missing for two broad reasons: missingness introduced by differential reporting by socio-economic and ethnic groups, and missingness introduced by differential amounts of errors at the data collection and data entry stages.
Analyses of the Millennium Cohort Study show that the ethnic background of mothers, the socioeconomic status of the ward they live in, family structure, housing tenure and household income are all associated with variations in response. 5,6 These results reflect a wider empirical literature which also reports socio-economic and ethnic variations in response. 7 Hypotheses of why these associations arise are typically broad in nature, reflecting the heterogeneity of ethnic minority and socio-economic groups. For example, Allison et al. 7 suggest that literacy and English comprehension amongst "some sections" of ethnic minority communities might be low and therefore present barriers to responding to surveys designed by white groups from favourable socio-economic circumstances.
All variables with missing data (listed in G8) were collected and entered at local MEND programmes. Therefore, if data were missing because of errors in data collection or data entry, these errors might be expected to vary with the staff and procedures in place for each programme. We derived variables at the programme level which might be expected to be associated with these variations in missingness between programmes, including: the number of programmes delivered by each programme manager to date; programme group size; and the proportions of height and weight measures which were digit rounded.
Employment status was not collected in 2007 and 2008 and so was not present for those years.
Other variables which were observed in these years (ethnic group, family structure and housing tenure) did not vary in their proportions between 2007/08 and 2009/10. On this basis, we assumed that employment status for those years could be imputed from values collected in 2009 and 2010.
G3. Indicate how many individuals were excluded because of missing data when reporting the flow of participants through the study.
Overall, MEND collected 21,503 records of families who were referred to the programme and contacted MEND by telephone. Of these, 3,214 were excluded because they were duplicates (n=371), had incomplete data on age, sex and postcode (n=2,471), or were outside the age range of 6-13 (n=372). Of the 18,289 remaining 'referrals', 13,998 had BMI measured at the first measurement session and were designated 'starters' on this basis. Of these 13,998, 11,211 had incomplete data on one or more of the variables of interest, so a complete case approach would have left a sample of 2,787. These flows are shown in Figure 1 of the paper.
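The flow figures above can be reproduced as a simple arithmetic consistency check:

```python
# Participant flow figures reported above, reproduced as a consistency check.
referred_contacts = 21_503
duplicates, incomplete_core, out_of_age = 371, 2_471, 372

excluded = duplicates + incomplete_core + out_of_age
referrals = referred_contacts - excluded        # eligible 'referrals'
starters = 13_998                               # BMI measured at first session
complete_cases = 2_787
excluded_by_cc = starters - complete_cases      # lost to a complete case analysis

print(excluded, referrals, excluded_by_cc)
```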

G4. Clarify whether there were important differences between individuals with complete and incomplete data
Comparing the distributions of complete and incomplete data, we observed large differences for SDQ and housing tenure, and smaller differences for adiposity, ethnicity, employment status, attendance, and area deprivation. There were no differences between respondents with complete and incomplete data by sex, family structure, or urban/rural status (Table 4).
Therefore, we expect that analyses based on complete data alone might underestimate the proportion of abnormally distressed children and of those from private and social renting households.

G5. Describe the type of analysis used to account for missing data (e.g., MI), and the assumptions that were made (e.g. missing at random)
We decided to use MI, making the assumption that data were missing at random (MAR), to account for missing data in the MEND study. Our rationale was based on the hypotheses developed in G2: that data were missing due to factors which were observed in the dataset and could therefore be modelled using a multiple imputation model.

G6. Report details of the software used and of key settings for the imputation modelling
We used REALCOM-IMPUTE Software for Multilevel MI with Mixed Response Types. 8 REALCOM-IMPUTE is general use software developed for the applied researcher. As such, the algorithms have been developed and tested for use across a variety of settings. This contrasts with software developed specifically within one context and which may not be applicable in other contexts, a potential software limitation discussed by Sterne et al. 2

G7. Report the number of imputed datasets that were created
We generated 10 datasets, double the number broadly recommended by Sterne et al.2 Imputing more datasets reduces sampling variability from the imputation process. We ran the model for 3,000 iterations: 500 iterations of burn-in followed by a further 2,500, from which the 10 datasets were drawn every 250 iterations. We tested the model for sensitivity to the number of iterations by running earlier models of differing lengths (for example, 1,000 burn-in iterations, drawing datasets at 2,500 iterations). Estimates did not vary substantively across different numbers of iterations.
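On our reading of this schedule (500 burn-in iterations, then one imputed dataset every 250 iterations over the remaining 2,500), the iterations at which datasets are drawn can be listed as:

```python
# Sketch of the draw schedule described above: 500 burn-in iterations,
# then one imputed dataset every 250 iterations up to 3,000 in total.
burn_in, thin, total = 500, 250, 3_000

draws = list(range(burn_in + thin, total + 1, thin))
print(len(draws), draws)
```

Spacing the draws well apart in the chain reduces autocorrelation between the imputed datasets.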

G8. What variables were included in the imputation procedure?
REALCOM-IMPUTE was used to impute the parameters listed below. These included the measured variables, of which the 'responses' are those with missing data and the 'predictors' those which were completely observed. The parameters also included a random intercept term which varied for each programme, reflecting variation in missing data at the programme level.
3. Urban/rural status (level 1)*
4. Density of unhealthy and healthy food outlets (level 1)*
5. Built environment (density of roads and green space, level 1)*
6. Number of programmes delivered by programme manager (level 2)
7. Size of MEND group at start of programme (level 2)
8. Proportion of height measures rounded to 0 or 0.5 cm (level 2)
9. Proportion of weight measures in programme rounded to 0 or 0.5 kg (level 2)

Variables starred in the list above were included in the MI model but were not theorised to be associated with missingness on the variables of interest here (i.e. ethnicity, family structure, housing tenure, employment status, attendance, or baseline SDQ). These variables were included because they were used in other analyses reported elsewhere as part of our wider study of the MEND data, and good imputation practice suggests that all variables to be included in any later models of interest should also be included in imputation models, so that relationships between variables are not under-estimated.2

G9. How were non-normally distributed and binary/categorical variables dealt with?
REALCOM-IMPUTE was developed specifically to handle mixed response types robustly. The statistical theory and equations are described by Goldstein et al.,9 while the software itself, and the way it implements them, is described by Carpenter et al.8 We followed these procedures.

G10. If statistical interactions were included in the final analyses, were they also included in imputation models?
No statistical interactions were included in the final analyses.

G11. If a large fraction of the data is imputed, compare observed and imputed values
There is no consensus, to our knowledge, about what constitutes 'too high' a fraction of missing data.10 As described above, the amount missing on any given variable ranged from 7% to 63%. While 7% is possibly not a 'large fraction', for completeness we compare the observed values against the imputed values for all the variables in Table 4.
Proportions were identical for baseline SDQ. Proportions were identical or very similar for family structure, housing tenure and employment status, although confidence intervals differed slightly. Proportions were similar for ethnicity: white and Asian families were estimated to be slightly higher in the imputed data, and black and other families slightly lower. Imputed and observed values differed most for attendance: completion and non-completion were estimated to be lower in the imputed data, while partial completion was estimated to be higher.
Overall, given the large proportion of missing data on variables such as employment status, there are few large differences between the observed and imputed values. Only completion differs appreciably, and the difference is in the direction that might be expected. The analyses for guideline 4 showed that lower socio-economic groups were under-represented among those with complete data, and less favourable socioeconomic circumstances have been associated with higher attrition in paediatric weight management interventions.11 Thus, completion might be expected to be over-estimated using only observed values.

G12. Provide results from analyses restricted to complete cases, for comparison with results based on MI.
In the sensitivity analyses below we present data for the imputed datasets (as in the paper) and for complete case (CC) samples in which data are complete for all the variables of interest. Given the high proportion of missing data on parental employment, we also explore the influence of this variable on the results by generating a 'complete case' dataset which was complete for all variables except employment status (equivalent to excluding employment status from the analysis).

Table 5 and Table 6 show differences in proportions calculated from the imputed and both complete case datasets for starters (Table 5) and completers (Table 6) respectively. The results show that the conclusion we draw in the paper, that "the provision and / or uptake of MEND did not appear to compromise, and if anything, promoted participation among those from more disadvantaged circumstances and from ethnic minority groups", was consistent across the imputed and complete case datasets.

Table 7 shows the relative risks of completion for the imputed data and the two types of complete case analysis. The relative risks do not vary in direction or magnitude. This shows that the conclusion in the paper, that "completion was relatively less likely for those participants living in less favourable socio-economic circumstances", was also consistent across the imputed and complete case datasets.

G13. Discuss whether the variables included in the imputation model make the missing at random assumption plausible.
Referring back to the discussion of reasons for missingness in guideline 2, we hypothesised that missingness would be associated with between-group differences on ethnic and socio-economic variables. We also discussed how missingness on all variables would be expected to vary systematically between programmes and that variables measured at the programme level relating to data quality, staff experience and group size might be expected to be associated with missingness.
Therefore, the plausibility of the MAR assumption would be supported by evidence showing that missingness on the variables of interest varied systematically between programmes and that missingness was associated with proxies for the variables mentioned. In the following analysis we aim to test whether these associations support the imputation model described above.
We assessed these possibilities by constructing six binary variables marking whether each of the variables of interest above was missing (coded 1) or observed (coded 0). We used multilevel Poisson regression models to model each missingness outcome with no covariates (i.e. six variance components models). We then used Equation 1 to calculate the proportion of variation in missingness on each variable that was attributable to systematic differences between programmes.
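As an illustration of the first step, the snippet below constructs a binary missingness indicator and inspects how its rate varies between programmes, using toy data with hypothetical column names (the modelling itself was done in multilevel Poisson regressions, not shown here).

```python
import pandas as pd

# Toy data: one variable of interest nested within programmes.
# Column names are hypothetical stand-ins for the MEND variables.
df = pd.DataFrame({
    "programme_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "ethnicity": ["White", None, None, "Asian", "Black", "White", None, None, None],
})

# Binary missingness indicator: 1 = missing, 0 = observed.
df["miss_ethnicity"] = df["ethnicity"].isna().astype(int)

# Rate of missingness by programme; large between-programme differences
# are the kind of systematic variation the variance components models test.
by_programme = df.groupby("programme_id")["miss_ethnicity"].mean()
print(by_programme.to_dict())
```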

Equation 1: 'Exact calculation method' of estimating the variance partition coefficient (VPC) in multilevel Poisson regression models12

VPC = exp(2Xβ + σ²)(exp(σ²) − 1) / [exp(2Xβ + σ²)(exp(σ²) − 1) + exp(Xβ + σ²/2)]

where VPC is the variance partition coefficient, Xβ is the fixed part of the model (in the case of these variance components models this is the intercept term and is a constant) and σ² is the level 2 variance term.
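One common exact formulation of the Poisson VPC, consistent with the terms defined for Equation 1 (a fixed part Xβ and a level 2 variance σ²), partitions the marginal variance into a between-cluster component exp(2Xβ + σ²)(exp(σ²) − 1) and a within-cluster Poisson component exp(Xβ + σ²/2). The sketch below computes it; this is an illustration of the calculation, not the study's own code.

```python
import math

def poisson_vpc(xb: float, sigma2: float) -> float:
    """Exact VPC for a two-level Poisson model with a normally distributed
    random intercept (level 2 variance sigma2) and fixed part xb (here just
    the intercept of a variance components model)."""
    level2 = math.exp(2 * xb + sigma2) * (math.exp(sigma2) - 1)  # between-cluster
    level1 = math.exp(xb + sigma2 / 2)                           # Poisson (within)
    return level2 / (level2 + level1)

# With no level-2 variance the VPC is zero; it rises towards 1 as the
# between-cluster variance grows.
print(poisson_vpc(-1.0, 0.0), poisson_vpc(-1.0, 2.0))
```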
96% of the missingness on attendance was explained by systematic variation between programmes. This is plausible because families have no role in the data collection process; missing data arise on this variable because it is entered erroneously, or not at all, by programme staff. High proportions of the missingness on the other variables were also explained by the programme level, ranging from 25% to 69%, again consistent with the reasons outlined in guideline 2 above, namely that surveys were collected and entered by programme staff. Unfortunately, the model for baseline SDQ did not converge and so the proportion of missing data attributable to programme-level variation could not be estimated for this variable.

The second hypothesised reason for missingness was that ethnic minority groups and families from different socio-economic circumstances might vary in their reporting on survey questions. We used Poisson regression models (with adjustment for clustering at the programme level, so that associations were independent of the clustering of missing data demonstrated above) to estimate whether missingness was associated with other variables in the dataset. Table 9 shows that missingness on the ethnic and socio-economic variables was associated with observed values on the other socio-demographic variables, neighbourhood deprivation, programme group size and the number of programmes delivered by the programme manager. Missingness on attendance was also associated with group size, the percentage of height measures in the programme which were rounded, and with social and private renting.
The data do not support the assumption that missingness was MCAR (i.e. that missingness arose by chance and was not associated with any observed variables).
Overall, the presence of statistically significant associations between missingness on the analysis variables and other observed variables supported the rationale developed in guideline 2: that missingness could be explained by observed variables at the family, neighbourhood and programme level.

G14. Investigate the robustness of key inferences to possible departures from the missing at random assumption, by assuming a range of missing not at random mechanisms in sensitivity analyses.
We could not formulate hypotheses under which missingness might depend on the unobserved values themselves (i.e. under which data might be MNAR). In the absence of such hypotheses with which to introduce MNAR assumptions into the MI model for sensitivity analyses, we could not investigate this possibility.

S4. Summary
Missing data were extensive in the MEND service data, which means that MI was more efficient than complete case analysis and would therefore be likely to produce more precise parameter estimates. Further, we showed that there were statistically significant but small differences between individuals with complete and incomplete data for several variables of interest in the analysis: namely psychological distress, ethnic group, socio-economic circumstances, and attendance. Proportions of missing data also varied systematically between MEND programmes. This suggested that the data were not MCAR and that multilevel MI was more likely than complete case analysis to produce unbiased estimates of population parameters.
Our comparison of complete case analyses and imputed analyses showed that there were no substantive differences in findings between the MI and complete case results beyond expected losses of precision related to the large reduction in power in complete case analyses.
Overall, we were confident that our multiple imputation findings were robust and that they were likely to lead us to draw the most valid statistical inferences from the MEND data.