Abstract
Background Intervention differential effects (IDEs) occur when the change in a health outcome following an intervention depends upon the baseline value of that outcome. Oldham’s method and multilevel modelling are two approaches used to detect IDEs. However, the conditions under which these methods are robust are not well documented. One condition that has not been explored is the detection of IDEs in studies that recruit according to baseline health status (i.e. above a threshold). We hypothesised that recruiting/selecting above a threshold undermines the reliability of existing methods to detect IDEs because of regression to the mean, and that comparing these ‘truncated’ samples with a control group restores the robustness of these methods. Using weight-loss interventions as an example, we show how to overcome the challenges of regression to the mean in studies with threshold selection criteria.
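For orientation, Oldham’s method tests for an IDE by correlating the change score (follow-up minus baseline) with the mean of the two measurements, rather than with baseline alone, which avoids the mathematical coupling that biases a naive change-versus-baseline correlation. A minimal sketch in R, using illustrative BMI parameters that are not those of the study:

## Minimal sketch of Oldham's method on simulated data (illustrative
## parameters, not the study's). With no IDE built in, change is
## unrelated to underlying BMI, so the Oldham correlation should be ~0.
set.seed(42)
n        <- 1000
true_bmi <- rnorm(n, mean = 27, sd = 4)        # latent BMI
baseline <- true_bmi + rnorm(n, sd = 1.5)      # measured with error
followup <- true_bmi - 1 + rnorm(n, sd = 1.5)  # uniform 1-unit loss: no IDE
change   <- followup - baseline
average  <- (baseline + followup) / 2
cor.test(change, average)                      # Oldham's test for an IDE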
Methods We simulated two datasets comprising repeated measures of body mass index (BMI) for 1000 males aged 25–34 (‘population’ datasets). One dataset was simulated to have an IDE; the other (‘null’) dataset was simulated without one. Half the population in each dataset were simulated to receive a weight-loss intervention. To emulate real-world weight-loss interventions, we truncated each population dataset to select intervention and control group samples with BMI ≥30 kg/m². Oldham’s method and multilevel modelling were applied to the ‘population’ intervention groups and the corresponding ‘truncated’ samples for each simulation. We repeated each analysis to contrast the intervention and control group datasets (using Fisher’s z-transformation and Student’s t-test for Oldham’s method, and the likelihood ratio test for multilevel modelling). Simulations were repeated 10 000 times to generate Type I error rates and 95% credible intervals. Simulations were performed in R and MLwiN.
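The core of the simulation can be sketched as follows. This is a simplified, single-arm version (no control group, fewer replications, illustrative parameters), not the study’s actual code: it truncates a ‘null’ population at BMI ≥30 kg/m² and tallies how often Oldham’s method falsely signals an IDE.

## Simplified single-arm sketch of the simulation (illustrative
## parameters; fewer replications than the study's 10 000): truncate a
## 'null' population at BMI >= 30 and count false-positive IDE signals.
set.seed(1)
n_sims <- 2000
n_pop  <- 1000
false_positives <- 0
for (i in seq_len(n_sims)) {
  true_bmi <- rnorm(n_pop, mean = 27, sd = 4)
  baseline <- true_bmi + rnorm(n_pop, sd = 1.5)
  followup <- true_bmi - 1 + rnorm(n_pop, sd = 1.5)  # no IDE by construction
  keep     <- baseline >= 30                         # threshold selection
  change   <- followup[keep] - baseline[keep]
  average  <- (baseline[keep] + followup[keep]) / 2
  if (cor.test(change, average)$p.value < 0.05)
    false_positives <- false_positives + 1
}
false_positives / n_sims  # Type I error rate; ~0.05 only if unbiased

Because selection on a noisy baseline inflates the measurement error of those selected, their follow-up scores regress downwards, which the correlation test misreads as an IDE.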
Results Under the null hypothesis of no IDE, Oldham’s method and the multilevel model yielded Type I error rates >90%, confirming that selecting above a threshold leads to bias due to regression to the mean. Type I error rates returned to 5% for the multilevel model when a control group was introduced and the likelihood ratio test was employed, whereas Type I error rates improved but remained elevated when Fisher’s z-transformation and Student’s t-test were used to contrast groups.
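For the control-group contrast with Oldham’s method, one standard way to compare the intervention and control correlations uses Fisher’s z-transformation with a normal approximation; the study pairs the transformation with a Student’s t-test, so the helper below is an assumed simplification for illustration only.

## Hypothetical helper: compare Oldham correlations from intervention
## and control samples via Fisher's z-transformation (normal
## approximation; the study itself pairs the transform with a t-test).
compare_oldham <- function(r_int, n_int, r_ctl, n_ctl) {
  z_diff <- (atanh(r_int) - atanh(r_ctl)) /
            sqrt(1 / (n_int - 3) + 1 / (n_ctl - 3))
  2 * pnorm(-abs(z_diff))  # two-sided p-value
}
compare_oldham(r_int = -0.40, n_int = 150, r_ctl = -0.35, n_ctl = 150)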
Conclusion Our study shows that multilevel models can robustly detect IDEs in ‘truncated’ samples (selected above a threshold) if analyses involve a control group. For study designs that do not collect control group data (such as most evaluations of weight management programmes), the identification of IDEs currently remains intractable.