Background Previous research exploring the link between population mixing and childhood leukaemia has generated equivocal and contradictory results. This research has adopted two principal analytical approaches: (i) selecting areas according to specific characteristics and comparing the incidence of childhood leukaemia in these areas with that expected on the basis of the average national incidence; and (ii) using regression analyses of data from an entire region to model characteristics associated with the incidence of childhood leukaemia. We compare these two approaches to identify any potential biases resulting from each.
Methods Using data for 532 electoral wards from a provincial region of the UK, a ‘simulated region’ was generated based on the correlation structure and distributions of the relevant variables therein (population, area size and extent of inward-migration). Cases of childhood leukaemia were simulated from a Poisson distribution with the incidence rate set to the national average, under the null hypothesis that only population size influences incidence. Three per cent of the areas within the ‘simulated region’ were selected on the basis of either: (a) characteristics likely to generate elevated infection rates; or (b) elevated rates of leukaemia that might prompt the investigation of associated characteristics. Each analytical approach was then emulated using: (i) the binomial test on areas aggregated according to each of the two selection criteria; and (ii) regression analysis of data for the whole of the ‘simulated region’. Appropriate p-values for each approach were calculated for 10,000 simulations.
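The selection step of approach (i) can be illustrated with a minimal sketch in Python. It is not the authors' implementation, and several details are assumptions made only for illustration: the per-child risk `RATE`, the log-normal ward populations, an inward-migration score drawn independently of population (the study instead matched the observed correlation structure), a one-sided exact binomial test, and 200 rather than 10,000 simulations. The regression approach (ii) is not shown.

```python
import math
import random

RATE = 5e-4          # hypothetical per-child risk over the study period (assumption)
N_WARDS = 532        # number of wards, as in the source
TOP_FRACTION = 0.03  # share of wards selected, as in the source


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    lower = 0.0
    for j in range(k):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
                    + j * math.log(p) + (n - j) * math.log1p(-p))
        lower += math.exp(log_term)
    return max(0.0, 1.0 - lower)


def simulate_once(rng, select_on_incidence):
    """Simulate one region under the null and test the aggregated selected wards."""
    # Hypothetical ward populations and migration scores (assumptions, see lead-in).
    pops = [int(rng.lognormvariate(7.0, 0.5)) + 1 for _ in range(N_WARDS)]
    migration = [rng.random() for _ in range(N_WARDS)]
    # Null hypothesis: expected cases depend on population size only.
    cases = [poisson(rng, RATE * n) for n in pops]
    n_pick = max(1, round(TOP_FRACTION * N_WARDS))
    if select_on_incidence:
        def key(i):
            return cases[i] / pops[i]   # selection prompted by observed incidence
    else:
        def key(i):
            return migration[i]         # selection on a hypothesised characteristic
    chosen = sorted(range(N_WARDS), key=key, reverse=True)[:n_pick]
    k = sum(cases[i] for i in chosen)
    n = sum(pops[i] for i in chosen)
    return binom_upper_tail(k, n, RATE)


def rejection_rate(select_on_incidence, n_sims=200, seed=1):
    """Share of simulated regions with p < 0.05 under the chosen selection rule."""
    rng = random.Random(seed)
    hits = sum(simulate_once(rng, select_on_incidence) < 0.05 for _ in range(n_sims))
    return hits / n_sims
```

Under these assumptions, `rejection_rate(False)` should stay near or below the nominal 5% level, whereas `rejection_rate(True)` should be far above it, because the wards were chosen for the very excess that is then tested.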
Results Under the null hypothesis, the p-values generated from the 10,000 simulations approximate a uniform distribution under both approaches when: (i) characteristics hypothesised to promote an elevated incidence of leukaemia (low population density and/or high levels of inward-migration) drive selection; and (ii) a regression model of the entire dataset adjusts for these characteristics. However, when an elevated incidence prompts the investigation of selected areas, approach (i) yields a highly skewed distribution of p-values, indicating that a statistically significant result would be found on more than 5% of occasions.
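One simple way to summarise how far a set of simulated p-values departs from uniformity is the Kolmogorov-Smirnov distance from the Uniform(0, 1) distribution, alongside the share of p-values below 0.05. The source does not state which diagnostic was used, so the sketch below is only an illustration; the function names and the toy inputs `even` and `skewed` are hypothetical and are not simulation output.

```python
def ks_distance_from_uniform(p_values):
    """Largest gap between the empirical CDF of p_values and the Uniform(0, 1) CDF."""
    xs = sorted(p_values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one p-value")
    # Check the gap just after and just before each jump of the empirical CDF.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def share_below(p_values, alpha=0.05):
    """Proportion of p-values that fall below the significance level alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Toy inputs: an evenly spread set and a set piled up near zero.
even = [(i + 0.5) / 100 for i in range(100)]
skewed = [u ** 4 for u in even]
```

A distance near zero with about 5% of p-values below 0.05 is consistent with an unbiased procedure; a large distance with many small p-values, as for `skewed`, corresponds to the pattern described for incidence-prompted selection.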
Conclusion When characteristics thought to be responsible for an elevated incidence of childhood leukaemia (i.e. low population density and/or high levels of inward-migration) are used, neither (i) selection on these characteristics nor (ii) statistical adjustment for them is subject to bias. However, if selection with approach (i) is prompted by a high incidence of childhood leukaemia, a statistically significant result is found more often than expected. Approach (ii) is therefore preferred for investigating possible links between population mixing and childhood leukaemia.