Sample selection and validity of exposure–disease association estimates in cohort studies
  1. Costanza Pizzi (1,2)
  2. Bianca De Stavola (2)
  3. Franco Merletti (1)
  4. Rino Bellocco (3,4)
  5. Isabel dos Santos Silva (5)
  6. Neil Pearce (2,6)
  7. Lorenzo Richiardi (1)

  1. Cancer Epidemiology Unit, CeRMS and CPO-Piemonte, University of Turin, Italy
  2. Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, UK
  3. Department of Statistics, University of Milano Bicocca, Milan, Italy
  4. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
  5. Department of Non-communicable Disease Epidemiology, London School of Hygiene & Tropical Medicine, London, UK
  6. Centre for Public Health Research, Massey University Wellington Campus, New Zealand

Correspondence to Costanza Pizzi, Via Santena 7, 10126 Torino, Italy; costanza.pizzi{at}


Background: Participants in cohort studies are frequently selected from restricted source populations. It has been recognised that such restriction may affect the validity of the study.

Objectives: To assess the bias that may arise when analyses involve data from cohorts based on restricted source populations, an area little studied in quantitative terms.

Methods: Monte Carlo simulations were used, based on a setting where the exposure and one risk factor for the outcome, which are not associated in the general population, influence selection into the cohort. All the parameters involved in the simulations (ie, prevalence and effects of exposure and risk factor on both the selection and outcome process, selection prevalence, baseline outcome incidence rate, and sample size) were allowed to vary to reflect real life settings.
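
The setting described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' simulation code: the function name `simulate_bias`, the prevalences, the selection intercept, the baseline rate and the follow-up length are all hypothetical, event times are assumed exponential, the true exposure effect is set to null, and a crude rate ratio (unadjusted for the risk factor) stands in for a fitted hazard ratio.

```python
import numpy as np


def simulate_bias(or_sel, hr_u, n_pop=100_000, n_rep=20, seed=0):
    """Mean estimated log rate ratio for the exposure when its true effect is null.

    Simplified sketch: all numeric defaults below are hypothetical choices.
    """
    rng = np.random.default_rng(seed)
    base_rate, follow_up = 0.01, 10.0   # hypothetical: events per person-year, years
    p_e = p_u = 0.5                     # hypothetical prevalences of exposure and risk factor
    intercept = np.log(0.1 / 0.9)       # hypothetical: 10% selected when both are absent
    estimates = []
    for _ in range(n_rep):
        # Exposure E and risk factor U are independent in the general population.
        e = rng.random(n_pop) < p_e
        u = rng.random(n_pop) < p_u
        # Both influence selection into the cohort through a logistic model.
        logit = intercept + np.log(or_sel) * e + np.log(or_sel) * u
        selected = rng.random(n_pop) < 1.0 / (1.0 + np.exp(-logit))
        e, u = e[selected], u[selected]
        # Only U affects the outcome; event times are exponential,
        # with administrative censoring at the end of follow-up.
        hazard = base_rate * np.where(u, hr_u, 1.0)
        t = rng.exponential(1.0 / hazard)
        event = t <= follow_up
        time = np.minimum(t, follow_up)
        # Crude rate ratio for E within the selected cohort, ignoring U.
        rate1 = event[e].sum() / time[e].sum()
        rate0 = event[~e].sum() / time[~e].sum()
        estimates.append(np.log(rate1 / rate0))
    return float(np.mean(estimates))


if __name__ == "__main__":
    print("strong scenario:", simulate_bias(or_sel=4, hr_u=4))
    print("weaker scenario:", simulate_bias(or_sel=2, hr_u=2))
```

Because the true exposure effect is null in this sketch, any departure of the returned value from zero reflects the association between exposure and risk factor induced by selection. The magnitudes depend on the hypothetical parameter choices and need not match the figures reported in the Results.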

Results: The simulations show that when the exposure and risk factor are strongly associated with selection (ORs of 4 or 0.25) and the unmeasured risk factor is associated with a disease HR of 4, the bias in the estimated log HR for the exposure–disease association is ±0.15. When these associations decrease to values more commonly seen in epidemiological studies (eg, ORs and HRs of 2 or 0.5), the bias in the log HR drops to just ±0.02.
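
To read these figures on the ratio scale, a bias on the log HR scale can be exponentiated. The short calculation below is an illustrative conversion only; the symbols $\widehat{\mathrm{HR}}$ and $\mathrm{HR}_{\text{true}}$ are notation introduced here, not taken from the article.

```latex
\[
\widehat{\mathrm{HR}} \approx \mathrm{HR}_{\text{true}} \times e^{\text{bias}}, \qquad
e^{\pm 0.15} \approx 1.16 \text{ or } 0.86, \qquad
e^{\pm 0.02} \approx 1.02 \text{ or } 0.98 .
\]
```

In other words, a bias of ±0.15 in the log HR corresponds to an estimated HR that is off by roughly 16% upwards or 14% downwards, while ±0.02 corresponds to about 2% in either direction.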

Conclusions: Using a restricted source population for a cohort study will, under a range of sensible scenarios, produce only relatively weak bias in estimates of the exposure–disease associations.

  • Directed Acyclical Graphs
  • selection bias
  • confounding
  • Monte Carlo Simulations
  • epidemiology ME

  • Funding: The study was conducted within projects partially funded by Compagnia SanPaolo/FIRMS, the Piedmont Region, the Italian Ministry of University and Research (MIUR), the Italian Association for Research on Cancer (AIRC) and the Massey University Research Fund (MURF). The Centre for Public Health Research is supported by a Programme Grant from the Health Research Council of New Zealand.

  • Competing interests: None.

  • Provenance and peer review: Not commissioned; externally peer reviewed.