Sample selection and validity of exposure–disease association estimates in cohort studies


Background Participants in cohort studies are frequently selected from restricted source populations, and such restriction is recognised as a potential threat to study validity.

Objectives To assess the bias that may arise when analyses involve data from cohorts based on restricted source populations, an area little studied in quantitative terms.

Methods Monte Carlo simulations were used, based on a setting where the exposure and one risk factor for the outcome, which are not associated in the general population, both influence selection into the cohort. All the parameters involved in the simulations (ie, prevalence and effects of exposure and risk factor on both the selection and outcome processes, selection prevalence, baseline outcome incidence rate, and sample size) were allowed to vary to reflect real-life settings.
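The selection mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function name `simulate_bias`, all default parameter values, the exponential event times, the fixed follow-up, and the use of a crude person-time rate ratio as a stand-in for the HR estimate are hypothetical choices made here for brevity.

```python
import math
import random


def simulate_bias(or_exp_sel, or_rf_sel, hr_rf, hr_exp=1.0,
                  p_exp=0.5, p_rf=0.5, base_sel_odds=1.0,
                  base_rate=0.01, follow_up=10.0, n=200_000, seed=1):
    """Return (estimated log rate ratio) - (true log HR) for the exposure.

    Assumed setup (illustrative only): exposure and risk factor are
    independent binary variables in the source population, selection
    follows a logistic model, and event times are exponential with
    administrative censoring at `follow_up`.
    """
    rng = random.Random(seed)
    events = [0, 0]            # event counts, indexed by exposure status
    person_time = [0.0, 0.0]   # follow-up time, indexed by exposure status

    for _ in range(n):
        e = int(rng.random() < p_exp)   # exposure
        r = int(rng.random() < p_rf)    # risk factor, independent of exposure

        # Selection into the cohort depends on both exposure and risk factor.
        odds = base_sel_odds * (or_exp_sel ** e) * (or_rf_sel ** r)
        if rng.random() >= odds / (1.0 + odds):
            continue  # not selected

        # Outcome process: constant hazard, multiplicative effects.
        rate = base_rate * (hr_exp ** e) * (hr_rf ** r)
        t = rng.expovariate(rate)
        if t <= follow_up:
            events[e] += 1
            person_time[e] += t
        else:
            person_time[e] += follow_up

    # Crude analysis that ignores the risk factor, as if it were unmeasured.
    rate_exposed = events[1] / person_time[1]
    rate_unexposed = events[0] / person_time[0]
    return math.log(rate_exposed / rate_unexposed) - math.log(hr_exp)


if __name__ == "__main__":
    print("no selection effects:", round(simulate_bias(1.0, 1.0, 4.0), 3))
    print("strong selection:    ", round(simulate_bias(4.0, 4.0, 4.0), 3))
```

With the default true HR of 1 for the exposure, any systematic departure from zero in the returned value reflects the association between exposure and risk factor that selection induces within the cohort. The size of the bias depends on the assumed parameters, so the sketch should not be expected to reproduce the published figures.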

Results The simulations show that when the exposure and risk factor are strongly associated with selection (ORs of 4 or 0.25) and the unmeasured risk factor is associated with a disease HR of 4, the bias in the estimated log HR for the exposure–disease association is ±0.15. When these associations decrease to values more commonly seen in epidemiological studies (eg, ORs and HRs of 2 or 0.5), the bias in the log HR drops to just ±0.02.
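On the ratio scale, a bias of b in the log HR multiplies the estimated HR by e^b. Rounded to two decimals, the figures above correspond to:

\[
e^{\pm 0.15} \approx 1.16 \ \text{or}\ 0.86, \qquad e^{\pm 0.02} \approx 1.02 \ \text{or}\ 0.98 .
\]

For example, a true HR of 1 would be estimated as about 1.16 or about 0.86 in the strong-selection scenario, depending on the direction of the bias, and as about 1.02 or about 0.98 in the more typical scenario.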

Conclusions Using a restricted source population for a cohort study will, under a range of sensible scenarios, produce only relatively weak bias in estimates of the exposure–disease associations.

  • Directed Acyclic Graphs
  • selection bias
  • confounding
  • Monte Carlo Simulations
  • epidemiology ME
