Comparison of the sociodemographic characteristics of the large NutriNet-Santé e-cohort with French Census data: the issue of volunteer bias revisited
  1. Valentina A Andreeva1,
  2. Benoît Salanave2,
  3. Katia Castetbon2,
  4. Valérie Deschamps2,
  5. Michel Vernay2,
  6. Emmanuelle Kesse-Guyot1,
  7. Serge Hercberg1,2,3
  1. 1Université Paris 13, Equipe de Recherche en Epidémiologie Nutritionnelle (EREN), Centre de Recherche en Epidémiologie et Statistiques, Inserm (U1153), Inra (U1125), Cnam, COMUE Sorbonne Paris Cité, Bobigny, France
  2. 2Unité de Surveillance en Epidémiologie Nutritionnelle (USEN), Institut de Veille Sanitaire, Université Paris 13, Centre d'Epidémiologie et Biostatistiques, Sorbonne Paris Cité, Bobigny, France
  3. 3Département de Santé Publique, Hôpital Avicenne, Bobigny, France
  1. Correspondence to Dr Valentina A Andreeva, Université Paris 13, Equipe de Recherche en Epidémiologie Nutritionnelle (EREN), Centre de Recherche en Epidémiologie et Statistiques, Inserm (U1153), Inra (U1125), Cnam, COMUE Sorbonne Paris Cité SMBH 74 rue Marcel Cachin, Bobigny 93017, France; v.andreeva{at}uren.smbh.univ-paris13.fr

Abstract

Background A recurring concern in traditional and in Web-based studies pertains to non-representativeness due to volunteer bias. We investigated this issue in an ongoing, large population-based e-cohort.

Methods The sample included 122 912 individuals enrolled in the Internet-based, nutrition-focused NutriNet-Santé study between May 2009 and March 2014, with complete baseline data. Participants were recruited via recurrent multimedia campaigns and other traditional and online strategies. Individuals aged 18+ years, residing in France and having Internet access, were eligible for enrolment. Their sociodemographic characteristics were compared with the corresponding 2009 Census data via χ2 goodness-of-fit tests. The effectiveness of statistical weighting of the e-cohort data was also explored.

Results The sample exhibited marked geographical and sociodemographic diversity, including volunteers belonging to typically under-represented subgroups in traditional surveys (unemployed, immigrants, the elderly). Nonetheless, the proportions of women, relatively well-educated individuals and those who are married or cohabiting, were notably larger compared with the corresponding national figures (women: 78.0% vs 52.4%; postsecondary education: 61.5% vs 24.9%; married or cohabiting: 70.8% vs 62.0%, respectively; all p<0.0001).

Conclusions There were notable sociodemographic differences between the general French population and this general population-based e-cohort, some of which were corrected by statistical weighting. The findings bear on the potential generalisability of future investigations in the context of e-epidemiology.

  • Cohort studies
  • Epidemiological methods
  • RESEARCH METHODS
  • Research Design in Epidemiology

Introduction

Amid the weighty challenges of traditional epidemiological research (low response rates, complex logistics, high costs, burdensome follow-up and data management), the Internet has emerged as a viable, convenient and possibly superior alternative for recruitment and collection of good quality data.1–7 Its capacity to handle high activity volumes, cost-effectiveness, interactive interface with built-in controls, the relatively quick fine-tuning of instruments, the streamlined data management, access and follow-up, and reduced social anxiety/desirability have given the Internet the high ground in research endeavours.3 ,4 ,8 ,9 Moreover, the Internet-using population nowadays includes the large majority of individuals and households,10 and continues to expand across social strata.3 ,9 For example, 62% of French households had Internet connections in 2008, while the corresponding percentage in 2012 was 80%, representing a 29% relative increase (18 percentage points) over only 4 years.10 The popularity of mobile technologies has further increased Web access and use away from home or work, and across gender and age. In addition, close to half (42%) of European Internet users aged 55–74 years reported regular Internet use.10 Finally, the Internet has an undeniable edge in terms of certain hard-to-reach population subgroups (illicit drug users, disadvantaged individuals).11 ,12

Despite these advantages, concerns about non-representativeness, non-probability sampling (ie, sampling strategies without known probabilities of enrolling individuals of the target population), coverage, and volunteer bias in traditional and Web-based research, remain.2–4 ,8 In fact, the reliance on volunteers entails potential methodological drawbacks and challenges irrespective of the research platform used. Whereas most recruitment methods result in some bias13 which could affect the external validity of the results, non-response bias can be present irrespective of the method and might not be increased (and possibly even mitigated) in Web-based studies.3 In turn, survey response representativeness has been considered potentially more informative than the response rate itself.14

Despite the wide variety of past and current Web-based epidemiological studies, large general-population e-cohorts remain scarce. In addition, study design, recruitment methodology and research objectives rarely overlap, hence e-cohort characteristics are not easily comparable and existing knowledge in this area might be limited. The large majority of the e-cohorts include only women (college-age, reproductive-age or pregnant).4 ,15–17 We undertook the present study in order to examine the sociodemographic characteristics in the NutriNet-Santé e-cohort, which relies on general population enrolment without any personal solicitation. We compared these characteristics with the respective French national Census data. Based on existing evidence from the literature,17 we hypothesised divergence in terms of educational level and convergence in terms of geographical area of residence as regards these NutriNet-Santé–Census comparisons. Consistent with epidemiological evidence,1 we also hypothesised an over-representation of women in the cohort.

Methods

Study population and data collection

The NutriNet-Santé e-cohort was launched in France in May 2009 and has a planned 10-year recruitment and follow-up. Participants are recruited via a combination of traditional and online strategies, including vast, recurrent multimedia campaigns (television, radio, Internet, national/regional newspapers, billboards, flyers).11 The inclusion criteria pertain to residence in France, age ≥18 years and Internet access. Registration and participation take place online via a dedicated and secure Web site (http://www.etude-nutrinet-sante.fr).18 Two years before launching the study, a multidisciplinary team was assembled in order to conceptualise, design, develop and implement the online tools. Issues related to infrastructure, levels of access, confidentiality, data encryption, transfer, privacy and security were carefully addressed. The NutriNet-Santé study was approved by the ethics committee of the French Institute for Health and Medical Research (IRB INSERM n° 0000388 FWA00005831) and by the National Commission on Informatics and Liberty (CNIL n° 908450 and n° 909216). No additional research/ethics authorisations were required for the present study. To the best of our knowledge, this cohort represents the first large (>130 000 enrolees to-date), exclusively Web-based, general-population, prospective cohort worldwide aimed at elucidating the relationship between multiple aspects of nutrition (eating habits, dietary patterns, nutrient intake, nutritional status, physical activity) and health (disease incidence, health/risk behaviours, biomarker status, mortality). Overall, the NutriNet-Santé study is positioned to serve as a resource for investigating directional, aetiological and mechanistic hypotheses pertaining to the complex relationship between nutrition and health.18

On provision of informed consent and an electronic signature, each person receives a confirmation email with a unique login number and a password. Each registered user has 3 weeks to complete in full the baseline questionnaire set (sociodemographics and lifestyle, health status, physical activity, anthropometrics and diet) before being considered enrolled in the study.18 The data collection instruments were largely adapted from those employed in the French SU.VI.MAX («Supplémentation en Vitamines et Minéraux Antioxydants») and ENNS («Étude Nationale Nutrition Santé») studies.19 ,20 During the survey process, participants receive general instructions and automatic prompts, including text, images and error messages, designed to ensure response accuracy and completeness.

Census data

We compared the sociodemographic characteristics in the sample with the corresponding 2009 Census data for adults aged ≥18 years in metropolitan France. These data are collected and managed by the French National Institute for Statistics and Economic Studies (INSEE), which is responsible for the production and analysis of official statistics in France. The Census data were gender-specific and included information on age (grouped into five categories), birthplace (France, Europe, Africa, Americas and Asia/Oceania), marital status (single/living alone, widowed/divorced/separated and married/cohabiting), educational level (up to secondary, some college and university degree), occupational status (unemployed/ disabled/homemaker, student, farmer/manual labour, artisan/self-employed, office worker/skilled labour, executive/professional staff and retired), presence of children aged <18 years in the household (yes/no) and geographical area of residence (Paris metropolitan area, Paris basin, North, East, West, Southwest, East-Central and Mediterranean).

Statistical analysis

For the present analysis, we used data from individuals who enrolled in the NutriNet-Santé study between May 2009 and March 2014, and had complete baseline data. χ2 goodness-of-fit tests were performed in order to compare the distribution of the observed and expected (ie, national) frequencies in two-way contingency tables. We also assessed the effectiveness of statistical weighting of the e-cohort data, applying weights calculated via the SAS CALMAR (CALage sur MARges) macro developed by INSEE.21 The weighting included the following information: gender (for the full sample), age, birthplace, educational level, employment status, marital status, presence of children in the household and geographical area of residence. All analyses were conducted with SAS (V.9.3, SAS Institute, Inc) and the significance level was set at 0.01 owing to the large sample size.
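As an illustration of the goodness-of-fit comparison described above, the following minimal Python sketch applies the χ2 statistic to the gender distribution only. It is not the authors' SAS code. The observed counts are the gender-specific sample sizes reported for the cohort (95 896 women, 27 016 men), the expected proportion of women (52.4%) is the Census figure quoted in the abstract, and the proportion of men (47.6%) is derived here as its complement. The function names are hypothetical.

```python
import math


def chi2_gof(observed, expected_props):
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    observed: list of observed counts per category
    expected_props: list of reference proportions (must sum to 1)
    """
    n = sum(observed)
    expected = [p * n for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


# Women, men: gender-specific sample sizes of the analysed cohort.
observed = [95896, 27016]
# Census proportions: 52.4% women (reported); men taken as the complement.
census_props = [0.524, 0.476]

chi2_stat, df = chi2_gof(observed, census_props)
p_value = chi2_sf_df1(chi2_stat)

print(f"chi2 = {chi2_stat:.1f}, df = {df}, p < 0.0001: {p_value < 0.0001}")
```

With a sample of this size, even modest departures from the Census proportions produce a very large statistic, which is why a stricter significance level is a reasonable choice.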

Results

Sociodemographic profiles

From the full sample of N=137 326 enrolees in the NutriNet-Santé e-cohort, for the present analysis we selected n=122 912 who had complete baseline data (78% women, mean age=42.6±14.6 years, range 18–95 years). Men were slightly older than women (mean age=47.3±15.2 years vs 41.3±14.2 years, p<0.0001), were more likely to be married or cohabiting (75.5% vs 69.5%, p<0.0001), to have university-level education (36.8% vs 29.9%, p<0.0001) and to be retired (29.1% vs 13.7%, p<0.0001).

Approximately 5% of the participants were born abroad, about two-thirds had postsecondary education and were married or cohabiting, without children <18 years in the household. Among participants with children in the household (34.2% in the full sample), less than half (45.4%) reported having one child, whereas 2.4% reported having four or more children in the household. The sociodemographic characteristics of the cohort as well as the corresponding 2009 Census data for adults aged ≥18 years living in metropolitan France are presented in table 1 (full sample), table 2 (men) and table 3 (women).

Table 1

Baseline sociodemographic characteristics of participants in the NutriNet-Santé e-cohort (2009–2014, n=122 912) in comparison with 2009 national estimates for individuals aged ≥18 years in metropolitan France

Table 2

Baseline characteristics of male participants in the NutriNet-Santé e-cohort (2009–2014, n=27 016) in comparison with 2009 national estimates for individuals aged ≥18 years in metropolitan France

Table 3

Baseline characteristics of female participants in the NutriNet-Santé e-cohort (2009–2014, n=95 896) in comparison with 2009 national estimates for individuals aged ≥18 years in metropolitan France

With respect to the raw (unweighted) data, the e-cohort displayed proportions very similar to those reported nationwide as regards the presence of children <18 years in the household and the geographical area of residence. The age distribution in the sample also displayed some resemblance to the corresponding distribution in the general French population. In turn, the largest discrepancies were found as regards gender (women: 78.0% in NutriNet-Santé vs 52.4% in France) and educational level (postsecondary education: 61.5% in NutriNet-Santé vs 24.9% in France). Likewise, there were marked discrepancies in terms of employment status. Among men and women, the sample included substantially fewer manual labourers and substantially more executive/professional staff than at the population level. Importantly, the proportion of men and women who were unemployed, disabled, or homemakers (ie, categories often under-represented in epidemiological research) was higher in the NutriNet-Santé sample than the corresponding national figures.

Given the large sample size—overall and by gender—all χ2 goodness-of-fit tests were statistically significant (p<0.0001). After the statistical weighting, the percentage distributions of gender, age, birthplace, educational level, employment status, marital status, presence of children in the household and geographical area of residence became identical to those found in the French population.
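The way calibration on margins forces the weighted distributions to match the reference population can be sketched with raking (iterative proportional fitting), one common calibration technique. This is an illustrative Python sketch, not the CALMAR macro, and the text does not state which calibration method was selected. The target margins (52.4% women, 24.9% postsecondary education) are the national figures reported above, with their complements derived here. The four cell counts are invented for illustration and were chosen only so that the sample margins equal the reported 78.0% women and 61.5% postsecondary education.

```python
def rake(cells, targets, n_iter=200):
    """Return one weight per cell so that weighted margins match the targets.

    cells: dict mapping (gender, education) -> sample count
    targets: dict mapping axis index -> {category: target proportion}
    """
    total = sum(cells.values())
    weights = {key: 1.0 for key in cells}
    for _ in range(n_iter):
        for axis, margin in targets.items():
            # Current weighted count in each category of this variable.
            current = {cat: 0.0 for cat in margin}
            for key, count in cells.items():
                current[key[axis]] += weights[key] * count
            # Rescale weights so this margin matches its target exactly.
            for key in cells:
                cat = key[axis]
                weights[key] *= margin[cat] * total / current[cat]
    return weights


def weighted_share(cells, weights, axis, category):
    """Weighted proportion of the sample falling in one category."""
    num = sum(weights[k] * c for k, c in cells.items() if k[axis] == category)
    den = sum(weights[k] * c for k, c in cells.items())
    return num / den


# Hypothetical joint counts per 1000 participants (invented for illustration);
# margins: 780 women (78.0%), 615 with postsecondary education (61.5%).
cells = {
    ("women", "postsecondary"): 470,
    ("women", "other"): 310,
    ("men", "postsecondary"): 145,
    ("men", "other"): 75,
}
targets = {
    0: {"women": 0.524, "men": 0.476},
    1: {"postsecondary": 0.249, "other": 0.751},
}

weights = rake(cells, targets)
share_women = weighted_share(cells, weights, 0, "women")
share_postsec = weighted_share(cells, weights, 1, "postsecondary")
print(f"women: {share_women:.3f}, postsecondary: {share_postsec:.3f}")
```

Because the weights are constructed to reproduce the target margins, agreement on the calibration variables is expected by design; it says nothing about variables that were not included in the calibration.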

Discussion

We examined the sociodemographic profiles of volunteers in the large and ongoing NutriNet-Santé e-cohort by comparing the characteristic distribution frequencies against national Census data for adults aged ≥18 years living in metropolitan France. An original aspect of this e-cohort is its focus on recruiting members of the general population. As hypothesised, we observed a notable convergence in terms of geographical area of residence and a notable divergence in terms of educational level as regards the NutriNet-Santé–Census comparisons. Other e-cohorts with different study designs, recruitment methodologies and research objectives, have also revealed the predominance of well-educated volunteers in their samples.15–17 Also, as hypothesised, the majority of the enrolled volunteers were women. These findings suggest the non-representative nature of the overall sample and the presence of volunteer bias. Reviews of the large epidemiological literature show that women, those of higher socioeconomic status and those who are married, are more likely to participate in research than are men, those of lower socioeconomic status, or those who are single, respectively.1

Generally, individuals are more likely to take part in a research survey if the topic is salient to their own life, while extreme efforts at enrolling nonrespondents might in fact introduce further bias.1 As regards strategies to increase response to Web-based questionnaires, literature review findings suggest that for community-based surveys, lotteries with a small number of large prizes might be the most cost-effective incentive.22 Regarding the recruitment of young (college-age) single men, future efforts could also include leaflet distributions at campus-based sporting events as well as social network advertising.23

Establishing the relative representativeness of a cohort is important because it allows prevalence estimations in the reference population, investigation of risk factor trends and an unbiased evaluation of exposure–outcome relationships.24 In future analyses, especially those focused on the prevalence of a given disease or health/risk behaviour,25 the issue of non-representativeness might pose a challenge. However, representativeness is not regarded as critical for aetiological studies, as long as there is sufficient measurement and control for potential confounding factors permitting the investigation of causal mechanisms.26 Further, concerns about the cohort representativeness or associated selection and volunteer biases might be mitigated in e-cohorts by the fact that the Internet allowed the inclusion of a sizeable and heterogeneous sample of participants with a wide range of exposures. For example, our sample included >8000 individuals aged 65+ years, >6200 individuals born outside France and >15 400 individuals who were unemployed, disabled or homemakers. Individuals belonging to such sociodemographic categories are considered hard-to-reach and are typically under-represented and understudied in traditional epidemiological research,1 unless specific, carefully-targeted recruitment methods are used. In addition, these data attest to the widespread use of the Internet across social strata.

To the best of our knowledge, only one other Web-based epidemiological study has compared its sample characteristics against national data.17 Specifically, the Australian Longitudinal Study on Women’s Health recruited (primarily via targeted advertising on social network websites) female permanent residents of Australia aged 18–23 years and compared their sociodemographic profiles with data from the 2011 Australian Census. These authors found the participants to be representative in terms of geographical distribution; however, a higher percentage had attained university and trade qualifications compared with the Census data.17 Overall, despite the number of existing Web-based epidemiological studies, their features are challenging to compare given important differences in terms of study design, recruitment methodology and research objectives. Whereas other e-cohorts have used multimedia recruitment strategies, such efforts have been coupled with targeted website advertising.15 ,17 Also, unlike NutriNet-Santé, the majority of the other large e-cohorts have recruited only women.15–17

The advantages of electronic surveys have been touted since the 1980s, including their potential to serve as a cost-effective research tool capable of providing data in an analysable format, while combining the strengths of interviews (prompts, branching) as well as paper-and-pencil surveys (standardisation).27 Interest in electronic health surveys has risen with the advent of the Internet coupled with steadily declining participation rates since the early 1990s in traditional epidemiological research.1 While truly representative samples are uncommon in prospective cohort studies given the voluntary basis of participation, Internet cohorts might be more representative than traditional cohorts as regards age, socioeconomic status and geographical location. Unlike many traditional research methods, online surveys also offer disabled individuals an opportunity to enrol.28 In the NutriNet-Santé study, however, the principal concern was the reliance on volunteers (exclusive of any random selection or personal invitations) who were sufficiently motivated to enrol and remain in the cohort. The statistical weights efficiently corrected the bias entailed in the frequency distributions of gender, age, birthplace, educational level, employment status, marital status, presence of children in the household and geographical area of residence.

Open cohorts, such as the NutriNet-Santé, are common in eHealth research.2 Important features of such non-probability-based research include widespread advertising of the study's website without any control over the number of individuals reached or the types of individuals recruited,2 as well as improved efficiency with longer follow-up and larger samples.5 Given that target population coverage and access barriers are rapidly diminishing,4 authors have indicated that any challenges associated with e-cohorts are likely transitional,29 and will be solved in the near future.8 Original aspects regarding the NutriNet-Santé design and methodology include its very large sample size drawn from the general population, a broad focus on nutrition, an assessment of a wide range of personal characteristics and recurrence of the multimedia recruitment campaigns, with each new effort drawing attention to specific public health recommendations regarding diet and nutrition.11 The overarching aim of the NutriNet-Santé study is elucidation of the complex link between nutrition and health.18 The study was conceived as a very large, population-based e-cohort that is well positioned to help untangle the multifaceted aetiological role of nutrition via accurately measured dietary intake and a plethora of potential confounders.30

An important limitation of the NutriNet-Santé study, stemming from the recruitment strategy, is the lack of information on participation and refusal rates. In addition, authors have raised the issue of multiple submissions in online surveys.31 Given the complex study protocol and the substantial responder burden in the NutriNet-Santé study, multiple submissions, while possible, are unlikely. In fact, one of the major challenges regarding enrolment in the study is the breadth and length of the dietary intake assessment. As with traditional survey methodologies, the NutriNet-Santé study largely relies on self-reports of volunteers.

In conclusion, the present study, featuring several original aspects (very large sample size; general population; unsolicited enrolment with few restrictions; a broad focus on nutrition; assessment of a wide range of personal characteristics; comparisons with national Census data), showed that the sample was broadly representative geographically, while women and well-educated individuals were over-represented compared with national data. This e-cohort demonstrates the potential of Internet-based epidemiological research to acquire very large and heterogeneous samples, including hard-to-reach subgroups, of the general population. The findings bear on the potential generalisability of future investigations in the context of e-epidemiology.

What is already known on this subject

  • Web-based epidemiological research is gaining in popularity as a viable, convenient and cost-effective alternative to traditional recruitment and data collection methods. However, concerns about non-representativeness, non-probability sampling, coverage and volunteer bias in Web-based research, recur.

What this study adds

  • The present study sheds light on issues related to volunteer bias and the lack of representativeness in a very large, general population-based e-cohort in France. Despite marked heterogeneity in terms of age, employment status and geographical area of residence, comparisons with French Census data revealed that the large majority of the enrolled volunteers were women, well-educated and married. In future investigations, especially those focused on the prevalence of certain diseases or behaviours, such a volunteer bias might lead to under-estimations and to concerns about the range of various exposures.

Acknowledgments

The authors wish to express their gratitude to the Institute for Public Health Research (IRESP) as well as to the following individuals: Dr Fabien Szabo de Edelenyi, Charlie Menard, Mohand Ait-Oufella, Yasmina Chelghoum, Laurent Bourhis, Nathalie Arnault, Thi Hong Van Duong, Paul Flanzy and Stephen Besseau, for their assistance with data management.

References

Footnotes

  • Contributors VAA performed the literature review, data analysis, led the writing and has primary responsibility for the final content; KC, EK-G and SH designed the NutriNet-Santé study, directed its implementation, including quality assurance and control, and coordinated recruitment and data collection; KC and BS were responsible for obtaining Census data; VAA, BS, KC, VD, MV, EK-G and SH designed the study's analytic strategy; BS, EK-G and SH provided methodological and theoretical guidance; all authors assisted with interpretation of data, and read and edited each draft of the manuscript for important intellectual content. All authors read and approved the final manuscript.

  • Funding This work was supported by the French Ministry of Health (DGS), the French Institute for Health Surveillance (InVS), the National Institute for Prevention and Health Education (INPES), the Foundation for Medical Research (FRM), the National Institute for Health and Medical Research (INSERM), the National Institute for Agricultural Research (INRA), the National Conservatory of Arts and Crafts (CNAM) and the University of Paris XIII.

  • Competing interests None.

  • Ethics approval French Institute for Health and Medical Research, National Commission on Informatics and Liberty.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement Requests for access to the data can be made to SH.