Transnational research partnerships: leveraging big data to enhance US health
Casey Crump (1), Kristina Sundquist (2, 3), Marilyn A Winkleby (3)

  1. Department of Medicine, Stanford University, Stanford, California, USA
  2. Center for Primary Health Care Research, Lund University, Malmö, Sweden
  3. Stanford Prevention Research Center, Stanford University, Stanford, California, USA

Correspondence to Dr Casey Crump, Department of Medicine, Stanford University, 211 Quarry Road, Suite 405, MC 5985, Palo Alto, CA 94304-1426, USA; kccrump{at}

In the current era of big data and small research budgets, new strategies are needed for more cost-effective leveraging of big data to enhance our nation's health. One strategy is to promote transnational partnerships to tap into the rich, extensive databases available in other countries, particularly in Europe. The National Institutes of Health (NIH) has increasingly recognised that new collaborations that bring together multiple data sources will play a critical role in advancing our knowledge of disease causation, improving patient care, and promoting healthier communities. However, given cuts in research funding and fierce competition for US grants, some question whether US dollars should be diverted to fund ‘foreign’ studies. In this commentary, we argue that transnational research partnerships offer significant advantages for enhancing the health of the US population as well as the broader global community.

In the USA, the collection of population-wide health data has been hampered by the inherent difficulties in linking patients across many different healthcare delivery systems. As a result, the availability of big data for health research has been limited mainly to a few large organisations such as Kaiser Permanente, Group Health, the Mayo Clinic and VA hospitals. The data collected by such organisations are rich resources but have significant limitations. They include only a selected patient population, which is often poorly representative of the broader population in terms of socioeconomic, ethnic or health factors, thus limiting generalisability. Their patient populations also fluctuate over time due to changes in insurance plan enrolments, making long-term outcomes more difficult to track. In addition, their patient care data are often not linkable with broader information such as census, neighbourhood and multigeneration data that would allow examination of more complex pathways affecting health. In contrast, these limitations do not exist in certain countries that have universal healthcare with electronic medical records linked to national health registries. In the Nordic countries (Denmark, Finland, Iceland, Norway and Sweden), for example, the entire national population is essentially a cohort. Because data collection is embedded within the national healthcare infrastructure, extensive clinical and epidemiological data are obtained prospectively. These data are unmatched in completeness, quality and size by any available in the USA, and they enable robust testing of hypotheses that are also relevant to the US population.

Transnational research partnerships

NIH-funded partnerships with countries that already have comprehensive population-based data sets are cost-efficient investments, allowing US investigators and their partners to answer important research questions that cannot be answered using US data. Leveraging of registry data from the Nordic countries (with a combined population of ∼25 million), for example, provides numerous advantages compared with US sources alone. These registries include nearly 100% complete, high-quality nationwide data on inpatient and outpatient clinical diagnoses, prescription records, birth and death records, sociodemographic characteristics, and (in some countries) highly detailed neighbourhood-level social and physical environment characteristics, and a national biobank for genetic studies. All data sources are mutually linkable using a confidential, anonymous version of a unique personal identification number assigned to each person at birth or immigration. Many of these data sources have already been prospectively collected for decades, enabling large-scale studies of long-term temporal trends, life course analyses and extensive family-based designs to disentangle genetic and environmental influences on disease. Universal health coverage also facilitates more complete and equitable ascertainment of health conditions across different social groups, allowing more rigorous studies in high-risk subpopulations. Large national cohort studies based on these data can provide more robust findings that avoid selection and ascertainment biases commonly affecting other observational study designs. Generalisability of biological findings to the USA and other Western countries is high because of similar underlying mechanisms across these populations, and is enhanced for sociodemographic findings by high immigration rates that have increased social and ethnic diversity over the past few decades (eg, ∼26% of the Swedish population are 1st-generation or 2nd-generation immigrants). Because these rich data sources have already been constructed, large-scale studies can be conducted at remarkably low direct costs (typically well below US$1 per participant), as well as much lower indirect costs than most US-based studies.
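
The linkage principle described above, in which separate registries are joined on an anonymous version of a personal identification number, can be illustrated with a minimal sketch. This is not how any Nordic registry is actually implemented: the keyed hash, the secret key, the registry names, the function names and every record below are hypothetical and invented purely for illustration.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian (an assumption for this sketch).
SECRET_KEY = b"example-key-held-by-registry-custodian"


def pseudonymise(personal_id: str) -> str:
    """Return a keyed hash that stands in for the real identification number."""
    return hmac.new(SECRET_KEY, personal_id.encode("utf-8"), hashlib.sha256).hexdigest()


def link_registries(*registries: dict) -> dict:
    """Merge records that share the same pseudonymised key across registries."""
    linked: dict = {}
    for registry in registries:
        for key, record in registry.items():
            # Create an empty entry on first sight, then add this registry's fields.
            linked.setdefault(key, {}).update(record)
    return linked


# Made-up records, each registry keyed by the pseudonymised identifier.
birth_registry = {pseudonymise("ID-001"): {"gestational_weeks": 35}}
prescription_registry = {pseudonymise("ID-001"): {"prescriptions": ["drug A"]}}
death_registry = {pseudonymise("ID-002"): {"year_of_death": 2010}}

cohort = link_registries(birth_registry, prescription_registry, death_registry)
```

In this sketch researchers only ever see the hashed key, yet records for the same person from different registries still fall into a single cohort entry, which is the property that makes life course and family-based designs feasible.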

International data to enhance US health

NIH-funded studies based on these data are making vital contributions to health research and policy in the USA, including new knowledge about disease mechanisms, clinical translation and prevention, that would otherwise be logistically and financially infeasible. For example, Nordic prescription databases have enabled numerous landmark studies of the health effects, safety and cost-effectiveness of medications that are commonly used in the USA [1]. Such studies add evidence beyond that of clinical trials because of their larger sample sizes, more diverse patients and longer exposures, enabling robust assessments of how medications work in the real world. In other studies, the unique ability to follow a large national cohort from birth into adulthood has enabled novel investigations of early life influences on chronic disease and mortality in later life. This led to the first-ever identification of increased mortality risks in adulthood associated with preterm birth [2], as well as many other long-term chronic disease sequelae. These discoveries have advanced our knowledge of early life origins of chronic disease, and will help inform long-term clinical care for the growing number of adult survivors of preterm birth, which currently affects nearly 12% of US births and costs more than US$26 billion annually in US healthcare expenditures and lost productivity [3]. Similarly, the first study of long-term mortality associated with early-term birth (37–38 weeks of gestation) has supported a re-definition of full-term birth [4], and can potentially influence the timing of deliveries to enhance maternal and infant health outcomes for the nearly 30% of US births that occur at early term. These and many other seminal studies of health issues with large population impacts are expanding the knowledge base needed to develop better-targeted preventive interventions and health policy in the USA.

Federal mandates calling for greater collaboration

The NIH has called for increasing collaboration across different agencies to expedite the translation of research findings into knowledge that improves human health. Several recommendations can be made for developing more cost-effective partnerships to facilitate the use of big data for translational research. First, transnational partnerships to leverage big data that are unavailable in the USA are vital for enabling larger population-based studies that are highly robust, generalisable and cost-efficient. Additional projects that incorporate such data are needed to explore new hypotheses that address important health problems in the USA, including better prevention and treatment of chronic diseases, mental disorders, and maternal and child health issues. These collaborative projects are a ‘win-win’ for US investigators and their partners, and most importantly for the health of their respective populations. Second, using a similar strategy in US settings, further integration of clinical trials and observational studies into existing healthcare delivery systems is needed to generate larger study samples at lower costs per participant, by creatively reusing existing data and infrastructure. Third, continued investments in translational research are needed to ensure that the scientific discoveries from these efforts are translated into new interventions that reach the target US population groups [5]. Such efforts in the USA and abroad can make major contributions towards enhancing our nation's health through cost-effective use of more comprehensive health data. Future NIH partnerships that incorporate these strategies are worthy investments and need broader replication in other healthcare settings and populations.



  • Competing interests None.

  • Provenance and peer review Not commissioned; internally peer reviewed.