Abstract
Contemporary research in biosocial science can demand vast sample sizes. Often, data must be aggregated across several studies or data sources to provide adequate statistical power. When a pooled analysis is required, analytic efficiency and flexibility are typically best served by combining the individual-level data from all sources and analysing them as a single large data set. But valid ethico-legal constraints can prohibit or discourage the sharing of individual-level data, particularly across jurisdictional boundaries. This creates a fundamental conflict between competing public goods. DataSHIELD (Data Aggregation Through Anonymous Summary-statistics from Harmonised Individual-levEL Databases) provides a simple approach to analysing pooled data that circumvents this conflict. It uses modern distributed computing and takes advantage of the properties of the algorithm that iteratively updates parameter estimates in generalised linear modelling. The presentation will cover the need for DataSHIELD, its theoretical basis, opportunities and challenges, and how to find out more.
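The property referred to can be illustrated with a small sketch. When a generalised linear model is fitted by Fisher scoring (iteratively reweighted least squares), each update uses the data only through the score vector and the information matrix. Both are sums over individuals, so each study can compute its own contribution and release only those summaries. The example below assumes a logistic regression with an intercept and one covariate. The function names, the toy data and the two-study setup are hypothetical, chosen for illustration, and are not the DataSHIELD software interface.

```python
import math


def study_summaries(rows, beta):
    """Run inside one study: return only the score vector and the
    information matrix at the current estimates, never the rows."""
    b0, b1 = beta
    score = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # fitted probability
        w = p * (1.0 - p)                           # working weight
        design = (1.0, x)                           # intercept, covariate
        for i in range(2):
            score[i] += (y - p) * design[i]
            for j in range(2):
                info[i][j] += w * design[i] * design[j]
    return score, info


def fit_distributed(studies, max_iter=25, tol=1e-10):
    """Run at the analysis centre: add up the per-study summaries and
    take Fisher scoring steps until the estimates stop changing."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        score = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for rows in studies:
            s, m = study_summaries(rows, beta)
            for i in range(2):
                score[i] += s[i]
                for j in range(2):
                    info[i][j] += m[i][j]
        # Solve info * step = score using the explicit 2x2 inverse.
        det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
        step = [
            (info[1][1] * score[0] - info[0][1] * score[1]) / det,
            (info[0][0] * score[1] - info[1][0] * score[0]) / det,
        ]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if max(abs(step[0]), abs(step[1])) < tol:
            break
    return beta


# Hypothetical toy data: (covariate, binary outcome) pairs held by two studies.
study_a = [(0.5, 0), (1.0, 0), (1.5, 1), (2.0, 0), (2.5, 1), (3.0, 1)]
study_b = [(0.8, 0), (1.2, 1), (1.9, 0), (2.2, 1), (2.8, 1), (3.5, 0)]

distributed = fit_distributed([study_a, study_b])  # summaries only
pooled = fit_distributed([study_a + study_b])      # individual-level pooling
```

In this sketch the distributed and pooled fits agree up to floating-point rounding, because the summed summaries are the same quantities the pooled analysis would compute. In a real deployment each call to `study_summaries` would run on the study's own server, and only its return values would be transmitted.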