Article Text
Abstract
Background The absence of a centralised and comprehensive register-based system limits opportunities for studying the interaction of aspects such as health, employment, benefit payments, and housing at the micro-level in Great Britain (GB). In some cases, surveys can provide a swiftly available alternative. However, survey data typically do not allow for a detailed spatial resolution. While area-level linkages of surveys can enable a more granular spatial resolution, sampling strategies are often not representative at sub-national levels, and aggregated results might not be meaningful due to small sample sizes. Survey-based full-scale synthetic population datasets can help to bypass these limitations. By providing attribute-rich data for individuals and households, synthetic population datasets can enable both representativeness and statistical power at a granular spatial resolution.
Methods We present the Synthetic Population for Individuals in Great Britain 2019–2021 (SPIGB), a survey-based full-scale synthetic population dataset developed by the Systems Science in Public Health and Health Economics Research (SIPHER) consortium, and provide details on its creation, validation, limitations, and applications. The SPIGB dataset was created via a combinatorial optimisation algorithm (simulated annealing). It combines individual-level data from the Understanding Society survey (wave 11, ‘k’) with aggregate-level population statistics obtained from the UK Census and population projections for Lower layer Super Output Areas and Data Zones.
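To illustrate the general idea of simulated annealing for this kind of problem, the following is a minimal sketch rather than the SIPHER implementation. It selects survey respondents for a single small area so that the selected group's category counts approach the area's aggregate constraint totals. The function names, the toy data, the use of total absolute error as the objective, and the geometric cooling schedule are all assumptions made for illustration.

```python
import math
import random


def total_absolute_error(counts, targets):
    """Sum of absolute differences between synthetic and target counts."""
    return sum(
        abs(c - t)
        for row_c, row_t in zip(counts, targets)
        for c, t in zip(row_c, row_t)
    )


def anneal_area(survey, targets, steps=20000, t0=2.0, cooling=0.9995, seed=0):
    """Pick survey respondents (with replacement) to match one area's totals.

    survey  : list of tuples, one category index per constraint variable
    targets : list of lists, target count per category for each constraint
    All parameter names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    n = sum(targets[0])  # area population implied by the first constraint
    chosen = [rng.randrange(len(survey)) for _ in range(n)]

    # Tabulate the starting selection against every constraint.
    counts = [[0] * len(t) for t in targets]
    for idx in chosen:
        for k, cat in enumerate(survey[idx]):
            counts[k][cat] += 1
    error = total_absolute_error(counts, targets)

    temp = t0
    for _ in range(steps):
        if error == 0:
            break
        slot = rng.randrange(n)
        old, new = chosen[slot], rng.randrange(len(survey))
        # Tentatively swap one selected respondent for a random one.
        for k in range(len(targets)):
            counts[k][survey[old][k]] -= 1
            counts[k][survey[new][k]] += 1
        new_error = total_absolute_error(counts, targets)
        delta = new_error - error
        # Accept improvements always; accept worse fits with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            chosen[slot] = new
            error = new_error
        else:
            for k in range(len(targets)):  # revert the swap
                counts[k][survey[old][k]] += 1
                counts[k][survey[new][k]] -= 1
        temp = max(temp * cooling, 1e-6)
    return chosen, error


# Toy example: two binary constraints (e.g. sex and general health).
survey = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [[6, 4], [7, 3]]
chosen, error = anneal_area(survey, targets)
print(len(chosen), error)
```

In a full-scale application, a procedure of this kind would be repeated for every small area, with the eight constraint variables taking the place of the two toy constraints used here.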
Results The SPIGB dataset is representative at a small-area level with respect to eight characteristics: age/sex, highest qualification, ethnicity, marital status, economic activity, general health, household tenure, and household type. Results of external and internal validation suggest that the dataset is a well-suited resource for applications examining health and socioeconomic outcomes across small areas. Ongoing and completed projects have used the SPIGB dataset to obtain insights into the spatial patterning of alcohol consumption across Greater Manchester, to construct an interactive R Shiny dashboard for policy stakeholders, to provide input for microsimulation models exploring the population health impact of the Scottish Child Payment, and to explore the dataset's potential for the creation of synthetic linked administrative data in Scotland's safe havens.
Conclusion The SPIGB is a well-suited dataset for exploring health and socioeconomic domains at a granular spatial resolution across a range of applications. At the same time, care is required when seeking to disentangle causal multilevel structures, or when analysing individual-level characteristics whose association with the constraint variables has not been evaluated or is unknown.