Selection bias in genome-wide association studies (GWASs) due to volunteer-based sampling (volunteer bias) is poorly understood. The UK Biobank (UKB), one of the largest and most widely used cohorts, is highly selected. We develop inverse probability (IP) weighted GWAS (WGWAS) to estimate GWAS summary statistics in the UKB that are corrected for volunteer bias. Correcting for volunteer-based selection through WGWAS decreases the effective sample size by 62% on average (across ten phenotypes) compared to GWAS. Compared to GWAS, WGWAS yields novel genome-wide significant associations, larger effect sizes and heritability estimates, and altered gene-set tissue expressions. The impact of volunteer bias on GWAS results varies by phenotype: traits related to disease, health behaviors, and socioeconomic status are most affected. These findings suggest that volunteer bias in extant GWASs is substantial and call for what we refer to as GWAS 2.0, a substantial revisiting of the current state of GWAS analyses based on carefully constructed population-representative data sets, either through the development of IP weights or through a greater focus on population-representative sampling.
Sjoerd van Alten, Vrije Universiteit Amsterdam
Benjamin W. Domingue, Stanford University
Jessica Faul, University of Michigan
Titus Galama, Vrije Universiteit Amsterdam, University of Southern California
Andries T. Marees, Vrije Universiteit Amsterdam
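
As an illustration of the ideas summarized in the abstract, the following minimal Python sketch shows how inverse probability weights can enter a single-variant association estimate, and how unequal weights reduce the effective sample size. It is not the authors' WGWAS implementation. The function names (`weighted_snp_effect`, `kish_effective_n`) are hypothetical, covariates and standard errors are omitted, and Kish's approximation is used here only as a generic stand-in; the effective sample sizes reported in the abstract may be computed differently.

```python
import numpy as np


def weighted_snp_effect(genotype, phenotype, ip_weights):
    """Weighted least squares slope of phenotype on genotype.

    Hypothetical helper: each participant is weighted by an IP weight,
    i.e. the inverse of their (estimated) probability of volunteering.
    """
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    w = np.asarray(ip_weights, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to 1
    g_mean = np.sum(w * g)               # weighted mean genotype
    y_mean = np.sum(w * y)               # weighted mean phenotype
    cov = np.sum(w * (g - g_mean) * (y - y_mean))
    var = np.sum(w * (g - g_mean) ** 2)
    return cov / var


def kish_effective_n(ip_weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    w = np.asarray(ip_weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


# Toy data (made up for illustration only).
genotype = np.array([0, 1, 2, 0, 1, 2])        # allele counts
phenotype = 2.0 * genotype + 1.0               # exact linear relation
weights = np.array([1.0, 1.0, 4.0, 1.0, 2.0, 1.0])

beta = weighted_snp_effect(genotype, phenotype, weights)
n_eff = kish_effective_n(weights)
```

With equal weights, `kish_effective_n` returns the nominal sample size; the more variable the weights, the smaller the effective sample size, which is the qualitative mechanism behind the loss of precision that accompanies IP weighting.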