12th Annual IGSS Conference • October 28-29, 2021

Integrating Genetics and the Social Sciences 2021

The effects of demographic-based selection bias on GWAS results in the UK Biobank

Sjoerd van Alten, Vrije Universiteit Amsterdam

Genome-wide association studies (GWASs) are almost always based on non-random samples of the underlying population, because obtaining very large sample sizes, rather than ensuring that samples are representative, has been key to their success. Selection bias in estimated genetic associations, including how it varies across traits, is poorly understood. A sample of particular interest is the widely used UK Biobank (UKB). As one of the largest available cohorts, the UKB is included in almost all large GWASs. In addition, the UKB's subsample of genotyped siblings (UKBSIB) has become a crucial resource for estimating genetic effects free of environmental confounding. Using nationally representative UK Census microdata as a reference, we document substantial non-random selection into the UKB, and even stronger selection into UKBSIB: compared with the underlying population that received an invitation, individuals in the UKB and UKBSIB are more likely to be female, more highly educated, and older. We also show that this non-random selection leads to significant selection bias in associations between various demographic and health-related traits estimated in the UKB. We then estimate the probability of UKB participation for each UKB participant and use these probabilities to estimate selection-corrected GWASs for multiple traits via inverse probability weighting. Based on preliminary analyses of the top 5,000 SNPs associated with BMI, education, and height, respectively, we show that the extent to which selection-corrected GWAS results differ from those of regular GWASs is trait-specific. Genetic associations for educational attainment change the most after correcting for volunteer bias, associations with BMI are only marginally affected, and associations for height remain virtually the same.
We will extend these analyses by investigating more phenotypes, conducting regular and inverse probability weighted GWASs in the UKB that incorporate all available SNPs, and comparing the results. Our findings will help clarify the extent to which a particular phenotype is prone to selection bias in GWASs, and our correction method provides an alternative when population-representative cohorts are not available.
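The abstract does not spell out the weighting step, so the following is only a minimal sketch of how inverse probability weights could enter a single-SNP association test. It assumes that each participant's probability of participation has already been estimated (the abstract describes doing this against UK Census microdata, but the model used for that is not shown here). Each participant is weighted by one over that probability in a weighted least squares regression of the phenotype on genotype dosage, and the function returns a heteroskedasticity-robust (sandwich) standard error that ignores uncertainty in the estimated weights. The function name `ipw_snp_association` and its arguments are hypothetical and do not describe the authors' actual pipeline.

```python
import numpy as np


def ipw_snp_association(genotype, phenotype, participation_prob, covariates=None):
    """Hypothetical IPW association for one SNP.

    Returns (beta, robust_se) for the genotype coefficient from a weighted
    least squares fit with weights 1 / participation_prob.
    """
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    p = np.asarray(participation_prob, dtype=float)
    if np.any((p <= 0) | (p > 1)):
        raise ValueError("participation probabilities must lie in (0, 1]")
    w = 1.0 / p  # inverse probability weights: under-represented people count more

    # Design matrix: intercept, genotype dosage, then any covariates.
    cols = [np.ones_like(g), g]
    if covariates is not None:
        c = np.asarray(covariates, dtype=float)
        if c.ndim == 1:
            c = c[:, None]
        cols.extend(c.T)
    X = np.column_stack(cols)

    XtW = X.T * w                      # column i of X.T scaled by w_i
    bread = np.linalg.inv(XtW @ X)     # (X' W X)^{-1}
    beta = bread @ (XtW @ y)           # WLS estimate
    resid = y - X @ beta
    meat = (XtW * resid**2) @ XtW.T    # sum_i w_i^2 e_i^2 x_i x_i'
    cov = bread @ meat @ bread         # sandwich covariance
    return beta[1], np.sqrt(cov[1, 1])
```

Setting every participation probability to the same value reproduces ordinary least squares, which gives a quick sanity check. In a full GWAS, this fit would be repeated for each SNP, and the selection-corrected estimates would then be compared with the unweighted ones.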
