In a previous paper, we introduced the Polygenic Index Repository, a resource intended to promote productive research using polygenic indexes (PGIs, often called "polygenic scores" or "polygenic risk scores"), which are DNA-based predictors of a phenotype. Here, we provide an update to the Repository and make two additional contributions. We substantially improve the Repository: we expand the set of phenotypes for which we create PGIs from the original 47 to 68; for many of the phenotypes, we use newer GWAS with larger sample sizes, leading to more predictive PGIs; we use a newer methodology for calculating PGI weights that generates more predictive PGIs (even holding fixed the GWAS sample sizes); and we expand the number of datasets participating in the Repository from 11 to 18. In datasets with sibling or parent-offspring pairs, we impute the genotypes of the missing parent(s) and construct parental PGIs. As in the original Repository, we construct the PGIs ourselves and make them available as variables downloadable from the data providers. To facilitate interpretation of a causal PGI study, we derive the standard additive model of genetic effects and formulate PGI analyses in the potential outcomes causal framework. We show that, in a PGI study that controls for parental PGIs, the coefficient on an individual's PGI can be interpreted as a weighted sum of causal effects of SNPs. These weights, when obtained from a standard GWAS that does not control for parental genotypes, are not optimal even if the GWAS is conducted in a large sample. We characterize the resulting attenuation in the PGI's predictive power relative to a PGI obtained using weights from a "causal GWAS" that controls for parental genotypes. Even if the weights come from a "causal GWAS", if that GWAS is conducted in a finite sample, the weights in the weighted sum additionally suffer from errors-in-variables bias.
We propose an estimator that corrects for this bias and make available a Python tool that implements it. This estimator can be used in place of ordinary least squares (OLS) regression of a phenotype on a PGI, parental PGIs, covariates, and interactions between the PGI or parental PGIs and covariates. It generalizes previous estimators that applied only in special cases.
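
To give a sense of the errors-in-variables attenuation mentioned above, the following is a minimal simulation sketch. It is not the estimator proposed here and not the accompanying Python tool; it shows only the textbook single-regressor case, in which an OLS slope on a noisily measured predictor shrinks by the predictor's reliability and can be rescaled when that reliability is known. The function names, the parameter values, and the assumption of a known noise variance are all hypothetical and chosen purely for illustration.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


def simulate(n=200_000, beta=0.5, noise_var=1.0, seed=0):
    """Illustrative only: classical measurement error in a single predictor.

    Assumptions (hypothetical, not taken from the paper):
    - the "true" PGI has unit variance;
    - the observed PGI equals the true PGI plus independent noise whose
      variance (noise_var) is known to the analyst.
    """
    rng = np.random.default_rng(seed)
    true_pgi = rng.normal(0.0, 1.0, n)
    noisy_pgi = true_pgi + rng.normal(0.0, np.sqrt(noise_var), n)
    phenotype = beta * true_pgi + rng.normal(0.0, 1.0, n)

    # Reliability = var(true PGI) / var(observed PGI).
    reliability = 1.0 / (1.0 + noise_var)

    naive = ols_slope(noisy_pgi, phenotype)  # attenuated toward zero
    corrected = naive / reliability          # textbook rescaling
    return naive, corrected, reliability


naive, corrected, reliability = simulate()
print(f"reliability={reliability:.2f} naive={naive:.3f} corrected={corrected:.3f}")
```

With these illustrative settings, the naive slope should land near `beta * reliability` and the rescaled slope near `beta`. The setting described in the abstract is more general, since it also involves parental PGIs, covariates, and interaction terms.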

Robel Alemu, University of California Los Angeles

Daniel J. Benjamin, University of California Los Angeles

Aysu Okbay, Vrije Universiteit

Anastasia Terskaya, University of Barcelona

Patrick Turley, University of Southern California

Alexander I. Young, University of California Los Angeles