Multi-resolution localization of causal variants across the genome.
MeSH terms
Algorithms
Chromosome Mapping / methods
Datasets as Topic
Feasibility Studies
Genome, Human / genetics
Genome-Wide Association Study / methods
Genomics / methods
Humans
Linkage Disequilibrium
Models, Genetic
Multifactorial Inheritance / genetics
Polymorphism, Single Nucleotide / genetics
Quantitative Trait Loci / genetics
Software
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
27 02 2020
History:
received: 17 06 2019
accepted: 01 02 2020
entrez: 29 02 2020
pubmed: 29 02 2020
medline: 06 05 2020
Status:
epublish
Abstract
In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings.
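The abstract states that artificial genotypes (knockoffs) serve as negative controls while the false discovery rate is controlled. The sketch below is a minimal Python illustration of the generic knockoff+ selection rule of Barber and Candès, not KnockoffZoom's own implementation. It assumes that one importance statistic W_j per genetic segment (at a single resolution) has already been computed, with large positive values meaning the original segment looks more important than its knockoff copy, and that these statistics have the sign-symmetry property the knockoff framework requires. The function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Knockoff+ threshold for statistics w at target FDR level q.

    T is the smallest t among the nonzero |W_j| such that
        (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q.
    Returns infinity when no such t exists, so nothing is selected.
    """
    # Candidate thresholds: distinct nonzero magnitudes, smallest first.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        # Negative statistics estimate how many false selections exceed t.
        estimated_false = 1 + sum(1 for x in w if x <= -t)
        selected = max(1, sum(1 for x in w if x >= t))
        if estimated_false / selected <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices of segments whose statistic clears the knockoff+ threshold."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Illustrative statistics only; not data from the study.
w = [5.0, 4.0, 3.0, 2.5, 2.0, 1.5, 1.0, -0.5, 0.8, 0.0]
print(knockoff_select(w, q=0.2))  # [0, 1, 2, 3, 4, 5, 6, 8]
print(knockoff_select(w, q=0.1))  # []
```

Because the numerator is always at least 1, knockoff+ can only report discoveries when at least 1/q statistics clear the threshold; with q = 0.1 and only eight positive statistics in the toy example, nothing is selected. In a multi-resolution setting one would, under these assumptions, apply such a rule separately to the segment statistics at each resolution.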
Identifiers
pubmed: 32107378
doi: 10.1038/s41467-020-14791-2
pii: 10.1038/s41467-020-14791-2
pmc: PMC7046731
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Languages
eng
Citation subsets
IM
Pagination
1093
Comments and corrections
Type: ErratumIn