Multi-resolution localization of causal variants across the genome.


Journal

Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
27 02 2020
History:
received: 17 06 2019
accepted: 01 02 2020
entrez: 29 2 2020
pubmed: 29 2 2020
medline: 6 5 2020
Status: epublish

Abstract

In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings.
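The abstract's central guarantee, false discovery rate control using artificial genotypes (knockoffs) as negative controls, rests on a data-adaptive selection threshold. The sketch below illustrates the generic knockoff+ filter of Barber and Candès applied to group-level statistics; it is not the KnockoffZoom software. It assumes that importance scores (for example, lasso coefficients) have already been computed for every original variant and its knockoff copy, and that variants have been partitioned into contiguous groups at a single resolution. The function names and the coefficient-difference statistic W_g = sum|beta_j| - sum|beta_knock_j| are illustrative choices, not taken from the paper's code.

```python
from typing import List, Sequence


def group_statistics(beta: Sequence[float], beta_knock: Sequence[float],
                     groups: Sequence[Sequence[int]]) -> List[float]:
    """Illustrative coefficient-difference statistic, one value per group.

    W_g = sum_{j in g} |beta_j| - sum_{j in g} |beta_knock_j|.
    A large positive W_g means the original variants in g look more important
    than their knockoff copies; under the knockoff exchangeability assumption,
    W_g is sign-symmetric for null groups.
    """
    return [sum(abs(beta[j]) for j in g) - sum(abs(beta_knock[j]) for j in g)
            for g in groups]


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Smallest t in {|W_g| : W_g != 0} such that
    (1 + #{W_g <= -t}) / max(1, #{W_g >= t}) <= q.

    Returns infinity when no threshold qualifies (nothing is selected).
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        # Negative statistics act as an estimate of the false selections at t.
        est_false = 1 + sum(1 for x in w if x <= -t)
        n_selected = max(1, sum(1 for x in w if x >= t))
        if est_false / n_selected <= q:
            return t
    return float("inf")


def knockoff_select(w: Sequence[float], q: float) -> List[int]:
    """Indices of the groups selected by the knockoff+ filter at target FDR q."""
    t = knockoff_plus_threshold(w, q)
    return [g for g, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    w = [5, 4, 3, 2, -1, 6, 7, 8, 9, 10]
    print(knockoff_plus_threshold(w, 0.2))  # 2
    print(knockoff_select(w, 0.2))          # [0, 1, 2, 3, 5, 6, 7, 8, 9]
```

In the example, at t = 1 the estimated false discovery proportion is (1 + 1) / 9, about 0.22, which exceeds q = 0.2, so the filter moves to t = 2, where the estimate drops to 1 / 9. The "+1" in the numerator is what distinguishes knockoff+ and yields control of the FDR itself rather than a modified version of it.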

Identifiers

pubmed: 32107378
doi: 10.1038/s41467-020-14791-2
pii: 10.1038/s41467-020-14791-2
pmc: PMC7046731

Publication types

Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

IM

Pagination

1093

Comments and corrections

Type: ErratumIn


Authors

Matteo Sesia

Department of Statistics, Stanford University, Stanford, CA, 94305, USA.

Eugene Katsevich

Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, 15213, USA.

Stephen Bates

Department of Statistics, Stanford University, Stanford, CA, 94305, USA.

Emmanuel Candès

Departments of Mathematics and of Statistics, Stanford University, Stanford, CA, 94305, USA. candes@stanford.edu.

Chiara Sabatti

Departments of Biomedical Data Science and of Statistics, Stanford University, Stanford, CA, 94305, USA. sabatti@stanford.edu.

Similar articles

[Redispensing of expensive oral anticancer medicines: a practical application].

Lisanne N van Merendonk, Kübra Akgöl, Bastiaan Nuijen
Humans Antineoplastic Agents Administration, Oral Drug Costs Counterfeit Drugs

Smoking Cessation and Incident Cardiovascular Disease.

Jun Hwan Cho, Seung Yong Shin, Hoseob Kim et al.
Humans Male Smoking Cessation Cardiovascular Diseases Female
Humans United States Aged Cross-Sectional Studies Medicare Part C
Humans Yoga Low Back Pain Female Male

MeSH classifications