Multi-resolution localization of causal variants across the genome.

Algorithms; Chromosome Mapping / methods; Datasets as Topic; Feasibility Studies; Genome, Human / genetics; Genome-Wide Association Study / methods; Genomics / methods; Humans; Linkage Disequilibrium; Models, Genetic; Multifactorial Inheritance / genetics; Polymorphism, Single Nucleotide / genetics; Quantitative Trait Loci / genetics; Software

Journal

Nature communications

ISSN: 2041-1723

Abbreviated title: Nat Commun

Country: England

NLM ID: 101528555

Publication information

Publication date:
27 February 2020

History:

received: 17 June 2019

accepted: 1 February 2020

entrez: 29 February 2020

pubmed: 29 February 2020

medline: 6 May 2020

Status: epublish

Abstract

In the statistical analysis of genome-wide association data, it is challenging to precisely localize the variants that affect complex traits, due to linkage disequilibrium, and to maximize power while limiting spurious findings. Here we report on KnockoffZoom: a flexible method that localizes causal variants at multiple resolutions by testing the conditional associations of genetic segments of decreasing width, while provably controlling the false discovery rate. Our method utilizes artificial genotypes as negative controls and is equally valid for quantitative and binary phenotypes, without requiring any assumptions about their genetic architectures. Instead, we rely on well-established genetic models of linkage disequilibrium. We demonstrate that our method can detect more associations than mixed effects models and achieve fine-mapping precision, at comparable computational cost. Lastly, we apply KnockoffZoom to data from 350k subjects in the UK Biobank and report many new findings.
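The selection step behind the false discovery rate guarantee can be illustrated with a minimal sketch of the generic knockoff+ filter of Barber & Candès (listed in the references). This is not the KnockoffZoom software. It assumes that an importance statistic W has already been computed for each genetic segment by contrasting the real genotypes with their artificial (knockoff) counterparts, so that large positive values suggest a true conditional association and the signs of null statistics are symmetric. The function names `knockoff_threshold` and `knockoff_select` and the example values are hypothetical.

```python
import numpy as np


def knockoff_threshold(W, q=0.1, offset=1):
    """Smallest threshold t whose estimated false discovery proportion is <= q.

    W      : one statistic per segment (assumed precomputed; positive values
             mean the real segment looked more important than its knockoff).
    q      : target false discovery rate.
    offset : 1 gives the conservative knockoff+ rule.
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the distinct nonzero magnitudes of W.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Negative statistics act as negative controls: they estimate how
        # many of the positive statistics above t are false discoveries.
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold meets the target, so nothing is selected


def knockoff_select(W, q=0.1):
    """Indices of the segments whose statistic reaches the threshold."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_threshold(W, q))


# Hypothetical statistics for ten segments at a single resolution.
W_example = [5, 4, 3, 2.5, 2, -1, 1.5, 0.5, -0.2, 6]
selected = knockoff_select(W_example, q=0.2)
```

In a multi-resolution analysis, a rule of this kind would be applied separately at each resolution, with the segments becoming narrower from one resolution to the next.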

Identifiers

DOI: 10.1038/s41467-020-14791-2 PMID: 32107378 PMC: PMC7046731

pubmed: 32107378

doi: 10.1038/s41467-020-14791-2

pii: 10.1038/s41467-020-14791-2

pmc: PMC7046731

Publication types

Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Languages

eng

Citation subsets

Pagination

1093

Comments and corrections

Type: ErratumIn

References

Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).

pubmed: 28686856 pmcid: 5501872

Tam, V. et al. Benefits and limitations of genome-wide association studies. Nat. Rev. Genet. 20, 467–484 (2019).

pubmed: 31068683

Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).

pubmed: 16380716

Kang, H. M. et al. Efficient control of population structure in model organism association mapping. Genetics 178, 1709–1723 (2008).

pubmed: 18385116 pmcid: 2278096

Kang, H. M. et al. Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010).

pubmed: 20208533 pmcid: 3092069

Zhang, Z. et al. Mixed linear model approach adapted for genome-wide association studies. Nat. Genet. 42, 355–360 (2010).

pubmed: 20208535 pmcid: 2931336

Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284 (2015).

pubmed: 4342297 pmcid: 4342297

Loh, P.-R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).

pubmed: 6309610 pmcid: 6309610

Reich, D. E. et al. Linkage disequilibrium in the human genome. Nature 411, 199–204 (2001).

pubmed: 11346797

Pritchard, J. K. & Przeworski, M. Linkage disequilibrium in humans: models and data. Am. J. Hum. Genet. 69, 1–14 (2001).

pubmed: 11410837 pmcid: 1226024

Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

pubmed: 28622505 pmcid: 5536862

Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).

pubmed: 17701901 pmcid: 17701901

Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).

pubmed: 29844615 pmcid: 6050137

Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).

pubmed: 25104515 pmcid: 4196608

Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, 1–16 (2014).

Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).

pubmed: 26773131 pmcid: 4866522

Wang, G., Sarkar, A. K., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine-mapping. Preprint at https://doi.org/10.1101/501114 (2018).

Candès, E. J., Fan, Y., Janson, L. & Lv, J. Panning for gold: model-X knockoffs for high-dimensional controlled variable selection. J. R. Stat. Soc. B 80, 551–577 (2018).

Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

Li, N. & Stephens, M. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 2213–2233 (2003).

pubmed: 14704198 pmcid: 1462870

Scheet, P. & Stephens, M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78, 629–644 (2006).

pubmed: 16532393 pmcid: 1424677

Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11, 499–511 (2010).

pubmed: 20517342

O’Connell, J. et al. Haplotype estimation for biobank scale datasets. Nat. Genet. 48, 817–820 (2016).

pubmed: 27270105 pmcid: 4926957

Sesia, M., Sabatti, C. & Candès, E. J. Gene hunting with hidden Markov model knockoffs. Biometrika 106, 1–18 (2019).

pubmed: 30799875

Bottolo, L. & Richardson, S. Discussion of gene hunting with hidden Markov model knockoffs. Biometrika 106, 19–22 (2019).

Jewell, S. W. & Witten, D. M. Discussion of gene hunting with hidden Markov model knockoffs. Biometrika 106, 23–26 (2019).

pubmed: 30799876 pmcid: 6373413

Rosenblatt, J. D., Ritov, Y. & Goeman, J. J. Discussion of gene hunting with hidden Markov model knockoffs. Biometrika 106, 29–33 (2019).

Marchini, J. L. Discussion of gene hunting with hidden Markov model knockoffs. Biometrika 106, 27–28 (2019).

Sesia, M., Sabatti, C. & Candès, E. J. Rejoinder: Gene hunting with hidden Markov model knockoffs. Biometrika 106, 35–45 (2019).

Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

pubmed: 6786975 pmcid: 6786975

Sabatti, C. Multivariate linear models for GWAS. In Advances in Statistical Bioinformatics: Models and Integrative Inference for High-Throughput Data 188–207 (Cambridge University Press, 2013).

Davidson, I. & Ravi, S. S. Agglomerative hierarchical clustering with constraints: theoretical and empirical results. In Knowledge Discovery in Databases 59–70 (Springer, Berlin, Heidelberg, 2005).

Weller, J. I., Song, J. Z., Heyen, D. W., Lewin, H. A. & Ron, M. A new approach to the problem of multiple comparisons in the genetic dissection of complex traits. Genetics 150, 1699–1706 (1998).

pubmed: 9832544 pmcid: 1460417

Sabatti, C., Service, S. & Freimer, N. False discovery rate in linkage and association genome screens for complex disorders. Genetics 164, 829–833 (2003).

pubmed: 12807801 pmcid: 1462572

Brzyski, D. et al. Controlling the rate of GWAS false discoveries. Genetics 205, 61–75 (2017).

pubmed: 27784720

Barber, R. F. & Candès, E. J. Controlling the false discovery rate via knockoffs. Ann. Stat. 43, 2055–2085 (2015).

Dai, R. & Barber, R. F. The knockoff filter for FDR control in group-sparse and multitask regression. J. Mach. Learn. Res. 48, 1851–1859 (2016).

Privé, F., Aschard, H., Ziyatdinov, A. & Blum, M. G. B. Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr. Bioinformatics 34, 2781–2787 (2018).

pubmed: 29617937 pmcid: 6084588

Katsevich, E. & Sabatti, C. Multilayer knockoff filter: controlled variable selection at multiple resolutions. Ann. Appl. Stat. 13, 1–33 (2019).

pubmed: 31687060 pmcid: 6827557

Efron, B. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction (Cambridge University Press, 2010).

Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).

pubmed: 16862161

Klasen, J. R. et al. A multi-marker association method for genome-wide association studies without the need for population structure correction. Nat. Commun. 7, 13299 (2016).

pubmed: 27830750 pmcid: 5109549

Katsevich, E., Sabatti, C. & Bogomolov, M. Controlling FDR while highlighting distinct discoveries. Preprint at https://arxiv.org/abs/1809.01792 (2018).

McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).

Hoggart, C. J., Whittaker, J. C., De Iorio, M. & Balding, D. J. Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet. 4, 1–8 (2008).

Guan, Y. & Stephens, M. Bayesian variable selection regression for genome-wide association studies and other large-scale problems. Ann. Appl. Stat. 5, 1780–1815 (2011).

Buzdugan, L. et al. Assessing statistical significance in multivariable genome wide association analysis. Bioinformatics 32, 1990–2000 (2016).

pubmed: 27153677 pmcid: 4920127

Renaux, C., Buzdugan, L., Kalisch, M. & Bühlmann, P. Hierarchical inference for genome-wide association studies: a view on methodology with software. Comput. Stat. 45, 1–40 (2020).

Zhou, W. et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 50, 1335–1341 (2018).

pubmed: 30104761 pmcid: 6119127

Wu, T. T., Chen, Y. F., Hastie, T., Sobel, E. & Lange, K. Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25, 714–721 (2009).

pubmed: 19176549 pmcid: 2732298

Wu, J., Devlin, B., Ringquist, S., Trucco, M. & Roeder, K. Screen and clean: a tool for identifying interactions in genome-wide association studies. Genet. Epidemiol. 34, 275–285 (2010).

pubmed: 20088021 pmcid: 2915560

Ernst, J. & Kellis, M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825 (2010).

pubmed: 20657582 pmcid: 2919626

Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).

pubmed: 20634204 pmcid: 2935401

Authors

Matteo Sesia (M)

Eugene Katsevich (E)

Stephen Bates (S)

Emmanuel Candès (E)

Chiara Sabatti (C)
