Yield of genetic association signals from genomes, exomes and imputation in the UK Biobank.


Journal

Nature genetics
ISSN: 1546-1718
Abbreviated title: Nat Genet
Country: United States
NLM ID: 9216904

Publication information

Publication date:
25 Sep 2024
History:
received: 5 Sep 2023
accepted: 23 Aug 2024
medline: 26 Sep 2024
pubmed: 26 Sep 2024
entrez: 25 Sep 2024
Status: ahead of print

Abstract

Whole-genome sequencing (WGS), whole-exome sequencing (WES) and array genotyping with imputation (IMP) are common strategies for assessing genetic variation and its association with medically relevant phenotypes. To date, there has been no systematic empirical assessment of the yield of these approaches when applied to hundreds of thousands of samples to enable the discovery of complex trait genetic signals. Using data for 100 complex traits from 149,195 individuals in the UK Biobank, we systematically compare the relative yield of these strategies in genetic association studies. We find that WGS and WES combined with arrays and imputation (WES + IMP) have the largest association yield. Although WGS results in an approximately fivefold increase in the total number of assayed variants over WES + IMP, the number of detected signals differed by only 1% for both single-variant and gene-based association analyses. Given that WES + IMP typically results in savings of lab and computational time and resources expended per sample, we evaluate the potential benefits of applying WES + IMP to larger samples. When we extend our WES + IMP analyses to 468,169 UK Biobank individuals, we observe an approximately fourfold increase in association signals with the threefold increase in sample size. We conclude that prioritizing WES + IMP and large sample sizes rather than contemporary short-read WGS alternatives will maximize the number of discoveries in genetic association studies.
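The sample-size comparison in the abstract can be checked with simple arithmetic. The sketch below uses the two cohort sizes quoted above; the value 4.0 for the "approximately fourfold" gain in signals is an illustrative assumption, not a figure reported in this record, and the variable names are hypothetical.

```python
# Cohort sizes quoted in the abstract.
n_initial = 149_195   # individuals in the WGS vs WES + IMP comparison
n_extended = 468_169  # individuals in the extended WES + IMP analysis

# Assumption: "approximately fourfold" is treated as exactly 4.0 for illustration.
signal_fold_increase = 4.0

# Fold increase in sample size between the two analyses.
sample_fold_increase = n_extended / n_initial

# Ratio above 1 means signals grew faster than sample size.
signals_per_sample_ratio = signal_fold_increase / sample_fold_increase

print(f"Sample size fold increase: {sample_fold_increase:.2f}")
print(f"Relative signals per unit of sample size: {signals_per_sample_ratio:.2f}")
```

Under this assumption the ratio exceeds 1, which is consistent with the abstract's point that association signals grew faster than sample size.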

Identifiers

pubmed: 39322778
doi: 10.1038/s41588-024-01930-4
pii: 10.1038/s41588-024-01930-4

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Copyright information

© 2024. The Author(s).

Authors

Sheila M Gaynor (SM)

Regeneron Genetics Center, Tarrytown, NY, USA. sheila.gaynor@regeneron.com.

Tyler Joseph (T)

Regeneron Genetics Center, Tarrytown, NY, USA.

Xiaodong Bai (X)

Regeneron Genetics Center, Tarrytown, NY, USA.

Yuxin Zou (Y)

Regeneron Genetics Center, Tarrytown, NY, USA.

Boris Boutkov (B)

Regeneron Genetics Center, Tarrytown, NY, USA.

Evan K Maxwell (EK)

Regeneron Genetics Center, Tarrytown, NY, USA.

Olivier Delaneau (O)

Regeneron Genetics Center, Tarrytown, NY, USA.

Robin J Hofmeister (RJ)

Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.

Olga Krasheninina (O)

Regeneron Genetics Center, Tarrytown, NY, USA.

Suganthi Balasubramanian (S)

Regeneron Genetics Center, Tarrytown, NY, USA.

Anthony Marcketta (A)

Regeneron Genetics Center, Tarrytown, NY, USA.

Joshua Backman (J)

Regeneron Genetics Center, Tarrytown, NY, USA.

Jeffrey G Reid (JG)

Regeneron Genetics Center, Tarrytown, NY, USA.

John D Overton (JD)

Regeneron Genetics Center, Tarrytown, NY, USA.

Luca A Lotta (LA)

Regeneron Genetics Center, Tarrytown, NY, USA.

Jonathan Marchini (J)

Regeneron Genetics Center, Tarrytown, NY, USA.

William J Salerno (WJ)

Regeneron Genetics Center, Tarrytown, NY, USA.

Aris Baras (A)

Regeneron Genetics Center, Tarrytown, NY, USA. aris.baras@regeneron.com.

Goncalo R Abecasis (GR)

Regeneron Genetics Center, Tarrytown, NY, USA. goncalo.abecasis@regeneron.com.

Timothy A Thornton (TA)

Regeneron Genetics Center, Tarrytown, NY, USA. timothy.thornton@regeneron.com.

MeSH classifications