Yield of genetic association signals from genomes, exomes and imputation in the UK Biobank.
Journal
Nature genetics
ISSN: 1546-1718
Abbreviated title: Nat Genet
Country: United States
NLM ID: 9216904
Publication information
Publication date:
25 Sep 2024
History:
received: 05 Sep 2023
accepted: 23 Aug 2024
medline: 26 Sep 2024
pubmed: 26 Sep 2024
entrez: 25 Sep 2024
Status:
aheadofprint
Abstract
Whole-genome sequencing (WGS), whole-exome sequencing (WES) and array genotyping with imputation (IMP) are common strategies for assessing genetic variation and its association with medically relevant phenotypes. To date, there has been no systematic empirical assessment of the yield of these approaches when applied to hundreds of thousands of samples to enable the discovery of complex trait genetic signals. Using data for 100 complex traits from 149,195 individuals in the UK Biobank, we systematically compare the relative yield of these strategies in genetic association studies. We find that WGS and WES combined with arrays and imputation (WES + IMP) have the largest association yield. Although WGS results in an approximately fivefold increase in the total number of assayed variants over WES + IMP, the number of detected signals differed by only 1% for both single-variant and gene-based association analyses. Given that WES + IMP typically results in savings of lab and computational time and resources expended per sample, we evaluate the potential benefits of applying WES + IMP to larger samples. When we extend our WES + IMP analyses to 468,169 UK Biobank individuals, we observe an approximately fourfold increase in association signals with the threefold increase in sample size. We conclude that prioritizing WES + IMP and large sample sizes rather than contemporary short-read WGS alternatives will maximize the number of discoveries in genetic association studies.
Identifiers
pubmed: 39322778
doi: 10.1038/s41588-024-01930-4
pii: 10.1038/s41588-024-01930-4
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Copyright information
© 2024. The Author(s).
References
Sabatine, M. S. et al. Evolocumab and clinical outcomes in patients with cardiovascular disease. N. Engl. J. Med. 376, 1713–1722 (2017).
pubmed: 28304224
doi: 10.1056/NEJMoa1615664
Cohen, J. C. et al. Sequence variations in PCSK9, low LDL and protection against coronary heart disease. N. Engl. J. Med. 354, 1264–1272 (2006).
pubmed: 16554528
doi: 10.1056/NEJMoa054013
Cohen, J. et al. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat. Genet. 37, 161–165 (2005).
pubmed: 15654334
doi: 10.1038/ng1509
Gaudet, D. et al. ANGPTL3 inhibition in homozygous familial hypercholesterolemia. N. Engl. J. Med. 377, 296–297 (2017).
pubmed: 28723334
doi: 10.1056/NEJMc1705994
Frangoul, H. et al. CRISPR–Cas9 gene editing for sickle cell disease and β-thalassemia. N. Engl. J. Med. 384, 252–260 (2021).
pubmed: 33283989
doi: 10.1056/NEJMoa2031054
Uda, M. et al. Genome-wide association study shows BCL11A associated with persistent fetal hemoglobin and amelioration of the phenotype of β-thalassemia. Proc. Natl Acad. Sci. USA 105, 1620–1625 (2008).
pubmed: 18245381
pmcid: 2234194
doi: 10.1073/pnas.0711566105
McCarthy, M. I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat. Rev. Genet. 9, 356–369 (2008).
pubmed: 18398418
doi: 10.1038/nrg2344
Abdellaoui, A., Yengo, L., Verweij, K. J. H. & Visscher, P. M. 15 years of GWAS discovery: realizing the promise. Am. J. Hum. Genet. 110, 179–194 (2023).
pubmed: 36634672
pmcid: 9943775
doi: 10.1016/j.ajhg.2022.12.011
Duerr, R. H. et al. A genome-wide association study identifies IL23R as an inflammatory bowel disease gene. Science 314, 1461–1463 (2006).
pubmed: 17068223
pmcid: 4410764
doi: 10.1126/science.1135245
Rioux, J. D. et al. Genome-wide association study identifies new susceptibility loci for Crohn disease and implicates autophagy in disease pathogenesis. Nat. Genet. 39, 596–604 (2007).
pubmed: 17435756
pmcid: 2757939
doi: 10.1038/ng2032
Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
pmcid: 4112379
doi: 10.1038/nature13595
Sollis, E. et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
pubmed: 36350656
doi: 10.1093/nar/gkac1010
Hanks, S. C. et al. Extent to which array genotyping and imputation with large reference panels approximate deep whole-genome sequencing. Am. J. Hum. Genet. 109, 1653–1666 (2022).
pubmed: 35981533
pmcid: 9502057
doi: 10.1016/j.ajhg.2022.07.012
Horowitz, J. E. et al. Genome-wide analysis provides genetic evidence that ACE2 influences COVID-19 risk and yields risk scores associated with severe disease. Nat. Genet. 54, 382–392 (2022).
pubmed: 35241825
pmcid: 9005345
doi: 10.1038/s41588-021-01006-7
Gaziano, L. et al. Actionable druggable genome-wide Mendelian randomization identifies repurposing opportunities for COVID-19. Nat. Med. 27, 668–676 (2021).
pubmed: 33837377
pmcid: 7612986
doi: 10.1038/s41591-021-01310-z
Edwards, S. L. et al. Beyond GWASs: illuminating the dark road from association to function. Am. J. Hum. Genet. 93, 779–797 (2013).
pubmed: 24210251
pmcid: 3824120
doi: 10.1016/j.ajhg.2013.10.012
Chong, J. X. et al. The genetic basis of Mendelian phenotypes: discoveries, challenges and opportunities. Am. J. Hum. Genet. 97, 199–215 (2015).
pubmed: 26166479
pmcid: 4573249
doi: 10.1016/j.ajhg.2015.06.009
Akbari, P. et al. Sequencing of 640,000 exomes identifies GPR75 variants associated with protection from obesity. Science 373, eabf8683 (2021).
pubmed: 34210852
pmcid: 10275396
doi: 10.1126/science.abf8683
Verweij, N. et al. Germline mutations in CIDEB and protection against liver disease. N. Engl. J. Med. 387, 332–344 (2022).
pubmed: 35939579
doi: 10.1056/NEJMoa2117872
Ewans, L. J. et al. Whole exome and genome sequencing in mendelian disorders: a diagnostic and health economic analysis. Eur. J. Hum. Genet. 30, 1121–1131 (2022).
pubmed: 35970915
pmcid: 9553973
doi: 10.1038/s41431-022-01162-2
Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022).
pubmed: 35859178
pmcid: 9329122
doi: 10.1038/s41586-022-04965-x
Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
pubmed: 33568819
pmcid: 7875770
doi: 10.1038/s41586-021-03205-y
All of Us Research Program Investigators. The ‘All of Us’ research program. N. Engl. J. Med. 381, 668–676 (2019).
Cirulli, E. T. & Goldstein, D. B. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat. Rev. Genet. 11, 415–425 (2010).
pubmed: 20479773
doi: 10.1038/nrg2779
Need, A. C. & Goldstein, D. B. Whole genome association studies in complex diseases: where do we stand? Dialogues Clin. Neurosci. 12, 37–46 (2010).
pubmed: 20373665
pmcid: 3181943
doi: 10.31887/DCNS.2010.12.1/aneed
National Human Genome Research Institute. The Cost of Sequencing a Human Genome https://www.genome.gov/sequencingcosts (National Human Genome Research Institute, 2021).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
pubmed: 30305743
pmcid: 6786975
doi: 10.1038/s41586-018-0579-z
van Hout, C. V. et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756 (2020).
pubmed: 33087929
pmcid: 7759458
doi: 10.1038/s41586-020-2853-0
Backman, J. D. et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634 (2021).
pubmed: 34662886
pmcid: 8596853
doi: 10.1038/s41586-021-04103-z
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
pubmed: 27571263
pmcid: 5157836
doi: 10.1038/ng.3656
Rubinacci, S., Delaneau, O. & Marchini, J. Genotype imputation using the positional Burrows Wheeler transform. PLoS Genet. 16, e1009049 (2020).
pubmed: 33196638
pmcid: 7704051
doi: 10.1371/journal.pgen.1009049
Mbatchou, J. et al. Computationally efficient whole-genome regression for quantitative and binary traits. Nat. Genet. 53, 1097–1103 (2021).
pubmed: 34017140
doi: 10.1038/s41588-021-00870-7
Ziyatdinov, A. et al. Joint testing of rare variant burden scores using non-negative least squares. Preprint at https://doi.org/10.1101/2023.02.22.529560 (2023).
Watanabe, K. et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat. Genet. 51, 1339–1348 (2019).
pubmed: 31427789
doi: 10.1038/s41588-019-0481-0
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
pubmed: 24316577
doi: 10.1093/nar/gkt1229
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
pubmed: 19812666
pmcid: 2831613
doi: 10.1038/nature08494
Ochoa, D. et al. Human genetics evidence supports two-thirds of the 2021 FDA-approved drugs. Nat. Rev. Drug Discov. 21, 551 (2022).
pubmed: 35804044
doi: 10.1038/d41573-022-00120-3
Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).
pubmed: 36653562
pmcid: 9849126
doi: 10.1038/s41586-022-05473-8
Shi, S. et al. A Genomics England haplotype reference panel and imputation of UK Biobank. Nat. Genet. https://doi.org/10.1038/s41588-024-01868-7 (2024).
doi: 10.1038/s41588-024-01868-7
pubmed: 39187616
pmcid: 11387195
Ziyatdinov, A. et al. Genotyping, sequencing and analysis of 140,000 adults from Mexico City. Nature 622, 784–793 (2023).
pubmed: 37821707
pmcid: 10600010
doi: 10.1038/s41586-023-06595-3
Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).
pubmed: 11752295
pmcid: 99122
doi: 10.1093/nar/30.1.207
Barrett, T. et al. NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res. 41, D991–D995 (2012).
pubmed: 23193258
pmcid: 3531084
doi: 10.1093/nar/gks1193
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
doi: 10.1126/science.aaz1776
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
doi: 10.1038/nature11247
Welsh, S. et al. Comparison of DNA quantification methodology used in the DNA extraction protocol for the UK Biobank cohort. BMC Genomics 18, 26 (2017).
pubmed: 28056765
pmcid: 5217214
doi: 10.1186/s12864-016-3391-x
McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
pubmed: 27268795
pmcid: 4893825
doi: 10.1186/s13059-016-0974-4
Liu, X. et al. dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs. Genome Med. 12, 103 (2020).
pubmed: 33261662
pmcid: 7709417
doi: 10.1186/s13073-020-00803-9
Kumar, P., Henikoff, S. & Ng, P. C. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat. Protoc. 4, 1073–1081 (2009).
pubmed: 19561590
doi: 10.1038/nprot.2009.86
Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248–249 (2010).
pubmed: 20354512
pmcid: 2855889
doi: 10.1038/nmeth0410-248
Chun, S. & Fay, J. C. Identification of deleterious mutations within three human genomes. Genome Res. 19, 1553–1561 (2009).
pubmed: 19602639
pmcid: 2752137
doi: 10.1101/gr.092619.109
Schwarz, J. M. et al. MutationTaster evaluates disease-causing potential of sequence alterations. Nat. Methods 7, 575–576 (2010).
pubmed: 20676075
doi: 10.1038/nmeth0810-575
Gaynor, S. M. & Joseph, T. rgcgithub/ukb_genetic_association_yield: v1.0. Zenodo https://doi.org/10.5281/zenodo.13357248 (2024).