A deep catalog of protein-coding variation in 985,830 individuals.
Journal
bioRxiv : the preprint server for biology
Abbreviated title: bioRxiv
Country: United States
NLM ID: 101680187
Publication information
Publication date:
02 Nov 2023
History:
pubmed: 22 May 2023
medline: 22 May 2023
entrez: 22 May 2023
Status:
epublish
Abstract
Coding variants that have a significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry, to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these genes this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 of which lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser. We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.
Identifiers
pubmed: 37214792
doi: 10.1101/2023.05.09.539329
pmc: PMC10197621
Publication types
Preprint
Languages
eng
Grants
Agency: NCI NIH HHS
ID: R01 CA157823
Country: United States
Agency: Intramural NIH HHS
ID: ZIA MH002843
Country: United States