Machine-learning of complex evolutionary signals improves classification of SNVs.
Journal
NAR genomics and bioinformatics
ISSN: 2631-9268
Abbreviated title: NAR Genom Bioinform
Country: England
NLM ID: 101756213
Publication information
Publication date:
Jun 2022
History:
received: 24 Oct 2021
revised: 08 Feb 2022
accepted: 28 Mar 2022
entrez: 11 Apr 2022
pubmed: 12 Apr 2022
medline: 12 Apr 2022
Status:
epublish
Abstract
Conservation is a strong predictor of the pathogenicity of single-nucleotide variants (SNVs). However, some positions that present complex conservation patterns across vertebrates stray from this paradigm. Here, we analyzed the association between complex conservation patterns and the pathogenicity of SNVs in the 115 disease genes that had sufficient variant data. We show that conservation is not a one-rule-fits-all solution, since its accuracy depends strongly on the analyzed set of species and genes. For example, pairwise comparisons between human and 99 vertebrate species showed that species differ in their ability to predict the clinical outcomes of variants among different genes using conservation. Furthermore, certain genes were less amenable to conservation-based variant prediction, while for others, particular species optimized prediction. These insights led to the development of EvoDiagnostics, which uses the conservation against each species as a feature within a random-forest machine-learning classification algorithm. EvoDiagnostics outperformed traditional conservation algorithms, deep-learning-based methods and most ensemble tools in every prediction task, highlighting the strength of optimizing conservation analysis per species and per gene. Overall, we suggest a new and more biologically relevant approach for analyzing conservation, which improves prediction of variant pathogenicity.
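The idea of treating conservation against each species as a separate feature can be illustrated with a minimal Python sketch. This is not the EvoDiagnostics implementation: the toy alignment, the variant labels, and the function names `conservation_features` and `per_species_accuracy` are all hypothetical, and the simple rule "conserved in a species implies pathogenic" stands in for the pairwise human-versus-species comparison described in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical toy alignment: species name -> aligned residues (same length).
ALIGNMENT: Dict[str, str] = {
    "human": "MKTAY",
    "mouse": "MKTAF",
    "chicken": "MRTAY",
    "zebrafish": "MRSAF",
}

# Hypothetical labelled variants: (alignment position, is_pathogenic).
VARIANTS: List[Tuple[int, bool]] = [
    (0, True), (1, False), (2, True), (3, True), (4, True),
]


def conservation_features(
    alignment: Dict[str, str], position: int, reference: str = "human"
) -> Dict[str, int]:
    """One binary feature per species: 1 if it matches the reference residue."""
    ref_residue = alignment[reference][position]
    return {
        species: int(seq[position] == ref_residue)
        for species, seq in alignment.items()
        if species != reference
    }


def per_species_accuracy(
    alignment: Dict[str, str],
    variants: List[Tuple[int, bool]],
    reference: str = "human",
) -> Dict[str, float]:
    """Accuracy of the rule 'conserved in this species => pathogenic', per species."""
    species_list = [s for s in alignment if s != reference]
    correct = {s: 0 for s in species_list}
    for position, is_pathogenic in variants:
        features = conservation_features(alignment, position, reference)
        for species in species_list:
            if bool(features[species]) == is_pathogenic:
                correct[species] += 1
    return {s: correct[s] / len(variants) for s in species_list}


if __name__ == "__main__":
    # Feature rows like these (one column per species) are the kind of input
    # a random-forest classifier could be trained on.
    for position, label in VARIANTS:
        print(position, label, conservation_features(ALIGNMENT, position))
    print(per_species_accuracy(ALIGNMENT, VARIANTS))
```

In this toy setting the species differ in how well conservation against them separates the labelled variants, which is the motivation for letting a classifier weight each species separately rather than collapsing them into a single conservation score.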
Identifiers
pubmed: 35402908
doi: 10.1093/nargab/lqac025
pii: lqac025
pmc: PMC8988715
Publication types
Journal Article
Languages
eng
Pagination
lqac025
Copyright information
© The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.