MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction.
Journal
Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060
Publication information
Publication date:
27 Nov 2023
History:
medline: 28 Nov 2023
pubmed: 10 Nov 2023
entrez: 10 Nov 2023
Status:
ppublish
Abstract
Understanding the pathogenicity of missense mutations (MMs) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenicity prediction. First, we established a large-scale nonredundant MM benchmark data set based on the entire Ensembl database, complemented by a focused blind test set specifically for pathogenic gain-of-function/loss-of-function (GOF/LOF) MMs. Based on this data set, for each mutation, we used Ensembl VEP v104 and dbNSFP v4.1a to extract variant-level, amino acid-level, individual predictors' outputs, and genome-level features. Additionally, protein sequences were generated from ENSP identifiers using the Ensembl API and then encoded. The ESM-1b and ProtTrans-T5 embeddings at the mutant sites were subsequently extracted. Building on these efforts, we developed our model group (MMPatho), which comprises ConsMM and EvoIndMM. Specifically, ConsMM employs individual predictors' outputs and XGBoost with SHAP explanation analysis, while EvoIndMM investigates whether predictive capability can be enhanced by incorporating evolutionary information from ESM-1b and ProtT5-XL-U50, two large protein language models. In rigorous comparative experiments, both ConsMM and EvoIndMM achieved remarkable AUROC (0.9836 and 0.9854) and AUPR (0.9852 and 0.9902) values on the blind test set, which contains no variations or proteins overlapping with the training data, thus highlighting the superiority of our computational approach in predicting MM pathogenicity. Our Web server, available at http://csbio.njust.edu.cn/bioinf/mmpatho/, allows researchers to predict the pathogenicity (alongside the reliability index score) of MMs using the ConsMM and EvoIndMM models and provides extensive annotations for user input. Additionally, the newly constructed benchmark data set and blind test set can be accessed via the data page of our Web server.
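The abstract reports performance as AUROC and AUPR on a blind test set. As a minimal sketch of what these two metrics measure, the pure-Python functions below compute them for a list of binary labels (1 = pathogenic, 0 = benign) and predicted scores. The function names and the toy data are hypothetical and unrelated to the MMPatho data sets; AUPR is estimated here as average precision, which is one common estimator and not necessarily the one the authors used.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive scores higher.
    Tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: the mean of the precision values taken at the
    rank of each positive example. Assumes scores are not tied; with ties
    the result depends on the sort order."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / n_pos


# Hypothetical toy predictions, for illustration only.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
toy_aupr = average_precision(toy_labels, toy_scores)
```

The pairwise AUROC loop is quadratic in the number of examples, which is fine for illustration; for data sets of realistic size a rank-based implementation from an established library would be the usual choice.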
Identifiers
pubmed: 37947586
doi: 10.1021/acs.jcim.3c00950
pmc: PMC10685454
Chemical substances
Proteins
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
7239-7257