MMPatho: Leveraging Multilevel Consensus and Evolutionary Information for Enhanced Missense Mutation Pathogenic Prediction.


Journal

Journal of chemical information and modeling
ISSN: 1549-960X
Abbreviated title: J Chem Inf Model
Country: United States
NLM ID: 101230060

Publication information

Publication date:
27 Nov 2023
History:
medline: 28 11 2023
pubmed: 10 11 2023
entrez: 10 11 2023
Status: ppublish

Abstract

Understanding the pathogenicity of missense mutations (MMs) is essential for shedding light on genetic diseases, gene functions, and individual variations. In this study, we propose a novel computational approach, called MMPatho, for enhancing missense mutation pathogenicity prediction. First, we established a large-scale nonredundant MM benchmark data set based on the entire Ensembl database, complemented by a focused blind test set specifically for pathogenic gain-of-function/loss-of-function (GOF/LOF) MMs. Based on this data set, for each mutation, we used Ensembl VEP v104 and dbNSFP v4.1a to extract variant-level, amino acid-level, individuals' outputs, and genome-level features. Additionally, protein sequences were generated using ENSP identifiers with the Ensembl API and then encoded, and the ESM-1b and ProtTrans-T5 embeddings of the mutant sites were subsequently extracted. Building on these efforts, we developed our model group (MMPatho), which comprises ConsMM and EvoIndMM. Specifically, ConsMM employs individuals' outputs and XGBoost with SHAP explanation analysis, while EvoIndMM investigates the potential enhancement of predictive capability by incorporating evolutionary information from the embeddings of the large protein language models ESM-1b and ProtT5-XL-U50. In rigorous comparative experiments, both ConsMM and EvoIndMM achieved high AUROC (0.9836 and 0.9854) and AUPR (0.9852 and 0.9902) values on the blind test set, which contains no variations or proteins that overlap with the training data, thus highlighting the strength of our computational approach in predicting MM pathogenicity. Our Web server, available at http://csbio.njust.edu.cn/bioinf/mmpatho/, allows researchers to predict the pathogenicity (alongside the reliability index score) of MMs using the ConsMM and EvoIndMM models and provides extensive annotations for user input. Additionally, the newly constructed benchmark data set and blind test set can be accessed via the data page of our web server.
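
The evaluation protocol described in the abstract (a blind test set sharing no proteins or variations with the training data, scored by AUROC) can be illustrated with a minimal sketch. This is not the authors' code: the `Variant` record, the `blind_subset` and `auroc` helpers, and all identifiers and scores are hypothetical, and the filtering rule (dropping every candidate whose protein occurs in training, which also removes every variation on that protein) is an assumption about how such a set might be built.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    protein_id: str  # hypothetical protein identifier (the paper uses ENSP IDs)
    change: str      # amino acid substitution, e.g. "A12T" (made-up values)
    label: int       # 1 = pathogenic, 0 = benign


def blind_subset(train, candidates):
    """Keep only candidates whose protein never appears in the training set."""
    train_proteins = {v.protein_id for v in train}
    return [v for v in candidates if v.protein_id not in train_proteins]


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, counting ties as one half (quadratic time, fine for a sketch)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train = [Variant("P1", "A12T", 1), Variant("P2", "G5R", 0)]
    candidates = [
        Variant("P1", "L9P", 1),  # dropped: protein P1 occurs in training
        Variant("P3", "K2E", 1),
        Variant("P4", "S7N", 0),
    ]
    blind = blind_subset(train, candidates)
    print([v.protein_id for v in blind])
    # Made-up scores from a hypothetical predictor for four labelled variants.
    print(auroc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.4]))
```

Filtering at the protein level is the stricter of the two overlap criteria, so a set built this way cannot leak a training variation into the test data.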

Identifiers

pubmed: 37947586
doi: 10.1021/acs.jcim.3c00950
pmc: PMC10685454

Chemical substances

Proteins 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

7239-7257

Authors

Fang Ge (F)

School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, 9 Wenyuanlu, Nanjing 210023, China.
Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.

Muhammad Arif (M)

College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar.
Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.

Zihao Yan (Z)

School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China.

Hanin Alahmadi (H)

College of Computer Science and Engineering, Taibah University, Madinah 344, Saudi Arabia.

Apilak Worachartcheewan (A)

Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.

Dong-Jun Yu (DJ)

School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China.

Watshara Shoombuatong (W)

Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
