SNAREs-SAP: SNARE Proteins Identification With PSSM Profiles.

SNARE proteins; SVM-RFE-CBR; machine learning; position-specific scoring matrix; support vector machine

Journal

Frontiers in genetics
ISSN: 1664-8021
Abbreviated title: Front Genet
Country: Switzerland
NLM ID: 101560621

Publication information

Publication date:
2021
History:
received: 2021-11-04
accepted: 2021-11-15
entrez: 2022-01-06
pubmed: 2022-01-07
medline: 2022-01-07
Status: epublish

Abstract

Soluble N-ethylmaleimide-sensitive factor attachment protein receptor (SNARE) proteins are a large family of transmembrane proteins located in organelles and vesicles. SNARE proteins initiate the vesicle fusion process and activate and fuse proteins during exocytosis; they are also vital for regulating the transport of membrane proteins and non-regulatory vesicles. An efficient method for identifying SNARE proteins is therefore of great significance. However, the identification accuracy of existing methods such as SNARE-CNN is not satisfactory. In this study, we developed a method based on a support vector machine (SVM) that can effectively recognize SNARE proteins. We used position-specific scoring matrix (PSSM) profiles to extract features from SNARE protein sequences, ranked feature importance with the support vector machine recursive feature elimination with correlation bias reduction (SVM-RFE-CBR) algorithm, and then selected the optimal feature subset from the ranked results. We trained the model on these features with 10-fold cross-validation and tested its performance on an independent dataset. In independent tests, our method achieved a sensitivity of 68%, a specificity of 94%, an accuracy of 92%, an area under the curve (AUC) of 84%, and a Matthews correlation coefficient (MCC) of 0.48. On these common evaluation metrics, our method performs better than other existing classification methods in identifying SNARE proteins.
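
For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal sketch of how sensitivity, specificity, accuracy, and MCC are derived from the four cells of a binary confusion matrix. The function name `binary_metrics` and the counts passed to it are hypothetical and chosen only for illustration; they are not taken from the study's dataset, and the sketch assumes both classes are present in the test set.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute common binary-classification metrics from confusion counts.

    tp/fn: positives (e.g. SNARE proteins) predicted correctly/incorrectly.
    tn/fp: negatives (non-SNARE proteins) predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only (NOT the paper's test set).
metrics = binary_metrics(tp=68, fn=32, tn=940, fp=60)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because MCC depends on the ratio of positives to negatives as well as on the per-class error rates, two test sets with the same sensitivity and specificity can yield different MCC values; this is one reason it is often reported alongside accuracy when classes are imbalanced.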

Identifiers

pubmed: 34987554
doi: 10.3389/fgene.2021.809001
pii: 809001
pmc: PMC8721734

Publication types

Journal Article

Languages

eng

Pagination

809001

Copyright information

Copyright © 2021 Zhang, Gong, Gao, Li, Gao, Zhao and Dong.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Authors

Zixiao Zhang (Z)

College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.

Yue Gong (Y)

College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.

Bo Gao (B)

Department of Radiology, The Second Affiliated Hospital, Harbin Medical University, Harbin, China.

Hongfei Li (H)

College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.

Wentao Gao (W)

College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.

Yuming Zhao (Y)

College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.

Benzhi Dong (B)

College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.

MeSH classifications