Alignment-free similarity analysis for protein sequences based on fuzzy integral.
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
26 02 2019
History:
received: 12 03 2018
accepted: 15 01 2019
entrez: 28 02 2019
pubmed: 28 02 2019
medline: 17 09 2020
Status:
epublish
Abstract
Sequence comparison is an essential part of modern molecular biology research. In this study, we estimated the parameters of a Markov chain for each protein sequence from the frequencies of occurrence of all possible amino acid pairs, without aligning the sequences. These estimated Markov chain parameters were then used to calculate the similarity between two protein sequences with a fuzzy integral algorithm. For validation, our results were compared with those of both alignment-based (ClustalW) and alignment-free methods on six benchmark datasets. The results indicate that the proposed algorithm achieves better clustering performance for protein sequence comparison.
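The abstract does not give the details of the method, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that "amino acid pairs" means adjacent residues, that the Markov chain is first order, that per-residue similarity is one minus the total variation distance between transition rows, and that the fuzzy integral is a discrete Sugeno integral with a uniform cardinality-based fuzzy measure. All function names are hypothetical and none of these choices is confirmed by the record.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def transition_matrix(seq):
    """Estimate first-order Markov transition probabilities from adjacent-pair counts."""
    counts = {a: {b: 0 for b in AMINO_ACIDS} for a in AMINO_ACIDS}
    for a, b in zip(seq, seq[1:]):
        # Skip pairs containing non-standard residue codes (assumption).
        if a in counts and b in counts[a]:
            counts[a][b] += 1
    matrix = {}
    for a in AMINO_ACIDS:
        total = sum(counts[a].values())
        # Residues never seen as a first element get a uniform row (assumption).
        matrix[a] = {
            b: (counts[a][b] / total if total else 1 / len(AMINO_ACIDS))
            for b in AMINO_ACIDS
        }
    return matrix


def row_similarities(p, q):
    """Per-residue similarity in [0, 1]: one minus total variation distance of rows."""
    return [
        1 - 0.5 * sum(abs(p[a][b] - q[a][b]) for b in AMINO_ACIDS)
        for a in AMINO_ACIDS
    ]


def sugeno_integral(values):
    """Discrete Sugeno integral with the uniform measure g(A) = |A| / n."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    # For the i largest values, the measure of their index set is i / n.
    return max(min(h, (i + 1) / n) for i, h in enumerate(ordered))


def similarity(seq_a, seq_b):
    """Aggregate per-residue similarities of two sequences into one score."""
    p = transition_matrix(seq_a)
    q = transition_matrix(seq_b)
    return sugeno_integral(row_similarities(p, q))
```

Calling `similarity(seq_a, seq_b)` on two protein strings returns a score between 0 and 1, and a matrix of such pairwise scores could then feed a clustering step. The uniform measure is chosen here only for simplicity; a different fuzzy measure would change the aggregation.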
Identifiers
pubmed: 30808983
doi: 10.1038/s41598-019-39477-8
pii: 10.1038/s41598-019-39477-8
pmc: PMC6391537
Chemical substances
Mitochondrial Proteins
0
Proteins
0
MT-ND5 protein, human
EC 1.6.99.3
MT-ND6 protein, human
EC 1.6.99.3
NADH Dehydrogenase
EC 1.6.99.3
Electron Transport Complex I
EC 7.1.1.2
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
2775