SeqCP: A sequence-based algorithm for searching circularly permuted proteins.

AUC, area under the ROC curve; CE, combinatorial extension; CE-CP, CE with Circular Permutations; CP, circular permutation; CPDB, Circular Permutation Database; CPMs, circular permutants; CPSARST, Circular Permutation Search Aided by Ramachandran Sequential Transformation; Circular permutants; Circular permutation; MCC, Matthews correlation coefficient; Protein sequence analysis; Protein structure modeling; RMSD, root-mean-square distance; ROC, receiver operating characteristic

Journal

Computational and Structural Biotechnology Journal
ISSN: 2001-0370
Abbreviated title: Comput Struct Biotechnol J
Country: Netherlands
NLM ID: 101585369

Publication information

Publication date:
2023
History:
received: 21 06 2022
revised: 10 11 2022
accepted: 10 11 2022
entrez: 30 12 2022
pubmed: 31 12 2022
medline: 31 12 2022
Status: epublish

Abstract

Circular permutation (CP) is a protein sequence rearrangement in which the amino- and carboxyl-termini of a protein are relocated to different positions along an imaginary circularized sequence. Circularly permuted proteins usually retain conserved three-dimensional structures and functions. Comparing the structures of circular permutants (CPMs) allows protein research and bioengineering applications to be approached in ways that are difficult to achieve by traditional mutagenesis. Most current CP detection algorithms depend on structural information. Because a vast number of proteins have unknown structures, many CP pairs may remain unidentified. An efficient sequence-based CP detector would help identify more CP pairs and advance many protein studies. For instance, some hypothetical proteins may have CPMs with known functions and structures that are informative for functional annotation, but existing structure-based CP search methods cannot be applied when those hypothetical proteins lack structural information. Despite their considerable potential for applications, sequence-based CP search methods have not been well developed. We present a sequence-based method, SeqCP, which analyzes normal and duplicated sequence alignments to identify CPMs and determine candidate CP sites for proteins. SeqCP was trained on data obtained from the Circular Permutation Database and tested with nonredundant datasets from the Protein Data Bank. It shows high reliability in CP identification, achieving an AUC of 0.9. SeqCP has been implemented as a web server available at: http://pcnas.life.nthu.edu.tw/SeqCP/.
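
The idea of comparing a normal alignment with an alignment against a duplicated sequence can be illustrated with a minimal sketch. This is not SeqCP's actual implementation: the scoring scheme, the `min_gain` threshold, and the function names `local_align` and `detect_cp` are assumptions made for illustration. The sketch relies only on the general observation that a circular permutant of a target aligns end to end against the target concatenated with itself, whereas a normal alignment covers only one of the two swapped segments.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with linear gap penalties.

    Returns (best_score, start_in_b), where start_in_b is the 0-based
    index in b at which the best-scoring local alignment begins.
    The scoring values are illustrative assumptions, not SeqCP's.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # start[i][j]: index in b where the alignment ending at (i, j) begins.
    start = [list(range(cols)) for _ in range(rows)]
    best_score, best_start = 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + sub
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            s = max(0, diag, up, left)
            score[i][j] = s
            if s == 0:
                start[i][j] = j  # a new alignment may begin at b[j]
            elif s == diag:
                start[i][j] = start[i - 1][j - 1]
            elif s == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
            if s > best_score:
                best_score, best_start = s, start[i][j]
    return best_score, best_start


def detect_cp(query, target, min_gain=1.5):
    """Flag a possible circular permutation (hypothetical heuristic).

    Compares the normal alignment score with the score against the
    duplicated target. min_gain is an arbitrary illustrative threshold.
    Returns (is_cp, site, normal_score, duplicated_score), where site is
    the 0-based target index to which the query's N-terminus maps.
    """
    normal, _ = local_align(query, target)
    doubled, start = local_align(query, target + target)
    is_cp = doubled > 0 and doubled >= min_gain * normal
    site = start % len(target)
    return is_cp, site, normal, doubled
```

For a toy target such as `"ACDEFGHIKLMNPQRSTVWY"` and the query `target[8:] + target[:8]`, `detect_cp(query, target)` reports a candidate CP site at index 8, while aligning the target against itself yields no score gain from duplication and is not flagged. Real protein pairs are divergent, so a practical detector would need substitution matrices, affine gaps, and a trained decision rule rather than a fixed ratio.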

Identifiers

pubmed: 36582435
doi: 10.1016/j.csbj.2022.11.024
pii: S2001-0370(22)00518-9
pmc: PMC9763678

Publication types

Journal Article

Languages

eng

Pagination

185-201

Copyright information

© 2022 The Authors.

Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Authors

Chi-Chun Chen (CC)

Bioinformatics Program, Institute of Information Science, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan.
Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 300, Taiwan.

Yu-Wei Huang (YW)

Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan.

Hsuan-Cheng Huang (HC)

Bioinformatics Program, Institute of Information Science, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan.
Institute of Biomedical Informatics, National Yang Ming Chiao Tung University, Taipei 112, Taiwan.

Wei-Cheng Lo (WC)

Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan.
Department of Biological Science and Technology, National Yang Ming Chiao Tung University, Hsinchu 300, Taiwan.
The Center for Bioinformatics Research, National Yang Ming Chiao Tung University, Hsinchu, Taiwan.

Ping-Chiang Lyu (PC)

Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 300, Taiwan.

MeSH classifications