Assessing the low complexity of protein sequences via the low complexity triangle.
Journal: PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date: 2020
History:
received: 2020-04-24
accepted: 2020-08-31
entrez: 2020-12-30
pubmed: 2020-12-31
medline: 2021-01-16
Status: epublish
Abstract
Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition departs from the composition expected for the proteome as a whole, and they do not follow the folding rules that prevail in globular regions. One way to characterize these regions is to assess the repeatability of a sequence, that is, to calculate the local propensity of a region to be part of a repeat. We combine two local measures of low complexity, repeatability (computed with the RES algorithm) and the fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual proteins with extreme compositions. As a proof of concept, we represent the measured low complexity values in a plot called the 'low complexity triangle'. The results show that proteomes have distinct signatures in the low complexity triangle and that these signatures are associated with complexity features of the sequences. We developed a web tool, LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/), that lets users calculate the low complexity triangle of a given protein or region of interest. The low complexity triangle proves to be a suitable way to represent the overall low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help to quantify the content of degenerate tandem repeats in proteins and proteomes.
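Of the two measures combined in the abstract, the fraction of the most frequent amino acid is simple enough to sketch directly. The Python below is a minimal illustration under stated assumptions, not the LCT implementation: it does not reproduce the RES repeatability score, and the sliding-window length and the helper names `most_frequent_fraction` and `windowed_fraction` are hypothetical choices.

```python
from __future__ import annotations

from collections import Counter


def most_frequent_fraction(seq: str) -> float:
    """Fraction of positions occupied by the most common residue in seq."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq.upper())  # residue -> number of occurrences
    return max(counts.values()) / len(seq)


def windowed_fraction(seq: str, window: int = 20) -> list[float]:
    """Most-frequent-residue fraction for every full window of `window` residues.

    Hypothetical helper: the default window of 20 is an illustrative
    assumption, not a parameter taken from the paper or the LCT tool.
    Sequences shorter than the window are scored as a single region.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not seq:
        return []
    if len(seq) < window:
        return [most_frequent_fraction(seq)]
    return [most_frequent_fraction(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    print(most_frequent_fraction("QQQQQQQQQQ"))  # homorepeat
    print(most_frequent_fraction("QAQAQAQAQA"))  # direpeat
    print(most_frequent_fraction("MKTAYIAKQR"))  # more varied stretch
    print(windowed_fraction("QQQQAAAA", window=4))
```

In this sketch a pure homorepeat such as `QQQQQQQQQQ` scores 1.0, a perfect direpeat such as `QAQAQAQAQA` scores 0.5, and the more varied `MKTAYIAKQR` scores 0.2. Placing a region in the low complexity triangle would additionally require its RES repeatability score, which this sketch does not compute.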
Identifiers
pubmed: 33378336
doi: 10.1371/journal.pone.0239154
pii: PONE-D-20-11896
pmc: PMC7773278
Chemical substances: Proteome (0)
Publication types: Journal Article; Research Support, Non-U.S. Gov't
Language: eng
Citation subsets: IM
Pagination: e0239154
Conflict of interest statement: The authors have declared that no competing interests exist.