Assessing the low complexity of protein sequences via the low complexity triangle.

Amino Acid Sequence; Humans; Proteome; Repetitive Sequences, Amino Acid; Sequence Analysis, Protein / methods

Journal

PloS one

ISSN: 1932-6203

Abbreviated title: PLoS One

Country: United States

NLM ID: 101285081

Publication information

Publication date:
2020

History:

received: 2020-04-24

accepted: 2020-08-31

entrez: 2020-12-30

pubmed: 2020-12-31

medline: 2021-01-16

Status: epublish

Abstract

Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition deviates from the proteome-wide expectation, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins with extreme compositions. We apply a representation called 'low complexity triangle' as a proof-of-concept to represent the measured low complexity values. Results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated with complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) to allow users to calculate the low complexity triangle of a given protein or region of interest. The low complexity triangle proves to be a suitable procedure to represent the general low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help in quantifying the content of degenerate tandem repeats in proteins and proteomes.
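
One of the two measures named in the abstract, the fraction of the most frequent amino acid, is simple enough to sketch. The Python below is a minimal illustration only, not the code behind LCT: the function names and the window length of 20 are assumptions made for this example, and the RES repeatability measure is not reproduced because the abstract does not describe how it is computed.

```python
from collections import Counter
from typing import List


def most_frequent_fraction(region: str) -> float:
    """Fraction of a region occupied by its most common residue."""
    if not region:
        raise ValueError("region must not be empty")
    counts = Counter(region.upper())
    return max(counts.values()) / len(region)


def windowed_fractions(sequence: str, window: int = 20) -> List[float]:
    """Most-frequent-residue fraction for every full window of the sequence.

    The default window of 20 residues is an illustrative assumption,
    not a value taken from the paper.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        most_frequent_fraction(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]
```

Under this sketch a homorepeat such as a polyglutamine tract scores 1.0, a perfect direpeat such as AQAQAQAQ scores 0.5, and a region in which every residue is different scores 1 divided by its length.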

Identifiers

DOI: 10.1371/journal.pone.0239154

PMID: 33378336

PMC: PMC7773278

pii: PONE-D-20-11896

Chemical substances

Proteome 0

Publication types

Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

Pagination

e0239154

Conflict of interest statement

The authors have declared that no competing interests exist.

Authors

Pablo Mier (P)

Miguel A Andrade-Navarro (MA)
