The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context.

Keywords: Codon usage; Glutamine; Homorepeat; Sequence context; polyQ

Journal

Computational and structural biotechnology journal
ISSN: 2001-0370
Abbreviated title: Comput Struct Biotechnol J
Country: Netherlands
NLM ID: 101585369

Publication information

Publication date:
2020
History:
received: 23 09 2019
revised: 13 12 2019
accepted: 30 01 2020
entrez: 20 02 2020
pubmed: 20 02 2020
medline: 20 02 2020
Status: epublish

Abstract

Polyglutamine (polyQ) regions are among the most prevalent homorepeats in eukaryotes. However, their prevalence is difficult to evaluate because different studies report different results, owing to the lack of consensus on what actually constitutes a polyQ region. We tackled this issue by studying how the use of different thresholds to detect polyQ regions in the human proteome (i.e., the minimum number of glutamines required in a protein region of a given size) influences not only their prevalence but also their general features and sequence context. The threshold definition shapes the length distribution of the polyQ dataset and changes the observed number and position of impurities (amino acids other than glutamine) within polyQ regions. Irrespective of the chosen threshold, leucine and proline residues are enriched both within and around polyQ regions. While leucine is enriched at the N-terminus of polyQ regions, and especially at position -1 (the amino acid preceding the polyQ), proline is prevalent at the C-terminus (positions +1 to +5, that is, the first five amino acids after the polyQ). We also checked the suitability of these thresholds for other species and compared the features of their polyQ regions with those found in humans. As the sequence context and features of polyQ regions are threshold-dependent, we propose a method to quickly scan the polyQ landscape of a proteome. We complement our results with a summary of the biases to be expected for each threshold when studying polyQ regions.
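The threshold idea in the abstract (a minimum number of glutamines within a region of a given size) can be illustrated with a minimal Python sketch. This is not the authors' method: the sliding-window rule, the merge-and-trim strategy, and the default values (min_q=8 glutamines in a window of 9 residues) are hypothetical choices made only for illustration. The helper flanks() extracts position -1 and positions +1 to +5, the context positions named in the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PolyQRegion:
    start: int     # 0-based index of the first residue (inclusive)
    end: int       # 0-based index after the last residue (exclusive)
    sequence: str  # residues seq[start:end]; starts and ends with Q

    @property
    def impurities(self) -> int:
        """Number of non-glutamine residues inside the region."""
        return sum(1 for aa in self.sequence if aa != "Q")


def find_polyq(seq: str, min_q: int = 8, window: int = 9) -> List[PolyQRegion]:
    """Hypothetical threshold scan: flag every window of `window` residues that
    contains at least `min_q` glutamines, merge overlapping flagged windows, and
    trim each merged stretch so that it begins and ends with Q."""
    if not 1 <= min_q <= window:
        raise ValueError("require 1 <= min_q <= window")
    seq = seq.upper()
    n = len(seq)
    if n < window:
        return []

    # Sliding count of Q in seq[i:i + window], updated in O(1) per step.
    q_count = seq[:window].count("Q")
    merged: List[List[int]] = []
    for i in range(n - window + 1):
        if i > 0:
            q_count += (seq[i + window - 1] == "Q") - (seq[i - 1] == "Q")
        if q_count < min_q:
            continue
        start, end = i, i + window
        if merged and start <= merged[-1][1]:
            merged[-1][1] = end  # windows advance by one, so end only grows
        else:
            merged.append([start, end])

    regions = []
    for start, end in merged:
        # Each merged stretch contains >= min_q >= 1 Q, so both loops stop.
        while seq[start] != "Q":
            start += 1
        while seq[end - 1] != "Q":
            end -= 1
        regions.append(PolyQRegion(start, end, seq[start:end]))
    return regions


def flanks(seq: str, region: PolyQRegion, before: int = 1, after: int = 5) -> Tuple[str, str]:
    """Residues at positions -before..-1 and +1..+after around a region."""
    return (seq[max(0, region.start - before):region.start],
            seq[region.end:region.end + after])
```

On the made-up sequence MALQQQQHQQQQQPPGAK, find_polyq(seq) returns one region with start=3 and end=13 (QQQQHQQQQQ, one histidine impurity), with L at position -1 and PPGAK at positions +1 to +5. Requiring 9 glutamines out of 9 (min_q=9) finds nothing on the same sequence, a small illustration of how the threshold alone decides whether a stretch counts as a polyQ region and how many impurities it may contain.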

Identifiers

pubmed: 32071707
doi: 10.1016/j.csbj.2020.01.012
pii: S2001-0370(19)30360-5
pmc: PMC7016039

Publication types

Journal Article

Languages

eng

Pagination

306-313

Copyright information

© 2020 The Authors.

Authors

Pablo Mier (P)

Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany.

Carlos Elena-Real (C)

Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France.

Annika Urbanek (A)

Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France.

Pau Bernadó (P)

Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 29, rue de Navacelles, 34090 Montpellier, France.

Miguel A Andrade-Navarro (MA)

Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hans-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany.

MeSH classifications