The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context.

Keywords: Codon usage; Glutamine; Homorepeat; Sequence context; polyQ

Journal

Computational and structural biotechnology journal

ISSN: 2001-0370

Abbreviated title: Comput Struct Biotechnol J

Country: Netherlands

NLM ID: 101585369

Publication information

Publication date:
2020

History:

received: 2019-09-23

revised: 2019-12-13

accepted: 2020-01-30

entrez: 2020-02-20

pubmed: 2020-02-20

medline: 2020-02-20

Status: epublish

Abstract

Polyglutamine (polyQ) regions are among the most prevalent homorepeats in eukaryotes. Their prevalence is nevertheless difficult to evaluate because different studies report different results. The reason is the lack of a consensus definition of what constitutes a polyQ region. We have tackled this issue by studying how the use of different thresholds (i.e., the minimum number of glutamines required in a protein region of a given size) to detect polyQ regions in the human proteome influences not only their prevalence but also their general features and sequence context. The threshold definition shapes the length distribution of the polyQ dataset and changes the observed number and position of impurities (amino acids other than glutamine) within polyQ regions. Irrespective of the chosen threshold, leucine and proline residues are enriched both within and around polyQ regions. While leucine is enriched at the N-terminus of polyQ regions, especially at position -1 (the amino acid preceding the polyQ), proline is prevalent at the C-terminus (positions +1 to +5, that is, the first five amino acids after the polyQ). We also checked the suitability of these thresholds for other species and compared their polyQ features with those found in humans. As the sequence context and features of polyQ regions are threshold-dependent, we propose a method to quickly scan the polyQ landscape of a proteome. We complement our results with a summarized overview of the biases to be expected for each threshold when studying polyQ regions.
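To make the notion of a threshold concrete, the following is a minimal Python sketch of how a rule of the form "at least min_q glutamines in a window of a given size" might be applied to a protein sequence. It is an illustration under stated assumptions, not the procedure used in the article: the function names (find_polyq, impurities, flanks), the merging of overlapping windows, the trimming of regions so that they start and end with Q, and the toy sequence are all hypothetical choices made here.

```python
def find_polyq(seq, min_q, window):
    """Return half-open (start, end) spans of candidate polyQ regions.

    Assumed rule: a window of `window` residues qualifies if it holds at
    least `min_q` glutamines; overlapping or touching qualifying windows are
    merged, and each merged span is trimmed to begin and end with Q.
    """
    if min_q < 1 or window < min_q:
        raise ValueError("require 1 <= min_q <= window")

    # Collect every window that satisfies the threshold.
    hits = [
        (i, i + window)
        for i in range(len(seq) - window + 1)
        if seq[i:i + window].count("Q") >= min_q
    ]

    # Merge windows that overlap or touch.
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    # Trim non-Q residues from both ends (each span has at least one Q).
    regions = []
    for start, end in merged:
        while seq[start] != "Q":
            start += 1
        while seq[end - 1] != "Q":
            end -= 1
        regions.append((start, end))
    return regions


def impurities(seq, region):
    """List (offset, residue) for non-Q residues inside a region."""
    start, end = region
    return [(i, aa) for i, aa in enumerate(seq[start:end]) if aa != "Q"]


def flanks(seq, region, size=5):
    """Return the residues before (N-terminal) and after (C-terminal) a region."""
    start, end = region
    return seq[max(0, start - size):start], seq[end:end + size]


# Toy sequence invented for illustration only.
toy = "MKLQQQQAQQQQPPPPG"
strict = find_polyq(toy, min_q=4, window=4)    # pure stretches only
relaxed = find_polyq(toy, min_q=8, window=10)  # tolerates impurities
print(strict, relaxed)
print(impurities(toy, relaxed[0]), flanks(toy, relaxed[0]))
```

On the toy sequence, the strict 4-in-4 rule reports two short pure regions, while the relaxed 8-in-10 rule reports a single longer region containing one impurity. This is the kind of threshold dependence in number, length and impurity content that the abstract describes. The flanks helper returns the residues at positions -1 and +1 to +5 as the last character of the first string and the whole second string, respectively.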

Identifiers

DOI: 10.1016/j.csbj.2020.01.012 PMID: 32071707 PMC: PMC7016039

pubmed: 32071707

doi: 10.1016/j.csbj.2020.01.012

pii: S2001-0370(19)30360-5

pmc: PMC7016039

Publication types

Journal Article

Languages

eng

Pagination

306-313


Authors

Pablo Mier (P)

Carlos Elena-Real (C)

Annika Urbanek (A)

Pau Bernadó (P)

Miguel A Andrade-Navarro (MA)
