Evidence for the preferential reuse of sub-domain motifs in primordial protein folds.


Journal

Proteins
ISSN: 1097-0134
Abbreviated title: Proteins
Country: United States
NLM ID: 8700181

Publication information

Publication date:
09 2021
History:
received: 04 01 2021
revised: 15 04 2021
accepted: 28 04 2021
pubmed: 7 5 2021
medline: 5 2 2022
entrez: 6 5 2021
Status: ppublish

Abstract

Comparing protein backbones shows that no more than approximately 1400 different folds exist, each specifying the three-dimensional topology of a protein domain. Large proteins are composed of specific domain combinations, and many domains can accommodate different functions. These findings confirm that the reuse of domains is key for the evolution of multi-domain proteins. If reuse was also the driving force for domain evolution, ancestral fragments of sub-domain size should exist that are shared between domains possessing significantly different topologies. For the fully automated detection of putatively ancestral motifs, we developed the algorithm Fragstatt, which compares proteins pairwise to identify fragments, that is, instantiations of the same motif. To reach maximal sensitivity, Fragstatt compares sequences by means of cascaded alignments of profile hidden Markov models. If the fragment sequences are sufficiently similar, the program determines and scores the structural concordance of the fragments. By analyzing a comprehensive set of proteins from the CATH database, Fragstatt identified 12,532 partially overlapping and structurally similar motifs that clustered into 134 unique motifs. The dissemination of these motifs is limited: we found only two domain topologies that contain two different motifs, and in general these motifs occur in no more than 18% of the CATH topologies. Interestingly, motifs are enriched in topologies that are considered ancestral. Thus, our findings suggest that the reuse of sub-domain sized fragments was relevant in early phases of protein evolution and became less important later on.
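The two-stage comparison described in the abstract (a sequence-similarity gate followed by a structural concordance score) can be sketched roughly as below. This is a minimal illustration and not the Fragstatt implementation: the names `Domain`, `Hit`, `find_fragment_pairs`, and `toy_identity`, the cutoff parameters, and the choice to skip pairs with the same topology are all assumptions made for the example. The scoring functions are passed in as callables because the abstract does not specify how they are computed beyond profile hidden Markov model alignments and a structural score.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Domain:
    """Hypothetical record for one protein domain."""
    name: str
    topology: str   # e.g. a CATH topology identifier
    sequence: str


@dataclass(frozen=True)
class Hit:
    """A pair of domains that passed both stages."""
    a: str
    b: str
    seq_score: float
    struct_score: float


Scorer = Callable[[Domain, Domain], float]


def find_fragment_pairs(
    domains: Sequence[Domain],
    seq_score: Scorer,
    struct_score: Scorer,
    seq_cutoff: float,
    struct_cutoff: float,
) -> List[Hit]:
    """Pairwise two-stage filter (illustrative assumption, not Fragstatt)."""
    hits: List[Hit] = []
    for d1, d2 in combinations(domains, 2):
        # Assumption: only pairs from different topologies are of interest.
        if d1.topology == d2.topology:
            continue
        s = seq_score(d1, d2)
        if s < seq_cutoff:
            continue  # stage 1 failed: the structural step is skipped
        t = struct_score(d1, d2)
        if t >= struct_cutoff:
            hits.append(Hit(d1.name, d2.name, s, t))
    return hits


def toy_identity(d1: Domain, d2: Domain) -> float:
    """Toy stand-in for a profile-based comparison: position-wise identity."""
    n = min(len(d1.sequence), len(d2.sequence))
    if n == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(d1.sequence, d2.sequence))
    return matches / n
```

In practice, `seq_score` and `struct_score` would wrap real alignment and superposition tools; the toy scorer only makes the control flow testable.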

Identifiers

pubmed: 33957009
doi: 10.1002/prot.26089

Chemical substances

Amino Acids 0
Proteins 0

Publication types

Historical Article; Journal Article; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

1167-1179

Copyright information

© 2021 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals LLC.

Authors

Leonhard Heizinger (L)

Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany.

Rainer Merkl (R)

Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany.
