Differential amino acid usage leads to ubiquitous edge effect in proteomes across domains of life that can be explained by amino acid secondary structure propensities.
Environmental responses
Genetic code
Physiology
Structural biology
Journal
Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
26 Oct 2024
History:
received: 12 Jul 2024
accepted: 21 Oct 2024
medline: 27 Oct 2024
pubmed: 27 Oct 2024
entrez: 27 Oct 2024
Status:
epublish
Abstract
Amino acids are the building blocks of proteins and enzymes, which are essential for life. Understanding amino acid usage offers insights into protein function and the molecular mechanisms underlying life histories. However, genome-wide patterns of amino acid usage across domains of life remain poorly understood. Here, we analysed the proteomes of 5590 species across four domains and found that only a few amino acids are consistently the most and least used. This differential usage results in lower amino acid usage diversity at the most and least frequent ranks, creating a ubiquitous inverted U-shaped relationship between amino acid diversity and rank, which we call an 'edge effect', across proteomes and domains of life. This effect likely stems from protein secondary structural constraints, not the evolutionary chronology of amino acid incorporation into the genetic code, highlighting the functional rather than evolutionary influences on amino acid usage. We also tested other contemporary hypotheses regarding amino acid usage in proteomes and found that amino acid usage varies across life's domains and is only weakly influenced by growth temperature. Our findings reveal a novel and pervasive amino acid usage pattern across genomes, with the potential to help us probe deep evolutionary relationships and advance synthetic biology.
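The rank-diversity relationship described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`usage_ranking`, `shannon`, `diversity_by_rank`), the use of Shannon entropy as the diversity measure, the alphabetical tie-breaking and the toy sequences are all assumptions made for illustration. Each proteome is ranked by amino acid frequency, and for each rank the sketch measures how varied the amino acid occupying that rank is across proteomes. An edge effect would appear as low diversity at the first and last ranks and higher diversity in between.

```python
from collections import Counter
from math import log

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def usage_ranking(proteome):
    """Order the 20 amino acids from most to least used in one proteome.

    `proteome` is an iterable of protein sequences (strings).
    Ties are broken alphabetically (an assumption, for determinism).
    """
    counts = Counter()
    for seq in proteome:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    return sorted(AMINO_ACIDS, key=lambda aa: (-counts[aa], aa))

def shannon(labels):
    """Shannon entropy (natural log) of a list of categorical labels."""
    n = len(labels)
    return sum((c / n) * log(n / c) for c in Counter(labels).values())

def diversity_by_rank(proteomes):
    """For each rank, diversity of the amino acid found there across proteomes."""
    rankings = [usage_ranking(p) for p in proteomes]
    return [shannon([r[i] for r in rankings]) for i in range(len(AMINO_ACIDS))]

if __name__ == "__main__":
    # Hypothetical toy "proteomes"; real ones contain thousands of proteins.
    toy = [["LLLA", "LAG"], ["LLLG", "LGA"]]
    for rank, h in enumerate(diversity_by_rank(toy), start=1):
        print(rank, round(h, 3))
```

In the toy data, most amino acids are absent and therefore tied, so the zero diversity at the lower ranks is an artefact of tie-breaking. With real proteomes, where all 20 amino acids occur, the per-rank values can be plotted against rank to look for the inverted U shape.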
Identifiers
pubmed: 39462053
doi: 10.1038/s41598-024-77319-4
pii: 10.1038/s41598-024-77319-4
Chemical substances
Proteome
0
Amino Acids
0
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
25544
Grants
Agency: Biotechnology and Biological Sciences Research Council
ID: BB/V015249/1
Country: United Kingdom
Copyright information
© 2024. The Author(s).