Environment and taxonomy shape the genomic signature of prokaryotic extremophiles.


Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
26 09 2023
History:
received: 06 06 2023
accepted: 11 09 2023
medline: 04 10 2023
pubmed: 27 09 2023
entrez: 26 09 2023
Status: epublish

Abstract

This study provides comprehensive quantitative evidence suggesting that adaptations to extreme temperatures and pH imprint a discernible environmental component in the genomic signature of microbial extremophiles. Both supervised and unsupervised machine learning algorithms were used to analyze genomic signatures, each computed as the k-mer frequency vector of a 500 kbp DNA fragment arbitrarily selected to represent a genome. Computational experiments classified/clustered genomic signatures extracted from a curated dataset of [Formula: see text] extremophile (temperature, pH) bacteria and archaea genomes, at multiple scales of analysis, [Formula: see text]. The supervised learning resulted in high accuracies for taxonomic classifications at [Formula: see text], and medium to medium-high accuracies for environment category classifications of the same datasets at [Formula: see text]. For [Formula: see text], our findings were largely consistent with amino acid compositional biases and codon usage patterns in coding regions, previously attributed to extreme environment adaptations. The unsupervised learning of unlabelled sequences identified several exemplars of hyperthermophilic organisms with large similarities in their genomic signatures, in spite of belonging to different domains in the Tree of Life.
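As an illustration of the genomic signature described in the abstract, the following is a minimal sketch of a k-mer frequency vector computed from a DNA fragment. The function name `kmer_signature`, the toy fragment, and the choice to skip windows containing ambiguous bases are assumptions made for this example; they are not taken from the study, which used 500 kbp fragments.

```python
from itertools import product


def kmer_signature(sequence: str, k: int) -> list[float]:
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Hypothetical helper for illustration only. Windows containing any
    character other than A, C, G, T are skipped (an assumption).
    """
    # Enumerate all 4**k possible k-mers in lexicographic order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    seq = sequence.upper()
    # Slide a window of length k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:
            counts[pos] += 1

    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    # Normalize counts so the vector sums to 1.
    return [c / total for c in counts]


# Toy fragment; a real analysis would use a much longer sequence.
fragment = "ACGTACGTNNACGT"
signature = kmer_signature(fragment, 2)
```

The resulting vector has 4^k entries, one per possible k-mer, and could serve as the input features for a classifier or clustering algorithm.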

Identifiers

pubmed: 37752120
doi: 10.1038/s41598-023-42518-y
pii: 10.1038/s41598-023-42518-y
pmc: PMC10522608

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

16105

Copyright information

© 2023. Springer Nature Limited.

Authors

Pablo Millán Arias (PM)

School of Computer Science, University of Waterloo, Waterloo, ON, Canada. pmillana@uwaterloo.ca.

Joseph Butler (J)

Department of Biology, University of Western Ontario, London, ON, Canada.

Gurjit S Randhawa (GS)

School of Mathematical and Computational Sciences, University of Prince Edward Island, Charlottetown, PE, Canada.

Maximillian P M Soltysiak (MPM)

Department of Biology, University of Western Ontario, London, ON, Canada.

Kathleen A Hill (KA)

Department of Biology, University of Western Ontario, London, ON, Canada.

Lila Kari (L)

School of Computer Science, University of Waterloo, Waterloo, ON, Canada.
