Exploring short k-mer profiles in cells and mobile elements from Archaea highlights the major influence of both the ecological niche and evolutionary history.

Keywords: 5-mer; Codon composition; Extrachromosomal element; Halophily; Host transfer; Hyperthermophily; Multivariate analysis; Plasmid; Signature; Virus

Journal

BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
16 Mar 2021
History:
received: 13 Oct 2020
accepted: 24 Feb 2021
entrez: 17 Mar 2021
pubmed: 18 Mar 2021
medline: 20 May 2021
Status: epublish

Abstract

BACKGROUND
K-mer-based methods have greatly advanced in recent years, largely driven by the realization of their biological significance and by the advent of next-generation sequencing. Their speed and their independence from the annotation process are major advantages. Their utility in the study of the mobilome has recently emerged and they seem a priori adapted to the patchy gene distribution and the lack of universal marker genes of viruses and plasmids. To provide a framework for the interpretation of results from k-mer-based methods applied to archaea or their mobilome, we analyzed the 5-mer DNA profiles of close to 600 archaeal cells, viruses and plasmids. Archaea is one of the three domains of life. Archaea seem enriched in extremophiles and are associated with a high diversity of viral and plasmid families, many of which are specific to this domain. We explored the dataset structure by multivariate and statistical analyses, seeking to identify the underlying factors.

RESULTS
For cells, the 5-mer profiles were inconsistent with the phylogeny of archaea. At a finer taxonomic level, the influence of the taxonomy and the environmental constraints on 5-mer profiles was very strong. These two factors were interdependent to a significant extent, and the respective weights of their contributions varied according to the clade. A convergent adaptation was observed for the class Halobacteria, for which a strong 5-mer signature was identified. For mobile elements, coevolution with the host had a clear influence on their 5-mer profile. This enabled us to identify one previously known and one new case of recent host transfer based on the atypical composition of the mobile elements involved. Beyond the effect of coevolution, extrachromosomal elements strikingly retain the specific imprint of their own viral or plasmid taxonomic family in their 5-mer profile.

CONCLUSIONS
This specific imprint confirms that the evolution of extrachromosomal elements is driven by multiple parameters and is not restricted to host adaptation. In addition, we detected only recent host transfer events, suggesting the fast evolution of short k-mer profiles. This calls for caution when using k-mers for host prediction, metagenomic binning or phylogenetic reconstruction.
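
To make the notion of a 5-mer profile concrete, the following is a minimal sketch of how one might be computed and compared. It is not the authors' pipeline: the function names `kmer_profile` and `profile_distance`, the normalization to relative frequencies, the single-strand counting, and the use of Euclidean distance are all assumptions chosen for illustration, and the toy sequences are made up.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=5):
    """Return the relative frequency of every DNA k-mer in `seq`.

    Assumption: only the given strand is counted, and windows that
    contain a non-ACGT symbol (for example N) are ignored.
    The result has 4**k entries, ordered lexicographically (AAAAA first).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made of A, C, G, T contribute to the total.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mer")
    return [counts[m] / total for m in kmers]


def profile_distance(p, q):
    """Euclidean distance between two profiles (hypothetical choice of metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    host = "ATGCGCGCTAGCGCGATCGCGCGTA" * 4
    element = "ATATTTAAATATATTAAATTTATAT" * 4
    print(len(kmer_profile(host)))
    print(profile_distance(kmer_profile(host), kmer_profile(element)))
```

In a sketch like this, a mobile element whose profile lies unusually far from that of its recorded host would be a candidate for closer inspection; how "unusually far" is defined is a separate statistical question that the code does not address.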

Identifiers

pubmed: 33726663
doi: 10.1186/s12864-021-07471-y
pii: 10.1186/s12864-021-07471-y
pmc: PMC7962313

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

186

Grants

Agency: Agence Nationale de la Recherche
ID: ANR-17-CE05-0011-01
Agency: European Research Council
ID: 340440
Country: International

Authors

Ariane Bize (A)

Université Paris-Saclay, INRAE, PROSE, F-92761, Antony, France. ariane.bize@inrae.fr.

Cédric Midoux (C)

Université Paris-Saclay, INRAE, PROSE, F-92761, Antony, France.
Université Paris-Saclay, INRAE, MaIAGE, F-78350, Jouy-en-Josas, France.
Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, F-78350, Jouy-en-Josas, France.

Mahendra Mariadassou (M)

Université Paris-Saclay, INRAE, MaIAGE, F-78350, Jouy-en-Josas, France.
Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, F-78350, Jouy-en-Josas, France.

Sophie Schbath (S)

Université Paris-Saclay, INRAE, MaIAGE, F-78350, Jouy-en-Josas, France.
Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, F-78350, Jouy-en-Josas, France.

Patrick Forterre (P)

Institut Pasteur, Unité de Virologie des Archées, Département de Microbiologie, 25 Rue du Docteur Roux, 75015, Paris, France. patrick.forterre@pasteur.fr.
Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France. patrick.forterre@pasteur.fr.

Violette Da Cunha (V)

Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France.
