Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0.


Journal

Nature Communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555

Publication information

Publication date:
19 05 2020
History:
received: 17 08 2019
accepted: 27 04 2020
entrez: 20 5 2020
pubmed: 20 5 2020
medline: 25 8 2020
Status: epublish

Abstract

Microbial genomes are available at an ever-increasing pace, as cultivation and sequencing become cheaper and obtaining metagenome-assembled genomes (MAGs) becomes more effective. Phylogenetic placement methods to contextualize hundreds of thousands of genomes must thus be efficiently scalable and sensitive from closely related strains to divergent phyla. We present PhyloPhlAn 3.0, an accurate, rapid, and easy-to-use method for large-scale microbial genome characterization and phylogenetic analysis at multiple levels of resolution. PhyloPhlAn 3.0 can assign genomes from isolate sequencing or MAGs to species-level genome bins built from >230,000 publicly available sequences. For individual clades of interest, it reconstructs strain-level phylogenies from among the closest species using clade-specific maximally informative markers. At the other extreme of resolution, it scales to large phylogenies comprising >17,000 microbial species. Examples including Staphylococcus aureus isolates, gut metagenomes, and meta-analyses demonstrate the ability of PhyloPhlAn 3.0 to support genomic and metagenomic analyses.

Identifiers

pubmed: 32427907
doi: 10.1038/s41467-020-16366-7
pii: 10.1038/s41467-020-16366-7
pmc: PMC7237447

Publication types

Evaluation Study; Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

2500

Grants

Agency: NCI NIH HHS
ID: R01 CA230551
Country: United States
Agency: NCI NIH HHS
ID: U01 CA230551
Country: United States

Authors

Francesco Asnicar (F)

Department CIBIO, University of Trento, Trento, Italy.

Andrew Maltez Thomas (AM)

Department CIBIO, University of Trento, Trento, Italy.

Francesco Beghini (F)

Department CIBIO, University of Trento, Trento, Italy.

Claudia Mengoni (C)

Department CIBIO, University of Trento, Trento, Italy.

Serena Manara (S)

Department CIBIO, University of Trento, Trento, Italy.

Paolo Manghi (P)

Department CIBIO, University of Trento, Trento, Italy.

Qiyun Zhu (Q)

Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.

Mattia Bolzan (M)

Department CIBIO, University of Trento, Trento, Italy.
PreBiomics s.r.l, Trento, Italy.

Fabio Cumbo (F)

Department CIBIO, University of Trento, Trento, Italy.

Uyen May (U)

Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA.

Jon G Sanders (JG)

Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.
Cornell Institute for Host-Microbe Interaction and Disease, Cornell University, Ithaca, NY, USA.

Moreno Zolfo (M)

Department CIBIO, University of Trento, Trento, Italy.

Evguenia Kopylova (E)

Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.
Clarity Genomics BVBA, Sint-Michielskaai 34, 2000, Antwerpen, Belgium.

Edoardo Pasolli (E)

Department CIBIO, University of Trento, Trento, Italy.
Department of Agricultural Sciences, University of Naples Federico II, Portici, Italy.

Rob Knight (R)

Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA.
Department of Bioengineering, University of California San Diego, La Jolla, CA, USA.

Siavash Mirarab (S)

Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA.

Curtis Huttenhower (C)

Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.
The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

Nicola Segata (N)

Department CIBIO, University of Trento, Trento, Italy. nicola.segata@unitn.it.
