Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0.
Journal
Nature communications
ISSN: 2041-1723
Abbreviated title: Nat Commun
Country: England
NLM ID: 101528555
Publication information
Publication date:
19 May 2020
History:
received: 17 August 2019
accepted: 27 April 2020
entrez: 20 May 2020
pubmed: 20 May 2020
medline: 25 August 2020
Status:
epublish
Abstract
Microbial genomes are available at an ever-increasing pace, as cultivation and sequencing become cheaper and obtaining metagenome-assembled genomes (MAGs) becomes more effective. Phylogenetic placement methods to contextualize hundreds of thousands of genomes must thus be efficiently scalable and sensitive from closely related strains to divergent phyla. We present PhyloPhlAn 3.0, an accurate, rapid, and easy-to-use method for large-scale microbial genome characterization and phylogenetic analysis at multiple levels of resolution. PhyloPhlAn 3.0 can assign genomes from isolate sequencing or MAGs to species-level genome bins built from >230,000 publicly available sequences. For individual clades of interest, it reconstructs strain-level phylogenies from among the closest species using clade-specific maximally informative markers. At the other extreme of resolution, it scales to large phylogenies comprising >17,000 microbial species. Examples including Staphylococcus aureus isolates, gut metagenomes, and meta-analyses demonstrate the ability of PhyloPhlAn 3.0 to support genomic and metagenomic analyses.
Identifiers
pubmed: 32427907
doi: 10.1038/s41467-020-16366-7
pii: 10.1038/s41467-020-16366-7
pmc: PMC7237447
Publication types
Evaluation Study
Journal Article
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
2500
Grants
Agency: NCI NIH HHS
ID: R01 CA230551
Country: United States
Agency: NCI NIH HHS
ID: U01 CA230551
Country: United States