Greengenes2 unifies microbial data in a single reference tree.
Journal
Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648
Publication information
Publication date:
27 Jul 2023
History:
received: 16 Dec 2022
accepted: 25 May 2023
pubmed: 28 Jul 2023
medline: 28 Jul 2023
entrez: 27 Jul 2023
Status:
aheadofprint
Abstract
Studies using 16S rRNA and shotgun metagenomics typically yield different results, usually attributed to PCR amplification biases. We introduce Greengenes2, a reference tree that unifies genomic and 16S rRNA databases in a consistent, integrated resource. By inserting sequences into a whole-genome phylogeny, we show that 16S rRNA and shotgun metagenomic data generated from the same samples agree in principal coordinates space, taxonomy and phenotype effect size when analyzed with the same tree.
Identifiers
pubmed: 37500913
doi: 10.1038/s41587-023-01845-1
pii: 10.1038/s41587-023-01845-1
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Grants
Agency: NIA NIH HHS
ID: U19 AG063744
Country: United States
Agency: NIDDK NIH HHS
ID: U24 DK131617
Country: United States
Comments and corrections
Type: ErratumIn
Copyright information
© 2023. The Author(s).
References
Zhu, Q. et al. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat. Commun. 10, 5477 (2019).
doi: 10.1038/s41467-019-13443-4
pubmed: 31792218
pmcid: 6889312
Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794 (2022).
doi: 10.1093/nar/gkab776
pubmed: 34520557
Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).
doi: 10.1093/nar/gks1219
pubmed: 23193283
McDonald, D. et al. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of Bacteria and Archaea. ISME J. 6, 610–618 (2012).
doi: 10.1038/ismej.2011.139
pubmed: 22134646
Balaban, M. et al. Generation of accurate, expandable phylogenomic trees with uDANCE. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01868-8 (2023).
Hugenholtz, P., Chuvochina, M., Oren, A., Parks, D. H. & Soo, R. M. Prokaryotic taxonomy and nomenclature in the age of big sequence data. ISME J. 15, 1879–1892 (2021).
doi: 10.1038/s41396-021-00941-x
pubmed: 33824426
pmcid: 8245423
Ludwig, W. et al. Release LTP_12_2020, featuring a new ARB alignment and improved 16S rRNA tree for prokaryotic type strains. Syst. Appl. Microbiol. 44, 126218 (2021).
doi: 10.1016/j.syapm.2021.126218
pubmed: 34111737
Karst, S. M. et al. High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing. Nat. Methods 18, 165–169 (2021).
doi: 10.1038/s41592-020-01041-y
pubmed: 33432244
Shaffer, J. P. et al. Standardized multi-omics of Earth’s microbiomes reveals microbial and metabolite diversity. Nat. Microbiol. 7, 2128–2150 (2022).
doi: 10.1038/s41564-022-01266-x
pubmed: 36443458
pmcid: 9712116
Amir, A. et al. Deblur rapidly resolves single-nucleotide community sequence patterns. mSystems 2, e00191-16 (2017).
doi: 10.1128/mSystems.00191-16
pubmed: 28289731
pmcid: 5340863
Gonzalez, A. et al. Qiita: rapid, web-enabled microbiome meta-analysis. Nat. Methods 15, 796–798 (2018).
doi: 10.1038/s41592-018-0141-9
pubmed: 30275573
pmcid: 6235622
Jiang, Y., McDonald, D., Knight, R. & Mirarab, S. Scaling deep phylogenetic embedding to ultra-large reference trees: a tree-aware ensemble approach. Preprint at bioRxiv https://doi.org/10.1101/2023.03.27.534201 (2023).
Thompson, L. R. et al. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551, 457–463 (2017).
doi: 10.1038/nature24621
pubmed: 29088705
pmcid: 6192678
McDonald, D. et al. American Gut: an open platform for citizen science microbiome research. mSystems 3, e00031-18 (2018).
doi: 10.1128/mSystems.00031-18
pubmed: 29795809
pmcid: 5954204
Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
doi: 10.1038/nature11234
Salosensaari, A. et al. Taxonomic signatures of cause-specific mortality risk in human gut microbiome. Nat. Commun. 12, 2671 (2021).
doi: 10.1038/s41467-021-22962-y
pubmed: 33976176
pmcid: 8113604
Bray, J. R. & Curtis, J. T. An ordination of the upland forest communities of southern Wisconsin. Ecol. Monogr. 27, 325–349 (1957).
doi: 10.2307/1942268
Sfiligoi, I., Armstrong, G., Gonzalez, A., McDonald, D. & Knight, R. Optimizing UniFrac with OpenACC yields greater than one thousand times speed increase. mSystems 7, e0002822 (2022).
doi: 10.1128/msystems.00028-22
pubmed: 35638356
Zhu, Q. et al. Phylogeny-aware analysis of metagenome community ecology based on matched reference genomes while bypassing taxonomy. mSystems 7, e0016722 (2022).
doi: 10.1128/msystems.00167-22
pubmed: 35369727
Bokulich, N. A. et al. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome 6, 90 (2018).
doi: 10.1186/s40168-018-0470-z
pubmed: 29773078
pmcid: 5956843
Schloss, P. D. Identifying and overcoming threats to reproducibility, replicability, robustness, and generalizability in microbiome research. mBio 9, e00525-18 (2018).
doi: 10.1128/mBio.00525-18
pubmed: 29871915
pmcid: 5989067
Sinha, R. et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat. Biotechnol. 35, 1077–1086 (2017).
doi: 10.1038/nbt.3981
pubmed: 28967885
pmcid: 5839636
Cantrell, K. et al. EMPress enables tree-guided, interactive, and exploratory analyses of multi-omic data sets. mSystems 6, e01216-20 (2021).
doi: 10.1128/mSystems.01216-20
pubmed: 33727399
pmcid: 8546999
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
doi: 10.1016/S0022-2836(05)80360-2
pubmed: 2231712
Nguyen, N.-P. D., Mirarab, S., Kumar, K. & Warnow, T. Ultra-large alignments using phylogeny-aware profiles. Genome Biol. 16, 124 (2015).
doi: 10.1186/s13059-015-0688-z
pubmed: 26076734
pmcid: 4492008
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
doi: 10.1093/molbev/msaa015
pubmed: 32011700
pmcid: 7182206
McDonald, D. et al. redbiom: a rapid sample discovery and feature characterization system. mSystems 4, e00215-19 (2019).
doi: 10.1128/mSystems.00215-19
pubmed: 31239397
pmcid: 6593222
Balaban, M., Jiang, Y., Roush, D., Zhu, Q. & Mirarab, S. Fast and accurate distance-based phylogenetic placement using divide and conquer. Mol. Ecol. Resour. 22, 1213–1227 (2022).
doi: 10.1111/1755-0998.13527
pubmed: 34643995
Matsen, F. A., Hoffman, N. G., Gallagher, A. & Stamatakis, A. A format for phylogenetic placements. PLoS ONE 7, e31009 (2012).
doi: 10.1371/journal.pone.0031009
pubmed: 22383988
pmcid: 3284489
McDonald, D. Improved-octo-waddle. GitHub https://github.com/biocore/improved-octo-waddle/ (2023).
Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).
doi: 10.1038/s41587-019-0209-9
pubmed: 31341288
pmcid: 7015180
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
doi: 10.1038/s41592-019-0686-2
pubmed: 32015543
pmcid: 7056644
Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
doi: 10.1109/MCSE.2007.55
Vázquez-Baeza, Y., Pirrung, M., Gonzalez, A. & Knight, R. EMPeror: a tool for visualizing high-throughput microbial community data. Gigascience 2, 16 (2013).
doi: 10.1186/2047-217X-2-16
pubmed: 24280061
pmcid: 4076506
Janssen, S. et al. Phylogenetic placement of exact amplicon sequences improves associations with clinical information. mSystems 3, e00021-18 (2018).
doi: 10.1128/mSystems.00021-18
pubmed: 29719869
pmcid: 5904434
Rahman, G. et al. Determination of effect sizes for power analysis of microbiome studies using large microbiome datasets. Genes https://doi.org/10.3390/genes14061239 (2023).
McDonald, D. q2-greengenes2. GitHub https://github.com/biocore/q2-greengenes2/ (2023).
McDonald, D. greengenes2. GitHub https://github.com/biocore/greengenes2 (2023).
Balaban, M. uDance. GitHub https://github.com/balabanmetin/uDance (2023).
Jiang, Y. DEPP. GitHub https://github.com/yueyujiang/DEPP (2023).
McDonald, D. Greengenes2 analyses. GitHub https://github.com/knightlab-analyses/greengenes2 (2023).