Greengenes2 unifies microbial data in a single reference tree.


Journal

Nature biotechnology
ISSN: 1546-1696
Abbreviated title: Nat Biotechnol
Country: United States
NLM ID: 9604648

Publication information

Publication date:
27 Jul 2023
History:
received: 16 Dec 2022
accepted: 25 May 2023
pubmed: 28 Jul 2023
medline: 28 Jul 2023
entrez: 27 Jul 2023
Status: aheadofprint

Abstract

Studies using 16S rRNA and shotgun metagenomics typically yield different results, usually attributed to PCR amplification biases. We introduce Greengenes2, a reference tree that unifies genomic and 16S rRNA databases in a consistent, integrated resource. By inserting sequences into a whole-genome phylogeny, we show that 16S rRNA and shotgun metagenomic data generated from the same samples agree in principal coordinates space, taxonomy and phenotype effect size when analyzed with the same tree.

Identifiers

pubmed: 37500913
doi: 10.1038/s41587-023-01845-1
pii: 10.1038/s41587-023-01845-1

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Grants

Agency: NIA NIH HHS
ID: U19 AG063744
Country: United States
Agency: NIDDK NIH HHS
ID: U24 DK131617
Country: United States

Comments and corrections

Type: ErratumIn

Copyright information

© 2023. The Author(s).

Authors

Daniel McDonald (D)

Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, USA.

Yueyu Jiang (Y)

Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA.

Metin Balaban (M)

Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA.

Kalen Cantrell (K)

Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.

Qiyun Zhu (Q)

School of Life Sciences, Arizona State University, Tempe, AZ, USA.
Biodesign Center for Fundamental and Applied Microbiomics, Arizona State University, Tempe, AZ, USA.

Antonio Gonzalez (A)

Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, USA.

James T Morton (JT)

Biostatistics & Bioinformatics Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA.

Giorgia Nicolaou (G)

Halicioglu Data Science Institute, University of California San Diego, La Jolla, CA, USA.

Donovan H Parks (DH)

Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Queensland, Australia.

Søren M Karst (SM)

Department of Obstetrics and Gynecology, Columbia University, New York, NY, USA.

Mads Albertsen (M)

Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark.

Philip Hugenholtz (P)

Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Queensland, Australia.

Todd DeSantis (T)

Department of Informatics, Second Genome, Brisbane, CA, USA.

Se Jin Song (SJ)

Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA.

Andrew Bartko (A)

Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA.

Aki S Havulinna (AS)

Finnish Institute for Health and Welfare, Helsinki, Finland.
Institute for Molecular Medicine Finland, FIMM-HiLIFE, Helsinki, Finland.

Pekka Jousilahti (P)

Finnish Institute for Health and Welfare, Helsinki, Finland.

Susan Cheng (S)

Division of Cardiology, Brigham and Women's Hospital, Boston, MA, USA.
Cedars-Sinai Medical Center, Los Angeles, CA, USA.

Michael Inouye (M)

Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia.
Cambridge Baker Systems Genomics Initiative, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK.

Teemu Niiranen (T)

Finnish Institute for Health and Welfare, Helsinki, Finland.
Division of Medicine, Turku University Hospital and University of Turku, Turku, Finland.

Mohit Jain (M)

Sapient Bioanalytics, LLC, San Diego, CA, USA.

Veikko Salomaa (V)

Finnish Institute for Health and Welfare, Helsinki, Finland.

Leo Lahti (L)

Department of Computing, University of Turku, Turku, Finland.

Siavash Mirarab (S)

Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA.

Rob Knight (R)

Department of Pediatrics, University of California San Diego School of Medicine, La Jolla, CA, USA. robknight@eng.ucsd.edu.
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA. robknight@eng.ucsd.edu.
Center for Microbiome Innovation, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA. robknight@eng.ucsd.edu.
Department of Bioengineering, University of California San Diego, La Jolla, CA, USA. robknight@eng.ucsd.edu.

MeSH classifications