Benchmarking bacterial taxonomic classification using nanopore metagenomics data of several mock communities.


Journal

Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192

Publication information

Publication date:
10 Aug 2024
History:
received: 09 Feb 2024
accepted: 22 Jul 2024
medline: 11 Aug 2024
pubmed: 11 Aug 2024
entrez: 10 Aug 2024
Status: epublish

Abstract

Taxonomic classification is crucial for identifying organisms within diverse microbial communities when using shotgun metagenomic sequencing. While second-generation Illumina sequencing still dominates, third-generation nanopore sequencing promises improved classification through longer reads. However, extensive benchmarking studies on nanopore data are lacking. We systematically evaluated the performance of several commonly used classifiers for bacterial taxonomic classification of metagenomic nanopore sequencing data, using standardized reference sequence databases, on the largest collection of publicly available data for defined mock communities thus far (nine samples), representing different research domains and application scopes. Our results place the classifiers into three categories: low precision/high recall; medium precision/medium recall; and high precision/medium recall. Most fall into the first category, although suitable abundance filtering can improve precision without excessively penalizing recall. No definitive 'best' classifier emerges, and classifier selection depends on application scope and practical requirements. Although few classifiers designed specifically for long reads exist, they generally exhibit better performance. Our comprehensive benchmarking provides concrete recommendations, supported by publicly available code that other scientists can use for reassessment and fine-tuning.
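The precision/recall trade-off and the effect of abundance filtering mentioned in the abstract can be illustrated with a minimal sketch. It assumes taxon-level (presence/absence) scoring of a classifier's species-level relative-abundance profile against the known composition of a mock community. This is one common way to compute these metrics and not necessarily the exact procedure used in the study. The function names `filter_by_abundance` and `precision_recall`, the species list, and all abundance values are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical expected composition of a mock community (species names only).
TRUTH: Set[str] = {
    "Escherichia coli",
    "Bacillus subtilis",
    "Listeria monocytogenes",
    "Salmonella enterica",
}

# Hypothetical species-level relative abundances reported by some classifier.
# Taxa absent from the mock community appear as low-abundance false positives.
PROFILE: Dict[str, float] = {
    "Escherichia coli": 0.40,
    "Bacillus subtilis": 0.30,
    "Listeria monocytogenes": 0.25,
    "Salmonella enterica": 0.005,    # true member present at low abundance
    "Shigella flexneri": 0.02,       # false positive
    "Bacillus cereus": 0.004,        # false positive
    "Escherichia albertii": 0.003,   # false positive
    "Klebsiella pneumoniae": 0.002,  # false positive
}


def filter_by_abundance(profile: Dict[str, float], min_abundance: float) -> Dict[str, float]:
    """Keep only taxa whose relative abundance is at or above min_abundance."""
    return {taxon: ab for taxon, ab in profile.items() if ab >= min_abundance}


def precision_recall(profile: Dict[str, float], truth: Set[str]) -> Tuple[float, float]:
    """Taxon-level (presence/absence) precision and recall against a known mock community."""
    predicted = set(profile)
    tp = len(predicted & truth)  # expected taxa that were reported
    fp = len(predicted - truth)  # reported taxa that are not in the mock community
    fn = len(truth - predicted)  # expected taxa that were missed
    # By convention here, an empty prediction (or empty truth set) scores 0.0.
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    return precision, recall


# Compare the unfiltered profile (threshold 0.0) with two abundance cutoffs.
for threshold in (0.0, 0.005, 0.01):
    p, r = precision_recall(filter_by_abundance(PROFILE, threshold), TRUTH)
    print(f"min_abundance={threshold}: precision={p:.2f}, recall={r:.2f}")
```

With these made-up numbers, the unfiltered profile scores precision 0.50 and recall 1.00. A 0.5% cutoff removes three false positives and raises precision to 0.80 with recall unchanged, whereas a 1% cutoff also discards the low-abundance true member, so precision and recall both end at 0.75. The choice of cutoff therefore depends on whether missing a rare true taxon or reporting a spurious one is more costly for the application.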

Identifiers

pubmed: 39127718
doi: 10.1038/s41597-024-03672-8
pii: 10.1038/s41597-024-03672-8

Publication types

Dataset; Journal Article

Language

eng

Citation subsets

IM

Pagination

864

Copyright information

© 2024. The Author(s).

Authors

Alexander Van Uffelen (A)

Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium.
Department of Information Technology, Internet Technology and Data Science Lab (IDLab), Interuniversity Microelectronics Centre (IMEC), Ghent University, Ghent, Belgium.
Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.

Andrés Posadas (A)

Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium.
Department of Information Technology, Internet Technology and Data Science Lab (IDLab), Interuniversity Microelectronics Centre (IMEC), Ghent University, Ghent, Belgium.
Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.

Nancy H C Roosens (NHC)

Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium.

Kathleen Marchal (K)

Department of Information Technology, Internet Technology and Data Science Lab (IDLab), Interuniversity Microelectronics Centre (IMEC), Ghent University, Ghent, Belgium.
Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium.
Department of Genetics, University of Pretoria, Pretoria, South Africa.

Sigrid C J De Keersmaecker (SCJ)

Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium.

Kevin Vanneste (K)

Transversal activities in Applied Genomics, Sciensano, Brussels, Belgium. kevin.vanneste@sciensano.be.
