Benchmarking bacterial taxonomic classification using nanopore metagenomics data of several mock communities.
Journal
Scientific data
ISSN: 2052-4463
Abbreviated title: Sci Data
Country: England
NLM ID: 101640192
Publication information
Publication date: 10 Aug 2024
History:
received: 09 Feb 2024
accepted: 22 Jul 2024
medline: 11 Aug 2024
pubmed: 11 Aug 2024
entrez: 10 Aug 2024
Status:
epublish
Abstract
Taxonomic classification is crucial in identifying organisms within diverse microbial communities when using metagenomics shotgun sequencing. While second-generation Illumina sequencing still dominates, third-generation nanopore sequencing promises improved classification through longer reads. However, extensive benchmarking studies on nanopore data are lacking. We systematically evaluated the performance of bacterial taxonomic classification for metagenomics nanopore sequencing data for several commonly used classifiers, using standardized reference sequence databases, on the largest collection of publicly available data for defined mock communities thus far (nine samples), representing different research domains and application scopes. Our results group classifiers into three categories: low precision/high recall, medium precision/medium recall, and high precision/medium recall. Most fall into the first group, although suitable abundance filtering can improve precision without excessively penalizing recall. No definitive 'best' classifier emerges, and classifier selection depends on application scope and practical requirements. Although few classifiers designed for long reads exist, they generally exhibit better performance. Our comprehensive benchmarking provides concrete recommendations, supported by publicly available code for reassessment and fine-tuning by other scientists.
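The trade-off between precision and recall under abundance filtering, as described in the abstract, can be illustrated with a minimal sketch. This is not the authors' published code: the function name `precision_recall`, the taxa, the abundances, and the thresholds are all hypothetical and chosen only to show how discarding low-abundance calls may raise precision at some cost to recall.

```python
def precision_recall(predicted, expected, min_abundance=0.0):
    """Return (precision, recall) for taxa reported at or above min_abundance.

    predicted: dict mapping taxon name -> relative abundance (percent)
    expected:  set of taxon names known to be in the mock community
    """
    # Keep only the calls that pass the abundance filter.
    kept = {taxon for taxon, abundance in predicted.items()
            if abundance >= min_abundance}
    true_positives = len(kept & expected)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical mock community (ground truth); not data from the study.
expected = {
    "Escherichia coli",
    "Bacillus subtilis",
    "Listeria monocytogenes",
    "Pseudomonas aeruginosa",
}

# Hypothetical classifier output: relative abundance in percent.
predicted = {
    "Escherichia coli": 40.0,
    "Bacillus subtilis": 35.0,
    "Listeria monocytogenes": 24.0,
    "Pseudomonas aeruginosa": 0.05,  # true member, but at very low abundance
    "Shigella flexneri": 0.6,        # false positive
    "Bacillus cereus": 0.3,          # false positive
    "Escherichia albertii": 0.05,    # false positive
}

# Illustrative thresholds only; suitable values depend on the application.
for threshold in (0.0, 0.1, 1.0):
    p, r = precision_recall(predicted, expected, threshold)
    print(f"min abundance {threshold:>4}%: precision={p:.2f} recall={r:.2f}")
```

In this toy example, raising the threshold removes the false positives but also drops the one true member reported at very low abundance, which is why the filter has to be tuned to the application scope.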
Identifiers
pubmed: 39127718
doi: 10.1038/s41597-024-03672-8
pii: 10.1038/s41597-024-03672-8
Publication types
Dataset
Journal Article
Languages
eng
Citation subsets
IM
Pagination
864
Copyright information
© 2024. The Author(s).
References
Wooley, J. C., Godzik, A. & Friedberg, I. A Primer on Metagenomics. PLoS Comput. Biol. 6, e1000667 (2010).
pubmed: 20195499
pmcid: 2829047
doi: 10.1371/journal.pcbi.1000667
Forbes, J. D., Knox, N. C., Ronholm, J., Pagotto, F. & Reimer, A. Metagenomics: The Next Culture-Independent Game Changer. Front. Microbiol. 8, 1069 (2017).
pubmed: 28725217
pmcid: 5495826
doi: 10.3389/fmicb.2017.01069
New, F. N. & Brito, I. L. What Is Metagenomics Teaching Us, and What Is Missed? Annu. Rev. Microbiol. 74, 117–135 (2020).
pubmed: 32603623
doi: 10.1146/annurev-micro-012520-072314
The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
pmcid: 3564958
doi: 10.1038/nature11234
Hendriksen, R. S. et al. Global monitoring of antimicrobial resistance based on metagenomics analyses of urban sewage. Nat. Commun. 10, 1124 (2019).
pubmed: 30850636
pmcid: 6408512
doi: 10.1038/s41467-019-08853-3
Edge, T. A. et al. The Ecobiomics project: Advancing metagenomics assessment of soil health and freshwater quality in Canada. Sci. Total Environ. 710, 135906 (2020).
pubmed: 31926407
doi: 10.1016/j.scitotenv.2019.135906
Chiu, C. Y. & Miller, S. A. Clinical metagenomics. Nat. Rev. Genet. 20, 341–355 (2019).
pubmed: 30918369
pmcid: 6858796
doi: 10.1038/s41576-019-0113-7
Buytaers, F. E. et al. Application of a strain-level shotgun metagenomics approach on food samples: resolution of the source of a Salmonella food-borne outbreak. Microb. Genomics 7, (2021).
Akaçin, İ., Ersoy, Ş., Doluca, O. & Güngörmüşler, M. Comparing the significance of the utilization of next generation and third generation sequencing technologies in microbial metagenomics. Microbiol. Res. 264, 127154 (2022).
pubmed: 35961096
doi: 10.1016/j.micres.2022.127154
Kraft, F. & Kurth, I. Long-read sequencing to understand genome biology and cell function. Int. J. Biochem. Cell Biol. 126, 105799 (2020).
pubmed: 32629027
doi: 10.1016/j.biocel.2020.105799
Tedersoo, L., Albertsen, M., Anslan, S. & Callahan, B. Perspectives and Benefits of High-Throughput Long-Read Sequencing in Microbial Ecology. Appl. Environ. Microbiol. 87, e00626–21 (2021).
pubmed: 34132589
pmcid: 8357291
doi: 10.1128/AEM.00626-21
Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30 (2020).
pubmed: 32033565
pmcid: 7006217
doi: 10.1186/s13059-020-1935-5
Cao, M. D. et al. Scaffolding and completing genome assemblies in real-time with nanopore sequencing. Nat. Commun. 8, 14515 (2017).
pubmed: 28218240
pmcid: 5321748
doi: 10.1038/ncomms14515
MacKenzie, M. & Argyropoulos, C. An Introduction to Nanopore Sequencing: Past, Present, and Future Considerations. Micromachines 14, 459 (2023).
pubmed: 36838159
pmcid: 9966803
doi: 10.3390/mi14020459
Gehrig, J. L. et al. Finding the right fit: evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data. Microb. Genomics 8, (2022).
Segerman, B. The Most Frequently Used Sequencing Technologies and Assembly Methods in Different Time Segments of the Bacterial Surveillance and RefSeq Genome Databases. Front. Cell. Infect. Microbiol. 10, 527102 (2020).
pubmed: 33194784
pmcid: 7604302
doi: 10.3389/fcimb.2020.527102
Liu, L., Yang, Y., Deng, Y. & Zhang, T. Nanopore long-read-only metagenomics enables complete and high-quality genome reconstruction from mock and complex metagenomes. Microbiome 10, 209 (2022).
pubmed: 36457010
pmcid: 9716684
doi: 10.1186/s40168-022-01415-8
Martin, C. et al. Nanopore-based metagenomics analysis reveals prevalence of mobile antibiotic and heavy metal resistome in wastewater. Ecotoxicology 30, 1572–1585 (2021).
pubmed: 33459951
doi: 10.1007/s10646-020-02342-w
Wongsurawat, T. et al. An assessment of Oxford Nanopore sequencing for human gut metagenome profiling: A pilot study of head and neck cancer patients. J. Microbiol. Methods 166, 105739 (2019).
pubmed: 31626891
pmcid: 6956648
doi: 10.1016/j.mimet.2019.105739
Yang, L. et al. Metagenomic identification of severe pneumonia pathogens in mechanically-ventilated patients: a feasibility and clinical validity study. Respir. Res. 20, 265 (2019).
pubmed: 31775777
pmcid: 6882222
doi: 10.1186/s12931-019-1218-4
Gwak, H.-J., Lee, S. J. & Rho, M. Application of computational approaches to analyze metagenomic data. J. Microbiol. 59, 233–241 (2021).
pubmed: 33565054
doi: 10.1007/s12275-021-0632-8
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
pubmed: 31779668
pmcid: 6883579
doi: 10.1186/s13059-019-1891-0
Clausen, P. T. L. C., Aarestrup, F. M. & Lund, O. Rapid and precise alignment of raw reads against redundant databases with KMA. BMC Bioinformatics 19, 307 (2018).
pubmed: 30157759
pmcid: 6116485
doi: 10.1186/s12859-018-2336-6
Portik, D. M., Brown, C. T. & Pierce-Ward, N. T. Evaluation of taxonomic profiling methods for long-read shotgun metagenomic sequencing datasets. https://doi.org/10.1101/2022.01.31.478527 (2022).
Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12, 902–903 (2015).
pubmed: 26418763
doi: 10.1038/nmeth.3589
Milanese, A. et al. Microbial abundance, activity and population genomic profiling with mOTUs2. Nat. Commun. 10, 1014 (2019).
pubmed: 30833550
pmcid: 6399450
doi: 10.1038/s41467-019-08844-4
Menzel, P., Ng, K. L. & Krogh, A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7, 11257 (2016).
pubmed: 27071849
pmcid: 4833860
doi: 10.1038/ncomms11257
Ye, S. H., Siddle, K. J., Park, D. J. & Sabeti, P. C. Benchmarking Metagenomics Tools for Taxonomic Classification. Cell 178, 779–794 (2019).
pubmed: 31398336
pmcid: 6716367
doi: 10.1016/j.cell.2019.07.010
Dilthey, A. T., Jain, C., Koren, S. & Phillippy, A. M. Strain-level metagenomic assignment and compositional estimation for long reads with MetaMaps. Nat. Commun. 10, 3066 (2019).
pubmed: 31296857
pmcid: 6624308
doi: 10.1038/s41467-019-10934-2
Huson, D. H. et al. MEGAN-LR: new algorithms allow accurate binning and easy interactive exploration of metagenomic long reads and contigs. Biol. Direct 13, 6 (2018).
pubmed: 29678199
pmcid: 5910613
doi: 10.1186/s13062-018-0208-7
Li, G. et al. Fast and Accurate Classification of Meta-Genomics Long Reads With deSAMBA. Front. Cell Dev. Biol. 9, 643645 (2021).
pubmed: 34012962
pmcid: 8127778
doi: 10.3389/fcell.2021.643645
Eisenhofer, R. & Weyrich, L. S. Assessing alignment-based taxonomic classification of ancient microbial DNA. PeerJ 7, e6594 (2019).
pubmed: 30886779
pmcid: 6420809
doi: 10.7717/peerj.6594
Méric, G., Wick, R. R., Watts, S. C., Holt, K. E. & Inouye, M. Correcting index databases improves metagenomic studies. https://doi.org/10.1101/712166 (2019).
Wright, R. J., Comeau, A. M. & Langille, M. G. I. From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools. Microb. Genomics 9, (2023).
Valencia, E. M., Maki, K. A., Dootz, J. N. & Barb, J. J. Mock community taxonomic classification performance of publicly available shotgun metagenomics pipelines. Sci. Data 11, 81 (2024).
pubmed: 38233447
pmcid: 10794705
doi: 10.1038/s41597-023-02877-7
Breitwieser, F. P., Lu, J. & Salzberg, S. L. A review of methods and databases for metagenomic classification and assembly. Brief. Bioinform. 20, 1125–1136 (2019).
pubmed: 29028872
doi: 10.1093/bib/bbx120
Escobar-Zepeda, A. et al. Analysis of sequencing strategies and tools for taxonomic annotation: Defining standards for progressive metagenomics. Sci. Rep. 8, 12034 (2018).
pubmed: 30104688
pmcid: 6089906
doi: 10.1038/s41598-018-30515-5
Lindgreen, S., Adair, K. L. & Gardner, P. P. An evaluation of the accuracy and speed of metagenome analysis tools. Sci. Rep. 6, 19233 (2016).
pubmed: 26778510
pmcid: 4726098
doi: 10.1038/srep19233
Mande, S. S., Mohammed, M. H. & Ghosh, T. S. Classification of metagenomic sequences: methods and challenges. Brief. Bioinform. 13, 669–681 (2012).
pubmed: 22962338
doi: 10.1093/bib/bbs054
McIntyre, A. B. R. et al. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biol. 18, 182 (2017).
pubmed: 28934964
pmcid: 5609029
doi: 10.1186/s13059-017-1299-7
Parks, D. H. et al. Evaluation of the Microba Community Profiler for Taxonomic Profiling of Metagenomic Datasets From the Human Gut Microbiome. Front. Microbiol. 12, 643682 (2021).
pubmed: 33959106
pmcid: 8093879
doi: 10.3389/fmicb.2021.643682
Tamames, J., Cobo-Simón, M. & Puente-Sánchez, F. Assessing the performance of different approaches for functional and taxonomic annotation of metagenomes. BMC Genomics 20, 960 (2019).
pubmed: 31823721
pmcid: 6902526
doi: 10.1186/s12864-019-6289-6
Meyer, F. et al. Critical Assessment of Metagenome Interpretation - the second round of challenges. https://doi.org/10.1101/2021.07.12.451567 (2021).
Sczyrba, A. et al. Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. Nat. Methods 14, 1063–1071 (2017).
pubmed: 28967888
pmcid: 5903868
doi: 10.1038/nmeth.4458
Milhaven, M. & Pfeifer, S. P. Performance evaluation of six popular short-read simulators. Heredity 130, 55–63 (2023).
pubmed: 36496447
doi: 10.1038/s41437-022-00577-3
Highlander, S. Mock Community Analysis. in Encyclopedia of Metagenomics (ed. Nelson, K. E.) 1–7, https://doi.org/10.1007/978-1-4614-6418-1_54-1 (Springer New York, 2014).
Marić, J., Križanović, K., Riondet, S., Nagarajan, N. & Šikić, M. Comparative analysis of metagenomic classifiers for long-read sequencing datasets. BMC Bioinformatics 25, 15 (2024).
pubmed: 38212694
pmcid: 10782538
doi: 10.1186/s12859-024-05634-8
Govender, K. N. & Eyre, D. W. Benchmarking taxonomic classifiers with Illumina and Nanopore sequence data for clinical metagenomic diagnostic applications. Microb. Genomics 8 (2022).
Leidenfrost, R. M., Pöther, D.-C., Jäckel, U. & Wünschiers, R. Benchmarking the MinION: Evaluating long reads for microbial profiling. Sci. Rep. 10, 5125 (2020).
pubmed: 32198413
pmcid: 7083898
doi: 10.1038/s41598-020-61989-x
Nakamura, A. & Komatsu, M. Performance evaluation of whole genome metagenomics sequencing with the MinION nanopore sequencer: Microbial community analysis and antimicrobial resistance gene detection. J. Microbiol. Methods 206, 106688 (2023).
pubmed: 36764487
doi: 10.1016/j.mimet.2023.106688
Pearman, W. S., Freed, N. E. & Silander, O. K. Testing the advantages and disadvantages of short- and long- read eukaryotic metagenomics using simulated reads. BMC Bioinformatics 21, 220 (2020).
pubmed: 32471343
pmcid: 7257156
doi: 10.1186/s12859-020-3528-4
Hall, M. Rasusa: Randomly subsample sequencing reads to a specified coverage. J. Open Source Softw. 7, 3941 (2022).
doi: 10.21105/joss.03941
Fan, J., Huang, S. & Chorlton, S. D. BugSeq: a highly accurate cloud platform for long-read metagenomic analyses. BMC Bioinformatics 22, 160 (2021).
pubmed: 33765910
pmcid: 7993542
doi: 10.1186/s12859-021-04089-5
Bağcı, C., Patz, S. & Huson, D. H. DIAMOND+MEGAN: Fast and Easy Taxonomic and Functional Analysis of Short and Long Microbiome Sequences. Curr. Protoc. 1, e59 (2021).
pubmed: 33656283
doi: 10.1002/cpz1.59
Lu, J., Breitwieser, F. P., Thielen, P. & Salzberg, S. L. Bracken: estimating species abundance in metagenomics data. PeerJ Comput. Sci. 3, e104 (2017).
doi: 10.7717/peerj-cs.104
Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).
pubmed: 27852649
pmcid: 5131823
doi: 10.1101/gr.210641.116
Mirdita, M., Steinegger, M., Breitwieser, F., Söding, J. & Levy Karin, E. MMSeqs2: Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics 37, 3029–3031 (2021).
pubmed: 33734313
pmcid: 8479651
doi: 10.1093/bioinformatics/btab184
Marcelino, V. R. et al. CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data. Genome Biol. 21, 103 (2020).
pubmed: 32345331
pmcid: 7189439
doi: 10.1186/s13059-020-02014-2
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
pubmed: 26553804
doi: 10.1093/nar/gkv1189
Berman, H., Henrick, K. & Nakamura, H. Announcing the worldwide Protein Data Bank. Nat. Struct. Mol. Biol. 10, 980 (2003).
doi: 10.1038/nsb1203-980
The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
doi: 10.1093/nar/gkac1052
Wu, C. H. The Protein Information Resource. Nucleic Acids Res. 31, 345–347 (2003).
pubmed: 12520019
pmcid: 165487
doi: 10.1093/nar/gkg040
Shen, W. & Ren, H. TaxonKit: A practical and efficient NCBI taxonomy toolkit. J. Genet. Genomics 48, 844–850 (2021).
pubmed: 34001434
doi: 10.1016/j.jgg.2021.03.006
Reports of Benchmarking bacterial taxonomic classification using nanopore metagenomics data of several mock communities. Zenodo. https://doi.org/10.5281/zenodo.11371848 (2024).
Hossin, M. & Sulaiman, M. N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 5, 1 (2015).
doi: 10.5121/ijdkp.2015.5201
Sun, Z. et al. Challenges in benchmarking metagenomic profilers. Nat. Methods 18, 618–626 (2021).
pubmed: 33986544
pmcid: 8184642
doi: 10.1038/s41592-021-01141-3
Peabody, M. A., Van Rossum, T., Lo, R. & Brinkman, F. S. L. Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities. BMC Bioinformatics 16, 362 (2015).
doi: 10.1186/s12859-015-0788-5
Blanco-Míguez, A. et al. Extending and improving metagenomic taxonomic profiling with uncharacterized species using MetaPhlAn 4. Nat. Biotechnol. 41, 1633–1644 (2023).
pubmed: 36823356
pmcid: 10635831
doi: 10.1038/s41587-023-01688-w
Akaçin, İ., Ersoy, Ş., Doluca, O. & Güngörmüşler, M. Using custom-built primers and nanopore sequencing to evaluate CO-utilizer bacterial and archaeal populations linked to bioH2 production. Sci. Rep. 13, 17025 (2023).
pubmed: 37813931
pmcid: 10562470
doi: 10.1038/s41598-023-44357-3
Ni, Y., Liu, X., Simeneh, Z. M., Yang, M. & Li, R. Benchmarking of Nanopore R10.4 and R9.4.1 flow cells in single-cell whole-genome amplification and whole-genome shotgun sequencing. Comput. Struct. Biotechnol. J. 21, 2352–2364 (2023).
pubmed: 37025654
pmcid: 10070092
doi: 10.1016/j.csbj.2023.03.038
Nicholls, S. M., Quick, J. C., Tang, S. & Loman, N. J. Ultra-deep, long-read nanopore sequencing of mock microbial community standards. GigaScience 8, giz043 (2019).
pubmed: 31089679
pmcid: 6520541
doi: 10.1093/gigascience/giz043
European Nucleotide Archive. ERR2906227. https://identifiers.org/ena.embl:ERR2906227 (2024).
European Nucleotide Archive. ERR2906229. https://identifiers.org/ena.embl:ERR2906229 (2024).
Sereika, M. et al. Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. Nat. Methods 19, 823–826 (2022).
pubmed: 35789207
pmcid: 9262707
doi: 10.1038/s41592-022-01539-7
European Nucleotide Archive. ERR7255742. https://identifiers.org/ena.embl:ERR7255742 (2024).
European Nucleotide Archive. ERR7287988. https://identifiers.org/ena.embl:ERR7287988 (2024).
European Nucleotide Archive. SRR17913200. https://identifiers.org/ena.embl:SRR17913200 (2024).
Hu, Y., Fang, L., Nicholson, C. & Wang, K. Implications of Error-Prone Long-Read Whole-Genome Shotgun Sequencing on Characterizing Reference Microbiomes. iScience 23, 101223 (2020).
pubmed: 32563152
pmcid: 7305381
doi: 10.1016/j.isci.2020.101223
European Nucleotide Archive. SRR11700265. https://identifiers.org/ena.embl:SRR11700265 (2024).
European Nucleotide Archive. SRR11700264. https://identifiers.org/ena.embl:SRR11700264 (2024).
Meslier, V. et al. Benchmarking second and third-generation sequencing platforms for microbial metagenomics. Sci. Data 9, 694 (2022).
pubmed: 36369227
pmcid: 9652401
doi: 10.1038/s41597-022-01762-z
European Nucleotide Archive. ERR9765780. https://identifiers.org/ena.embl:ERR9765780 (2024).
European Nucleotide Archive. ERR9765781. https://identifiers.org/ena.embl:ERR9765781 (2024).
European Nucleotide Archive. ERR9765782. https://identifiers.org/ena.embl:ERR9765782 (2024).