Accessible viral metagenomics for public health and clinical domains with Jovian.


Journal

Scientific reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
2024-10-29
History:
received: 2024-03-25
accepted: 2024-09-20
medline: 2024-10-30
pubmed: 2024-10-30
entrez: 2024-10-30
Status: epublish

Abstract

The integration of next-generation sequencing into clinical diagnostics and surveillance initiatives is impeded by the lack of data analysis pipelines that align with privacy legislation and laboratory certification protocols. To address these challenges, we developed Jovian, an open-source, virus-focused metagenomic analysis workflow for Illumina data. Jovian generates scaffolds enriched with pertinent annotations, including taxonomic classification, combined with metrics needed for quality assessment (coverage depth, average GC content, location of open reading frames, minority single nucleotide polymorphisms), and incorporates host and disease metadata. It also generates interactive web-based reports with an audit trail. Jovian was run on four systems, hosted by three institutes, using grid computers, a single high-performance compute server, and a Windows 10 laptop. All systems yielded identical results with matching MD5 checksums. Comparison with a commercial online reference tool using viral gastroenteritis samples confirmed the identification of the same pathogens. Jovian thus provides results comparable to those of a commercially available online reference tool and generates identical results at different institutes with different IT architectures, demonstrating that it is portable and reproducible. Jovian addresses bottlenecks in the deployment of metagenomics within public health and clinical laboratories and has the potential to enhance the breadth of surveillance and testing programs, thereby fostering more effective public health interventions.
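
The reproducibility check described in the abstract (identical results with matching MD5 checksums across systems) can be illustrated with a short sketch. This is not Jovian's own code: the function names `md5_of_file`, `checksum_tree`, and `compare_runs` are hypothetical, and the sketch simply assumes that each run wrote its results into a separate directory with the same relative file layout.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): md5_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_runs(root_a, root_b):
    """Return the sorted relative paths that are missing or differ between two runs.

    An empty list means both output directories are byte-for-byte identical.
    """
    sums_a = checksum_tree(root_a)
    sums_b = checksum_tree(root_b)
    all_names = sums_a.keys() | sums_b.keys()
    return sorted(name for name in all_names if sums_a.get(name) != sums_b.get(name))
```

Calling `compare_runs("results_site_a", "results_site_b")` (hypothetical directory names) would return an empty list when two installations produced identical output, and otherwise the files that need inspection.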

Identifiers

pubmed: 39472593
doi: 10.1038/s41598-024-73785-y
pii: 10.1038/s41598-024-73785-y

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

26018

Grants

Agency: European Union's Horizon H2020 grants COMPARE
ID: 643476
Agency: European Union's Horizon H2020 grants VEO
ID: 874735

Copyright information

© 2024. The Author(s).

Authors

Dennis Schmitz (D)

National Institute of Public Health and the Environment, Center for Infectious Disease Control, 3720BA, Bilthoven, The Netherlands. Dennis.Schmitz@rivm.nl.
Viroscience, Erasmus University Medical Center, 3015GB, Rotterdam, The Netherlands. Dennis.Schmitz@rivm.nl.

Florian Zwagemaker (F)

National Institute of Public Health and the Environment, Center for Infectious Disease Control, 3720BA, Bilthoven, The Netherlands.

Sam Nooij (S)

National Institute of Public Health and the Environment, Center for Infectious Disease Control, 3720BA, Bilthoven, The Netherlands.
Center for Infectious Diseases, Leiden University Medical Center, 2333ZA, Leiden, The Netherlands.

Thierry K S Janssens (TKS)

National Institute of Public Health and the Environment, Center for Infectious Disease Control, 3720BA, Bilthoven, The Netherlands.

Jeroen Cremer (J)

National Institute of Public Health and the Environment, Center for Infectious Disease Control, 3720BA, Bilthoven, The Netherlands.

Robert Verhagen (R)

National Institute of Public Health and the Environment, Center for Infectious Disease Control, 3720BA, Bilthoven, The Netherlands.

Harry Vennema (H)

National Institute of Public Health and the Environment, Center for Infectious Disease Control, 3720BA, Bilthoven, The Netherlands.

Annelies Kroneman (A)

National Institute of Public Health and the Environment, Center for Infectious Disease Control, 3720BA, Bilthoven, The Netherlands.

Marion P G Koopmans (MPG)

Viroscience, Erasmus University Medical Center, 3015GB, Rotterdam, The Netherlands.

Jeroen F J Laros (JFJ)

Department of Bio-Informatics and Computational Services, National Institute of Public Health and the Environment, 3720BA, Bilthoven, The Netherlands.
Department of Human Genetics, Leiden University Medical Center, 2333ZA, Leiden, The Netherlands.

Miranda de Graaf (M)

Viroscience, Erasmus University Medical Center, 3015GB, Rotterdam, The Netherlands.
