Accessible viral metagenomics for public health and clinical domains with Jovian.
Keywords
Clinics
Diagnostics
Next generation sequencing
Public health
Surveillance
Viromics
Journal
Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288
Publication information
Publication date:
29 October 2024
History:
received: 25 March 2024
accepted: 20 September 2024
medline: 30 October 2024
pubmed: 30 October 2024
entrez: 30 October 2024
Status:
epublish
Abstract
The integration of next-generation sequencing into clinical diagnostics and surveillance initiatives is impeded by the lack of data analysis pipelines that align with privacy legislation and laboratory certification protocols. To address these challenges, we developed Jovian, an open-source, virus-focused metagenomic analysis workflow for Illumina data. Jovian generates scaffolds with relevant annotations, including taxonomic classification, together with the metrics needed for quality assessment (coverage depth, average GC content, location of open reading frames and minority single nucleotide polymorphisms), and incorporates host and disease metadata. It produces interactive web-based reports with an audit trail. Jovian was run on four systems hosted by three institutes, comprising grid computers, a single high-performance computing server and a Windows 10 laptop. All systems yielded identical results with matching MD5 checksums. On viral gastroenteritis samples, Jovian identified the same pathogens as a commercial online reference tool. Jovian therefore provides results comparable to those of a commercially available online reference tool and generates identical results at different institutes with different IT architectures, demonstrating that it is portable and reproducible. Jovian addresses bottlenecks in the deployment of metagenomics within public health and clinical laboratories and has the potential to broaden surveillance and testing programs, thereby supporting more effective public health interventions.
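The abstract reports that outputs from all four systems had matching MD5 checksums. The Python sketch below shows one way such a cross-system comparison could be carried out. It is an illustration only, not code from Jovian; the helper names (`md5sum`, `compare_runs`) and the directory layout (`system_A`, `system_B`, `sample1`) are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(run_dirs: list[Path]) -> dict[str, bool]:
    """Map each file in the first run to True if every run has a byte-identical copy."""
    reference = run_dirs[0]
    report: dict[str, bool] = {}
    for ref_file in sorted(p for p in reference.rglob("*") if p.is_file()):
        rel = ref_file.relative_to(reference)
        digests = set()
        for run in run_dirs:
            candidate = run / rel
            # A missing file counts as a mismatch (recorded as None).
            digests.add(md5sum(candidate) if candidate.is_file() else None)
        report[rel.as_posix()] = len(digests) == 1 and None not in digests
    return report


# Usage: two throwaway "runs" standing in for output from two different systems
# (hypothetical layout, not Jovian's actual output structure).
with tempfile.TemporaryDirectory() as tmp:
    runs = [Path(tmp) / "system_A", Path(tmp) / "system_B"]
    for run in runs:
        (run / "sample1").mkdir(parents=True)
        (run / "sample1" / "scaffolds.fasta").write_text(">s1\nACGT\n")
    # Only this file differs between the two runs.
    (runs[0] / "sample1" / "log.txt").write_text("host A\n")
    (runs[1] / "sample1" / "log.txt").write_text("host B\n")
    report = compare_runs(runs)

print(report)
```

Here MD5 serves only as a fingerprint for detecting accidental differences between runs, not as a security measure. Any file that exists in the first run but is missing or altered in another run is reported as `False`; files present only in later runs are not checked by this sketch. The built-in generic annotations (`list[Path]`) assume Python 3.9 or later.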
Identifiers
pubmed: 39472593
doi: 10.1038/s41598-024-73785-y
pii: 10.1038/s41598-024-73785-y
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
26018
Grants
Agency: European Union's Horizon 2020 grant COMPARE
ID: 643476
Agency: European Union's Horizon 2020 grant VEO
ID: 874735
Copyright information
© 2024. The Author(s).