MicroPIPE: validating an end-to-end workflow for high-quality complete bacterial genome construction.


Journal

BMC Genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
25 Jun 2021
History:
received: 09 Mar 2021
accepted: 03 Jun 2021
entrez: 26 Jun 2021
pubmed: 27 Jun 2021
medline: 30 Jun 2021
Status: epublish

Abstract

Oxford Nanopore Technology (ONT) long-read sequencing has become a popular platform for microbial researchers due to the accessibility and affordability of its devices. However, easy and automated construction of high-quality bacterial genomes using nanopore reads remains challenging. Here we aimed to create a reproducible end-to-end bacterial genome assembly pipeline using ONT in combination with Illumina sequencing. We evaluated the performance of several popular tools used during genome reconstruction, including basecalling, filtering, assembly, and polishing. We also assessed overall genome accuracy using ONT both natively and with Illumina. All steps were validated using the high-quality complete reference genome for the Escherichia coli sequence type (ST)131 strain EC958. The software chosen at each stage was incorporated into our final pipeline, MicroPIPE. Further validation of MicroPIPE was carried out using 11 additional ST131 E. coli isolates, which demonstrated that complete circularised chromosomes and plasmids could be achieved without manual intervention. Twelve publicly available Gram-negative and Gram-positive bacterial genomes (with available raw ONT data and matched complete genomes) were also assembled using MicroPIPE. We found that revised basecalling and updated assembly of the majority of these genomes resulted in improved accuracy compared to the current publicly available complete genomes. MicroPIPE is built in modules using Singularity container images and the bioinformatics workflow manager Nextflow, allowing changes and adjustments to be made in response to future tool development. Overall, MicroPIPE provides an easy-access, end-to-end solution for attaining high-quality bacterial genomes. MicroPIPE is available at https://github.com/BeatsonLab-MicrobialGenomics/micropipe.

Identifiers

pubmed: 34172000
doi: 10.1186/s12864-021-07767-z
pii: 10.1186/s12864-021-07767-z
pmc: PMC8235852

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

474

Authors

Valentine Murigneux (V)

QCIF Facility for Advanced Bioinformatics, Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland, Australia.

Leah W Roberts (LW)

University of Queensland Centre for Clinical Research, Brisbane, Queensland, Australia. leah@ebi.ac.uk.
Queensland Children's Hospital, Brisbane, Queensland, Australia. leah@ebi.ac.uk.
European Bioinformatics Institute, European Molecular Biology Laboratory (EMBL), Hinxton, Cambridge, UK. leah@ebi.ac.uk.

Brian M Forde (BM)

University of Queensland Centre for Clinical Research, Brisbane, Queensland, Australia.

Minh-Duy Phan (MD)

School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia.

Nguyen Thi Khanh Nhu (NTK)

School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia.

Adam D Irwin (AD)

University of Queensland Centre for Clinical Research, Brisbane, Queensland, Australia.
Queensland Children's Hospital, Brisbane, Queensland, Australia.

Patrick N A Harris (PNA)

University of Queensland Centre for Clinical Research, Brisbane, Queensland, Australia.
Central Microbiology, Pathology Queensland, Royal Brisbane & Women's Hospital, Brisbane, Queensland, Australia.

David L Paterson (DL)

University of Queensland Centre for Clinical Research, Brisbane, Queensland, Australia.

Mark A Schembri (MA)

School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia.

David M Whiley (DM)

University of Queensland Centre for Clinical Research, Brisbane, Queensland, Australia.
Queensland Children's Hospital, Brisbane, Queensland, Australia.

Scott A Beatson (SA)

School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Queensland, Australia. scott.beatson@uq.edu.au.
Australian Centre for Ecogenomics, The University of Queensland, Brisbane, Queensland, Australia. scott.beatson@uq.edu.au.
