Evaluation of strategies for the assembly of diverse bacterial genomes using MinION long-read sequencing.


Journal

BMC genomics
ISSN: 1471-2164
Abbreviated title: BMC Genomics
Country: England
NLM ID: 100965258

Publication information

Publication date:
09 Jan 2019
History:
received: 12 07 2018
accepted: 16 12 2018
entrez: 11 1 2019
pubmed: 11 1 2019
medline: 10 4 2019
Status: epublish

Abstract

BACKGROUND
Short-read sequencing technologies have made microbial genome sequencing cheap and accessible. However, closing genomes is often costly and assembling short reads from genomes that are repetitive and/or have extreme %GC content remains challenging. Long-read, single-molecule sequencing technologies such as the Oxford Nanopore MinION have the potential to overcome these difficulties, although the best approach for harnessing their potential remains poorly evaluated.
RESULTS
We sequenced nine bacterial genomes spanning a wide range of GC contents using Illumina MiSeq and Oxford Nanopore MinION sequencing technologies to determine the advantages of each approach, both individually and combined. Assemblies using only MiSeq reads were highly accurate but lacked contiguity, a deficiency that was partially overcome by adding MinION reads to these assemblies. Even more contiguous genome assemblies were generated by using MinION reads for initial assembly, but these assemblies were more error-prone and required further polishing. This was especially pronounced when Illumina libraries were biased, as was the case for our strains with both high and low GC content. Increased genome contiguity dramatically improved the annotation of insertion sequences and secondary metabolite biosynthetic gene clusters, likely because long reads can disambiguate these highly repetitive but biologically important genomic regions.
CONCLUSIONS
Genome assembly using short reads is challenged by repetitive sequences and extreme GC contents. Our results indicate that these difficulties can be largely overcome by using single-molecule, long-read sequencing technologies such as the Oxford Nanopore MinION. Using MinION reads for assembly followed by polishing with Illumina reads generated the most contiguous genomes with sufficient accuracy to enable the accurate annotation of important but difficult-to-sequence genomic features such as insertion sequences and secondary metabolite biosynthetic gene clusters. The combination of Oxford Nanopore and Illumina sequencing can therefore cost-effectively advance studies of microbial evolution and genome-driven drug discovery.

Identifiers

pubmed: 30626323
doi: 10.1186/s12864-018-5381-7
pii: 10.1186/s12864-018-5381-7
pmc: PMC6325685

Chemical substances

DNA Transposable Elements 0

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

23

Grants

Agency: Directorate for Biological Sciences
ID: IOS-1656475
Agency: Agricultural Research Service
ID: 58-1930-4-002

Authors

Sarah Goldstein (S)

Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA.

Lidia Beka (L)

Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA.

Joerg Graf (J)

Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA. joerg.graf@uconn.edu.

Jonathan L Klassen (JL)

Department of Molecular and Cell Biology, University of Connecticut, Storrs, CT, USA. jonathan.klassen@uconn.edu.
