CONNET: Accurate Genome Consensus in Assembling Nanopore Sequencing Data via Deep Learning.
Bioinformatics
Genomics
Sequence Analysis
Journal
iScience
ISSN: 2589-0042
Abbreviated title: iScience
Country: United States
NLM ID: 101724038
Publication information
Publication date:
22 May 2020
History:
received: 20 January 2020
revised: 24 April 2020
accepted: 28 April 2020
pubmed: 19 May 2020
medline: 19 May 2020
entrez: 19 May 2020
Status:
ppublish
Abstract
Single-molecule sequencing technologies produce much longer reads compared with next-generation sequencing, greatly improving the contiguity of de novo assembly of genomes. However, the relatively high error rates in long reads make it challenging to obtain high-quality assemblies. A computationally intensive consensus step is needed to resolve the discrepancies in the reads. Efficient consensus tools have emerged in the recent past, based on partial-order alignment. In this study, we discovered that the spatial relationship of alignment pileup is crucial to high-quality consensus and developed a deep learning-based consensus tool, CONNET, which outperforms the fastest tools in terms of both accuracy and speed. We tested CONNET using a 90× dataset of E. coli and a 37× human dataset. In addition to achieving high-quality consensus results, CONNET is capable of delivering phased diploid genome consensus. Diploid consensus on the above-mentioned human assembly further reduced 12% of the consensus errors made in the haploid results.
Identifiers
pubmed: 32422594
pii: S2589-0042(20)30313-8
doi: 10.1016/j.isci.2020.101128
pmc: PMC7229283
Publication types
Journal Article
Languages
eng
Pagination
101128
Copyright information
Copyright © 2020 The Author(s). Published by Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of Interests: The authors declare no competing interests.