CONNET: Accurate Genome Consensus in Assembling Nanopore Sequencing Data via Deep Learning.

Bioinformatics, Genomics, Sequence Analysis

Journal

iScience
ISSN: 2589-0042
Abbreviated title: iScience
Country: United States
NLM ID: 101724038

Publication information

Publication date:
22 May 2020
History:
received: 20 01 2020
revised: 24 04 2020
accepted: 28 04 2020
pubmed: 19 5 2020
medline: 19 5 2020
entrez: 19 5 2020
Status: ppublish

Abstract

Single-molecule sequencing technologies produce much longer reads compared with next-generation sequencing, greatly improving the contiguity of de novo assembly of genomes. However, the relatively high error rates in long reads make it challenging to obtain high-quality assemblies. A computationally intensive consensus step is needed to resolve the discrepancies in the reads. Efficient consensus tools have emerged in the recent past, based on partial-order alignment. In this study, we discovered that the spatial relationship of alignment pileup is crucial to high-quality consensus and developed a deep learning-based consensus tool, CONNET, which outperforms the fastest tools in terms of both accuracy and speed. We tested CONNET using a 90× dataset of E. coli and a 37× human dataset. In addition to achieving high-quality consensus results, CONNET is capable of delivering phased diploid genome consensus. Diploid consensus on the above-mentioned human assembly further reduced 12% of the consensus errors made in the haploid results.

Identifiers

pubmed: 32422594
pii: S2589-0042(20)30313-8
doi: 10.1016/j.isci.2020.101128
pmc: PMC7229283

Publication types

Journal Article

Languages

eng

Pagination

101128

Copyright information

Copyright © 2020 The Author(s). Published by Elsevier Inc. All rights reserved.

Conflict of interest statement

Declaration of Interests: The authors declare no competing interests.

Authors

Yifan Zhang (Y)

Department of Computer Science, The University of Hong Kong, Hong Kong, China.

Chi-Man Liu (CM)

Department of Computer Science, The University of Hong Kong, Hong Kong, China.

Henry C M Leung (HCM)

Department of Computer Science, The University of Hong Kong, Hong Kong, China.

Ruibang Luo (R)

Department of Computer Science, The University of Hong Kong, Hong Kong, China. Electronic address: rbluo@cs.hku.hk.

Tak-Wah Lam (TW)

Department of Computer Science, The University of Hong Kong, Hong Kong, China. Electronic address: twlam@cs.hku.hk.
