Causalcall: Nanopore Basecalling Using a Temporal Convolutional Network.
assembly
basecalling
deep neural network
nanopore sequencing
performance comparison
temporal convolution
Journal
Frontiers in genetics
ISSN: 1664-8021
Abbreviated title: Front Genet
Country: Switzerland
NLM ID: 101560621
Publication information
Publication date: 2019
History:
received: 2019-08-29
accepted: 2019-12-05
entrez: 2020-02-11
pubmed: 2020-02-11
medline: 2020-02-11
Status:
epublish
Abstract
Nanopore sequencing is promising because of its long read length and high speed. During sequencing, a strand of DNA/RNA passes through a biological nanopore, which causes the current in the pore to fluctuate. Basecalling translates these context-dependent current measurements into the base sequence of the DNA/RNA strand. Accurate and fast basecalling is vital for downstream analyses such as genome assembly and the detection of single-nucleotide polymorphisms and genomic structural variants. However, owing to the various changes in DNA/RNA molecules, noise during sequencing, and the limitations of basecalling methods, accurate basecalling remains a challenge. In this paper, we propose Causalcall, which uses an end-to-end temporal convolution-based deep learning model for accurate and fast nanopore basecalling. Built on a temporal convolutional network (TCN) and a connectionist temporal classification (CTC) decoder, Causalcall directly identifies base sequences of varying lengths from long time series of current measurements. In contrast to basecalling models that use recurrent neural networks (RNNs), the convolution-based model of Causalcall can speed up basecalling through matrix computation. Experiments on multiple species demonstrate the potential of the TCN-based model to improve basecalling accuracy and speed compared with an RNN-based model. In addition, genome assembly experiments indicate the utility of Causalcall in reference-based genome assembly.
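The two building blocks named in the abstract, a causal (temporal) convolution and a connectionist temporal classification decoder, can be illustrated with a minimal sketch. This is not Causalcall's implementation: the function names, the label order in `ALPHABET`, the blank index, and the use of greedy best-path decoding are all assumptions made for illustration, and a real TCN stacks many such layers with learned weights and nonlinearities.

```python
from itertools import groupby

BLANK = 0            # assumed index of the CTC blank label
ALPHABET = "-ACGT"   # assumed label order: blank, A, C, G, T


def causal_dilated_conv(signal, kernel, dilation=1):
    """Causal 1-D convolution with dilation.

    output[t] depends only on signal[t], signal[t - dilation], ...
    The input is implicitly left-padded with zeros, so the output has the
    same length as the input and never reads future samples.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation  # look back k * dilation steps
            if idx >= 0:
                acc += weight * signal[idx]
        out.append(acc)
    return out


def ctc_greedy_decode(frame_scores):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    collapsed = [label for label, _ in groupby(best)]
    return "".join(ALPHABET[i] for i in collapsed if i != BLANK)


if __name__ == "__main__":
    # Toy current trace filtered by a two-tap kernel with dilation 2.
    print(causal_dilated_conv([1, 2, 3, 4], [1, 1], dilation=2))

    # Toy per-frame label scores; more frames than bases, as in basecalling.
    frames = [
        [0.1, 0.9, 0.0, 0.0, 0.0],
        [0.1, 0.8, 0.1, 0.0, 0.0],
        [0.9, 0.1, 0.0, 0.0, 0.0],
        [0.2, 0.7, 0.1, 0.0, 0.0],
        [0.0, 0.1, 0.9, 0.0, 0.0],
        [0.0, 0.2, 0.8, 0.0, 0.0],
        [0.9, 0.0, 0.0, 0.1, 0.0],
    ]
    print(ctc_greedy_decode(frames))
```

Because every output position of the convolution can be computed independently of the others, such a layer can be evaluated in parallel across the whole time series, whereas a recurrent layer must process time steps in order. The blank label lets the decoder emit fewer bases than there are signal frames while still distinguishing a repeated base (A, blank, A) from a single base spanning several frames (A, A).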
Identifiers
pubmed: 32038706
doi: 10.3389/fgene.2019.01332
pmc: PMC6984161
Publication types
Journal Article
Languages
eng
Pagination
1332
Copyright information
Copyright © 2020 Zeng, Cai, Peng, Wang, Zhang and Akutsu.