Needlestack: an ultra-sensitive variant caller for multi-sample next generation sequencing data.
Journal
NAR genomics and bioinformatics
ISSN: 2631-9268
Abbreviated title: NAR Genom Bioinform
Country: England
NLM ID: 101756213
Publication information
Publication date:
Jun 2020
History:
received: 6 June 2019
revised: 28 January 2020
accepted: 16 April 2020
entrez: 5 May 2020
pubmed: 5 May 2020
medline: 5 May 2020
Status:
ppublish
Abstract
The emergence of next-generation sequencing (NGS) has revolutionized how genome sequences are obtained, with the promise of a comprehensive characterization of DNA variations. Nevertheless, detecting somatic mutations remains difficult, in particular when trying to identify low-abundance mutations, such as subclonal mutations, tumour-derived alterations in body fluids or somatic mutations from histologically normal tissue. The main challenge is to distinguish precisely between sequencing artefacts and true mutations, particularly when the latter are so rare that they reach abundance levels similar to those of artefacts. Here, we present needlestack, a highly sensitive variant caller that learns the level of systematic sequencing errors directly from the data in order to call mutations accurately. Needlestack is based on the idea that the sequencing error rate can be dynamically estimated by analysing multiple samples together. We show that the sequencing error rate varies across alterations, illustrating the need to estimate it precisely. We evaluate the performance of needlestack for various types of variations, and we show that needlestack is robust across positions and outperforms existing state-of-the-art methods for low-abundance mutations. Needlestack, along with its source code, is freely available on GitHub: https://github.com/IARCbioinfo/needlestack.
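The idea of estimating the error rate from multiple samples can be illustrated with a minimal sketch. The abstract does not specify the statistical model, so the code below substitutes a deliberately simple stand-in: for one position and one alteration, each sample's error rate is taken as the pooled alternate-read fraction of all other samples, and the sample is flagged when its own alternate-read count is improbably high under a binomial model. The function names (`binom_sf`, `call_outliers`), the threshold `alpha` and the floor `min_rate` are hypothetical and are not taken from needlestack.

```python
import math


def binom_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0 if k <= n else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of the binomial coefficient via lgamma avoids huge integers
        log_comb = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_comb + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def call_outliers(alt_counts, depths, alpha=1e-3, min_rate=1e-6):
    """Flag samples whose alternate-read count exceeds the learned error rate.

    Assumption (not from the source): a leave-one-out pooled binomial model.
    Returns one (p_value, is_called) tuple per sample.
    """
    total_alt, total_depth = sum(alt_counts), sum(depths)
    calls = []
    for alt, depth in zip(alt_counts, depths):
        other_depth = total_depth - depth
        if other_depth <= 0:
            raise ValueError("need coverage in at least two samples")
        # Error rate learned from every sample except the one being tested
        rate = max((total_alt - alt) / other_depth, min_rate)
        p_value = binom_sf(alt, depth, rate)
        calls.append((p_value, p_value < alpha))
    return calls


if __name__ == "__main__":
    # Six samples at 1000x coverage; the fifth carries 40 alternate reads.
    for p_value, is_called in call_outliers([2, 1, 3, 2, 40, 1], [1000] * 6):
        print(f"{p_value:.3g}", is_called)
```

In this toy input only the fifth sample is flagged, because the other five suggest an error rate of about 0.2% at this position. A leave-one-out pooled rate is inflated when several samples carry the same true mutation, which is one reason a robust estimator would be preferable in practice.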
Identifiers
pubmed: 32363341
doi: 10.1093/nargab/lqaa021
pii: lqaa021
pmc: PMC7182099
Publication types
Journal Article
Languages
eng
Pagination
lqaa021
Copyright information
© World Health Organization and the authors, 2020. All rights reserved. The World Health Organization and the authors have granted the Publisher permission for the reproduction of this article.