NGSEP3: accurate variant calling across species and sequencing protocols.


Journal

Bioinformatics (Oxford, England)
ISSN: 1367-4811
Abbreviated title: Bioinformatics
Country: England
NLM ID: 9808944

Publication information

Publication date:
01 11 2019
History:
received: 22 12 2018
revised: 16 03 2019
accepted: 17 04 2019
pubmed: 18 5 2019
medline: 2 7 2020
entrez: 18 5 2019
Status: ppublish

Abstract

Accurate detection, genotyping and downstream analysis of genomic variants from high-throughput sequencing data are fundamental features in modern production pipelines for genetic-based diagnosis in medicine or genomic selection in plant and animal breeding. Our research group maintains the Next-Generation Sequencing Experience Platform (NGSEP) as a precise, efficient and easy-to-use software solution for these features. Understanding that incorrect alignments around short tandem repeats are an important source of genotyping errors, we implemented in NGSEP new algorithms for realignment and haplotype clustering of reads spanning indels and short tandem repeats. We performed extensive benchmark experiments comparing NGSEP to state-of-the-art software using real data from three sequencing protocols and four species with different distributions of repetitive elements. NGSEP consistently shows comparable accuracy and better efficiency than existing solutions. We expect that this work will contribute to the continuous improvement of quality in variant calling needed for modern applications in medicine and agriculture. NGSEP is available as open source software at http://ngsep.sf.net. Supplementary data are available at Bioinformatics online.

Identifiers

pubmed: 31099384
pii: 5480128
doi: 10.1093/bioinformatics/btz275
pmc: PMC6853766

Publication types

Journal Article
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

4716-4723

Copyright information

© The Author(s) 2019. Published by Oxford University Press.

Authors

Daniel Tello (D)

Systems and Computing Engineering Department, Universidad de los Andes, Bogotá 111711, Colombia.

Juanita Gil (J)

Systems and Computing Engineering Department, Universidad de los Andes, Bogotá 111711, Colombia.

Cristian D Loaiza (CD)

Biotechnology lab, Centro de Investigación de la caña de azúcar de Colombia, CENICAÑA, Cali 760046, Colombia.

John J Riascos (JJ)

Biotechnology lab, Centro de Investigación de la caña de azúcar de Colombia, CENICAÑA, Cali 760046, Colombia.

Nicolás Cardozo (N)

Systems and Computing Engineering Department, Universidad de los Andes, Bogotá 111711, Colombia.

Jorge Duitama (J)

Systems and Computing Engineering Department, Universidad de los Andes, Bogotá 111711, Colombia.
Agrobiodiversity Research Area, International Center for Tropical Agriculture, Cali 763537, Colombia.
