ngsJulia: population genetic analysis of next-generation DNA sequencing data with Julia language.
Julia language
aneuploidy
genotype likelihoods
high-throughput sequencing data
polyploidy
pooled sequencing
population genetics
Journal
F1000Research
ISSN: 2046-1402
Abbreviated title: F1000Res
Country: England
NLM ID: 101594320
Publication information
Publication date:
2022
History:
accepted: 29 June 2023
pubmed: 25 September 2023
medline: 26 September 2023
entrez: 25 September 2023
Status:
epublish
Abstract
A sound analysis of DNA sequencing data is important to extract meaningful information and infer quantities of interest. Sequencing and mapping errors, coupled with low and variable coverage, hamper the identification of genotypes and variants and the estimation of population genetic parameters. Currently available methods and implementations for estimating population genetic parameters from sequencing data are either suitable only for the analysis of genomes from model organisms, require moderate sequencing coverage, or are not easily adaptable to specific applications. To address these issues, we introduce ngsJulia, a collection of templates and functions in the Julia language to process short-read sequencing data for population genetic analysis. We further describe two implementations, ngsPool and ngsPloidy, for the analysis of pooled sequencing data and polyploid genomes, respectively. Through simulations, we illustrate how well these implementations estimate various population genetic parameters, using both established and novel statistical methods. These results inform optimal experimental design and demonstrate that the methods in ngsJulia can estimate parameters of interest even from low-coverage sequencing data. ngsJulia provides users with a flexible and efficient framework for ad hoc analysis of sequencing data. ngsJulia is available from: https://github.com/mfumagalli/ngsJulia.
Identifiers
pubmed: 37745626
doi: 10.12688/f1000research.104368.2
pmc: PMC10514575
Publication types
Journal Article
Languages
eng
Citation subsets
IM
Pagination
126
Copyright information
Copyright: © 2023 Mas-Sandoval A et al.
Conflict of interest statement
No competing interests were disclosed.