LD-annot: A Bioinformatics Tool to Automatically Provide Candidate SNPs With Annotations for Genetically Linked Genes.
SNP annotation
SNP chip analyses
bioinformatics tool
candidate SNP
linkage disequilibrium
variant call format (VCF)
Journal
Frontiers in genetics
ISSN: 1664-8021
Abbreviated title: Front Genet
Country: Switzerland
NLM ID: 101560621
Publication information
Publication date:
2019
History:
received: 2019-07-02
accepted: 2019-10-28
entrez: 2019-12-19
pubmed: 2019-12-19
medline: 2019-12-19
Status:
epublish
Abstract
Many studies of model and non-model species have taken advantage of high-throughput genotyping advances such as SNP arrays and genotyping-by-sequencing (GBS) technology to investigate the genetic basis of trait variation. However, because these technologies cover the genome incompletely, the identified SNPs are likely to be in linkage disequilibrium (LD) with the causal polymorphisms rather than being causal themselves. In addition, researchers could benefit from annotations for the identified candidate SNPs and, simultaneously, for all neighboring genes in genetic linkage. In such cases, the extent of LD surrounding the candidate SNPs must be estimated to determine the regions encompassing genes of interest. We describe here an automated pipeline, "LD-annot," designed to delineate specific regions of interest for a given experiment and set of candidate polymorphisms on the basis of LD extent and, furthermore, to provide annotations for all genes within such regions. LD-annot uses standard file formats, bioinformatics tools, and languages to provide identifiers, coordinates, and annotations for genes in genetic linkage with each candidate polymorphism. Although the focus lies on SNP arrays and GBS data because they are routinely deployed, the pipeline can be applied to a variety of datasets as long as genotypic data are available for a high number of polymorphisms and formatted as a VCF file. A checkpoint procedure in the pipeline allows users to test several linkage threshold values without rerunning the entire pipeline, thus saving computational time and resources. We applied this new pipeline to four sample sets that vary in sample size and number of polymorphisms: two GBS datasets from breeding populations, one within-pedigree SNP set derived from whole genome sequencing (WGS), and a very large multi-variety SNP dataset obtained from WGS.
LD-annot ran within minutes, even when very high numbers of polymorphisms were investigated, and will thus efficiently assist research efforts aimed at identifying biologically meaningful genetic polymorphisms underlying phenotypic variation. The LD-annot tool is available under a GPL license from https://github.com/ArnaudDroitLab/LD-annot.
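The general idea described in the abstract (estimate LD around a candidate SNP, delineate a region, list the genes inside it, and reuse the LD estimates for several thresholds) can be illustrated with a small sketch. This is not LD-annot's code and makes no claim about its internals: the function names, the toy genotypes, the gene coordinates, and the use of the squared Pearson correlation of genotype dosages as the LD measure are all assumptions chosen for illustration.

```python
# Illustrative sketch only; NOT the LD-annot implementation.
# Assumptions: genotypes are dosages (0/1/2) for SNPs on one chromosome,
# LD is measured as the squared Pearson correlation (r^2) of dosages,
# and the region spans every SNP whose r^2 with the candidate meets the threshold.

def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:  # monomorphic SNP: r^2 is undefined, treat as 0
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def ld_with_candidate(candidate_pos, snps):
    """Compute r^2 between the candidate and every SNP once (reusable 'checkpoint')."""
    candidate = snps[candidate_pos]
    return {pos: r_squared(candidate, geno) for pos, geno in snps.items()}

def ld_region(candidate_pos, r2_by_pos, threshold):
    """Return (start, end) covering the candidate and all SNPs with r^2 >= threshold."""
    linked = [candidate_pos]
    linked += [pos for pos, r2 in r2_by_pos.items() if r2 >= threshold]
    return min(linked), max(linked)

def genes_in_region(region, genes):
    """Return IDs of genes (id, start, end) that overlap the region."""
    start, end = region
    return [gid for gid, g_start, g_end in genes if g_start <= end and g_end >= start]

# Hypothetical toy data: position -> dosages for six samples.
snps = {
    100: [0, 1, 2, 0, 1, 2],  # candidate SNP
    150: [0, 1, 2, 0, 1, 2],
    300: [0, 1, 2, 0, 2, 1],
    900: [2, 0, 1, 1, 0, 2],
}
genes = [("geneA", 120, 180), ("geneB", 250, 400), ("geneC", 800, 1000)]

r2 = ld_with_candidate(100, snps)  # computed once
for threshold in (0.8, 0.5):       # several thresholds reuse the same r^2 values
    region = ld_region(100, r2, threshold)
    print(threshold, region, genes_in_region(region, genes))
```

Computing the pairwise values once and filtering them afterwards mirrors, in spirit, the checkpoint idea mentioned in the abstract: changing the threshold only repeats the cheap filtering step.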
Identifiers
pubmed: 31850063
doi: 10.3389/fgene.2019.01192
pmc: PMC6889475
Publication types
Journal Article
Languages
eng
Pagination
1192
Copyright information
Copyright © 2019 Prunier, Lemaçon, Bastien, Jafarikia, Porth, Robert and Droit.