How array design creates SNP ascertainment bias.
Journal
PloS one
ISSN: 1932-6203
Abbreviated title: PLoS One
Country: United States
NLM ID: 101285081
Publication information
Publication date:
2021
History:
received: 2020-09-09
accepted: 2020-12-22
entrez: 2021-03-30
pubmed: 2021-03-31
medline: 2021-09-01
Status:
epublish
Abstract
Single nucleotide polymorphisms (SNPs), genotyped with arrays, have become a widely used marker type in population genetic analyses over the last 10 years. However, compared to whole genome re-sequencing data, arrays are known to lack a substantial proportion of globally rare variants and tend to be biased towards variants present in the populations involved in the development of the respective array. This affects population genetic estimators and is known as SNP ascertainment bias. We investigated factors contributing to ascertainment bias in array development by redesigning the Axiom™ Genome-Wide Chicken Array in silico and evaluating changes in allele frequency spectra and heterozygosity estimates in a stepwise manner. A sequential reduction of rare alleles during the development process was shown. This was mainly caused by the identification of SNPs in a limited set of populations and a within-population selection of common SNPs when aiming for equidistant spacing. These effects were shown to be less severe with a larger discovery panel. Additionally, a generally massive overestimation of expected heterozygosity for the ascertained SNP sets was shown. In the case of the original array, this overestimation was 24% higher for populations involved in the discovery process than for populations not involved in it. The same was observed after the SNP discovery step in the redesign. However, an unequal contribution of populations during the SNP selection can mask this effect but also adds uncertainty. Finally, we make suggestions for the design of specialized arrays for large scale projects where whole genome re-sequencing techniques are still too expensive.
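The link the abstract draws between losing rare alleles and overestimating expected heterozygosity can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: it assumes biallelic SNPs, uses the standard per-locus expected heterozygosity 2p(1-p), and stands in for array ascertainment with a simple minor allele frequency cutoff. The allele frequencies, the cutoff, and the function names are all hypothetical.

```python
def expected_heterozygosity(freqs):
    """Mean expected heterozygosity, 2p(1-p), over biallelic loci."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def ascertain(freqs, min_maf):
    """Keep loci whose minor allele frequency is at least min_maf.

    A crude stand-in for array ascertainment (assumption): real array
    design selects SNPs through a discovery panel and spacing rules.
    """
    return [p for p in freqs if min(p, 1 - p) >= min_maf]


# Hypothetical allele frequencies for one population, mostly rare variants
all_snps = [0.01, 0.02, 0.05, 0.10, 0.30, 0.50]

# "Array" SNP set: rare variants are dropped
common = ascertain(all_snps, min_maf=0.10)

he_all = expected_heterozygosity(all_snps)
he_common = expected_heterozygosity(common)

print(f"all SNPs:         n={len(all_snps)}, He={he_all:.3f}")
print(f"ascertained SNPs: n={len(common)}, He={he_common:.3f}")
```

Because rare variants contribute little to 2p(1-p), removing them raises the mean over the remaining loci, which is the direction of bias the abstract reports for ascertained SNP sets.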
Identifiers
pubmed: 33784304
doi: 10.1371/journal.pone.0245178
pii: PONE-D-20-28413
pmc: PMC8009414
Publication types
Journal Article
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
e0245178
Conflict of interest statement
The authors have declared that no competing interests exist.