Best practices for analyzing imputed genotypes from low-pass sequencing in dogs.
Journal
Mammalian genome : official journal of the International Mammalian Genome Society
ISSN: 1432-1777
Abbreviated title: Mamm Genome
Country: United States
NLM ID: 9100916
Publication information
Publication date:
03 2022
History:
received: 2021-07-07
accepted: 2021-09-01
pubmed: 2021-09-10
medline: 2022-04-26
entrez: 2021-09-09
Status:
ppublish
Abstract
Although DNA array-based approaches for genome-wide association studies (GWAS) permit the collection of thousands of low-cost genotypes, this often comes at the expense of resolution and completeness, because SNP chip technologies are ultimately limited by the SNPs chosen during array development. An alternative low-cost approach is low-pass whole genome sequencing (WGS) followed by imputation. Rather than relying on high levels of genotype confidence at a set of select loci, low-pass WGS and imputation rely on the combined information from millions of randomly sampled low-confidence genotypes. To investigate low-pass WGS and imputation in the dog, we assessed accuracy and performance by downsampling 97 high-coverage (> 15×) WGS datasets from 51 different breeds to approximately 1× coverage, simulating low-pass WGS. Using a reference panel of 676 dogs from 91 breeds, genotypes were imputed from the downsampled data and compared to a truth set of genotypes generated from high-coverage WGS. Using our truth set, we optimized a variant quality filtering strategy that retained approximately 80% of 14 M imputed sites and lowered the imputation error rate from 3.0% to 1.5%. Seven million sites remained with a MAF > 5% and an average imputation quality score of 0.95. Finally, we simulated the impact of imputation errors on case-control GWAS outcomes: small effect sizes were most affected, whereas medium-to-large effect sizes were only slightly affected. These analyses provide best practice guidelines for study design and data post-processing of low-pass WGS-imputed genotypes in dogs.
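The comparison against a truth set described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy genotype matrices, the per-site quality scores, and the 0.9 threshold are all hypothetical, and genotypes are assumed to be coded as alternate-allele counts (0, 1, 2). The sketch only shows the general idea of measuring genotype discordance before and after dropping low-quality sites.

```python
from typing import List, Sequence, Tuple

Genotypes = List[List[int]]  # one inner list of per-sample calls per site


def error_rate(truth: Genotypes, imputed: Genotypes) -> float:
    """Fraction of genotype calls where the imputed call differs from truth."""
    total = 0
    mismatches = 0
    for truth_site, imputed_site in zip(truth, imputed):
        for t, i in zip(truth_site, imputed_site):
            total += 1
            if t != i:
                mismatches += 1
    return mismatches / total if total else 0.0


def filter_sites(
    truth: Genotypes,
    imputed: Genotypes,
    quality: Sequence[float],
    min_quality: float,
) -> Tuple[Genotypes, Genotypes, float]:
    """Keep sites whose quality score is at least min_quality.

    Returns the filtered truth and imputed matrices plus the fraction of
    sites retained.
    """
    kept = [k for k, q in enumerate(quality) if q >= min_quality]
    retained = len(kept) / len(quality) if quality else 0.0
    return [truth[k] for k in kept], [imputed[k] for k in kept], retained


# Toy data (hypothetical): 4 sites x 3 samples.
truth = [[0, 1, 2], [0, 0, 1], [2, 2, 1], [1, 0, 0]]
imputed = [[0, 1, 2], [0, 0, 1], [2, 1, 1], [0, 1, 0]]
quality = [0.99, 0.97, 0.92, 0.60]  # hypothetical per-site quality scores

rate_before = error_rate(truth, imputed)
truth_f, imputed_f, retained = filter_sites(truth, imputed, quality, 0.9)
rate_after = error_rate(truth_f, imputed_f)
```

In this toy case the low-quality site carries most of the discordant calls, so removing it lowers the error rate at the cost of one of four sites. Real data would require choosing the threshold against a truth set, as the abstract describes.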
Identifiers
pubmed: 34498136
doi: 10.1007/s00335-021-09914-z
pii: 10.1007/s00335-021-09914-z
pmc: PMC8913487
Publication types
Journal Article
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
213-229
Copyright information
© 2021. This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.