Impact of pre- and post-variant filtration strategies on imputation.


Journal

Scientific Reports
ISSN: 2045-2322
Abbreviated title: Sci Rep
Country: England
NLM ID: 101563288

Publication information

Publication date:
18 03 2021
History:
received: 14 12 2020
accepted: 22 02 2021
entrez: 19 3 2021
pubmed: 20 3 2021
medline: 20 3 2021
Status: epublish

Abstract

Quality control (QC) methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1089 NCBI-recorded individuals for additional validation. Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the ranges of very rare (5E-04 to 1E-03) and rare variants (1E-03 to 5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score threshold from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequencies < 0.001 by 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). Thus, to maintain confidence while retaining enough SNVs, we propose a two-step filtering procedure that applies less stringent filtering both before and after imputation, in order to increase the number of very rare and rare variants compared with conservative filtration methods.
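
The post-imputation step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the `Variant` record, the function names and the toy data are hypothetical, and treating each frequency range as lower-bound inclusive is an assumption. Only the quality thresholds (0.3 and 0.8) and the frequency ranges are taken from the abstract.

```python
from dataclasses import dataclass

# Frequency ranges quoted in the abstract.
# Assumption: lower bound inclusive, upper bound exclusive.
VERY_RARE = (5e-4, 1e-3)
RARE = (1e-3, 5e-3)


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one imputed variant."""
    variant_id: str
    maf: float      # minor allele frequency after imputation
    quality: float  # imputation quality score reported by the imputation tool


def post_filter(variants, min_quality):
    """Keep variants whose imputation quality score is at least min_quality."""
    return [v for v in variants if v.quality >= min_quality]


def count_in_bin(variants, bounds):
    """Count variants whose MAF falls in [low, high)."""
    low, high = bounds
    return sum(1 for v in variants if low <= v.maf < high)


# Toy data invented purely for illustration (not from the study).
toy = [
    Variant("v1", 6e-4, 0.35),
    Variant("v2", 7e-4, 0.85),
    Variant("v3", 2e-3, 0.50),
    Variant("v4", 3e-3, 0.90),
    Variant("v5", 2e-2, 0.95),
]

# Compare a lenient and a stringent post-imputation threshold.
for threshold in (0.3, 0.8):
    kept = post_filter(toy, threshold)
    print(
        f"quality >= {threshold}: {len(kept)} kept, "
        f"{count_in_bin(kept, VERY_RARE)} very rare, "
        f"{count_in_bin(kept, RARE)} rare"
    )
```

Running the same counts at both thresholds makes the trade-off visible: a stricter quality cut-off raises confidence but removes a larger share of the low-frequency variants.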

Identifiers

pubmed: 33737531
doi: 10.1038/s41598-021-85333-z
pii: 10.1038/s41598-021-85333-z
pmc: PMC7973508

Publication types

Journal Article

Languages

eng

Citation subsets

IM

Pagination

6214

Comments and corrections

Type: ErratumIn

Authors

Céline Charon (C)

CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France. celine.charon@cng.fr.

Rodrigue Allodji (R)

Radiation Epidemiology Group CESP, Inserm Unit 1018, Gustave Roussy Université Paris Saclay, 114 rue Edouard Vaillant, Villejuif, 94805, France.

Vincent Meyer (V)

CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France.

Jean-François Deleuze (JF)

CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France.

MeSH classifications