Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project.
Atlas
Consensus calling
GATK
Mendelian inconsistencies
Quality control
Whole genome sequencing
Journal
Genomics
ISSN: 1089-8646
Abbreviated title: Genomics
Country: United States
NLM ID: 8800135
Publication information
Publication date:
July 2019
History:
received: 29 November 2017
revised: 3 April 2018
accepted: 6 May 2018
pubmed: 2 June 2018
medline: 31 December 2019
entrez: 2 June 2018
Status:
ppublish
Abstract
The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; and the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
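The consensus-calling rule summarized in the abstract (keep concordant genotypes and genotypes called by only one pipeline, exclude discordant ones) can be illustrated with a minimal Python sketch. It assumes genotypes coded as alternate-allele counts (0, 1, 2) at autosomal bi-allelic sites, with `None` standing for a missing or QC-removed call; all function names are hypothetical, and the sketch is not the ADSP implementation: it omits the pipeline-specific, variant-level, and genotype-level filters applied before merging. It also shows the standard opposite-homozygote test for a Mendelian inconsistency in a parent-offspring pair.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Assumption: each genotype is the alternate-allele count (0, 1 or 2) at an
# autosomal bi-allelic site; None means no call, or a call removed by QC.
Genotype = Optional[int]


def classify_pair(gatk: Genotype, atlas: Genotype) -> str:
    """Label one sample-site genotype by how the two pipelines agree."""
    if gatk is None and atlas is None:
        return "missing"
    if atlas is None:
        return "gatk_only"
    if gatk is None:
        return "atlas_only"
    return "concordant" if gatk == atlas else "discordant"


def consensus_genotype(gatk: Genotype, atlas: Genotype) -> Genotype:
    """Keep concordant and single-pipeline calls; drop discordant ones."""
    label = classify_pair(gatk, atlas)
    if label in ("concordant", "gatk_only"):
        return gatk
    if label == "atlas_only":
        return atlas
    return None  # discordant, or missing in both pipelines


def consensus_calls(
    gatk_calls: Sequence[Genotype], atlas_calls: Sequence[Genotype]
) -> Tuple[List[Genotype], Counter]:
    """Merge two aligned genotype vectors and tally the agreement categories."""
    if len(gatk_calls) != len(atlas_calls):
        raise ValueError("genotype vectors must be aligned one-to-one")
    pairs = list(zip(gatk_calls, atlas_calls))
    merged = [consensus_genotype(g, a) for g, a in pairs]
    tally = Counter(classify_pair(g, a) for g, a in pairs)
    return merged, tally


def is_mendelian_inconsistent(parent: Genotype, child: Genotype) -> bool:
    """For a bi-allelic site, a parent-offspring pair is inconsistent only when
    the two are opposite homozygotes (0 vs 2), because they share no allele."""
    if parent is None or child is None:
        return False
    return {parent, child} == {0, 2}


def count_mendelian_inconsistencies(
    pairs: Sequence[Tuple[Genotype, Genotype]]
) -> int:
    """Count inconsistent (parent, child) genotype pairs; missing calls are skipped."""
    return sum(is_mendelian_inconsistent(p, c) for p, c in pairs)


if __name__ == "__main__":
    gatk = [0, 1, 2, None, 1, None]
    atlas = [0, 1, 1, 2, None, None]
    merged, tally = consensus_calls(gatk, atlas)
    print(merged)  # [0, 1, None, 2, 1, None]
    print(dict(tally))
    print(count_mendelian_inconsistencies([(0, 2), (1, 2), (2, 0), (None, 2)]))  # 2
```

Accepting a call made by only one pipeline mirrors the "exclusive" categories reported above; a stricter variant that required agreement from both pipelines would retain fewer genotypes.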
Identifiers
pubmed: 29857119
pii: S0888-7543(18)30281-7
doi: 10.1016/j.ygeno.2018.05.004
pmc: PMC6397097
mid: NIHMS1010856
Publication types
Journal Article
Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't
Languages
eng
Citation subsets
IM
Pagination
808-818
Grants
Agency: NIA NIH HHS; ID: R01 AG054060; Country: United States
Agency: NIA NIH HHS; ID: R01 AG054076; Country: United States
Agency: NHGRI NIH HHS; ID: U54 HG003067; Country: United States
Agency: NIA NIH HHS; ID: U24 AG021886; Country: United States
Agency: NIA NIH HHS; ID: P50 AG008702; Country: United States
Agency: NIA NIH HHS; ID: U01 AG016976; Country: United States
Agency: NIA NIH HHS; ID: P50 AG005136; Country: United States
Agency: NHLBI NIH HHS; ID: R01 HL105756; Country: United States
Agency: NIA NIH HHS; ID: U24 AG041689; Country: United States
Agency: NIA NIH HHS; ID: R01 AG033193; Country: United States
Agency: NHLBI NIH HHS; ID: HHSN268201100009C; Country: United States
Agency: NIA NIH HHS; ID: P30 AG010129; Country: United States
Agency: NHLBI NIH HHS; ID: HHSN268201100006C; Country: United States
Agency: NHLBI NIH HHS; ID: HHSN268201100010C; Country: United States
Agency: NIA NIH HHS; ID: U01 AG049505; Country: United States
Agency: NHLBI NIH HHS; ID: HHSN268201100008C; Country: United States
Agency: NHLBI NIH HHS; ID: RC2 HL102419; Country: United States
Agency: NIA NIH HHS; ID: U01 AG058654; Country: United States
Agency: NIA NIH HHS; ID: U54 AG052427; Country: United States
Agency: NINDS NIH HHS; ID: R01 NS017950; Country: United States
Agency: NHLBI NIH HHS; ID: HHSN268201100007C; Country: United States
Agency: NIA NIH HHS; ID: U01 AG049507; Country: United States
Agency: NIA NIH HHS; ID: U01 AG032984; Country: United States
Agency: NHLBI NIH HHS; ID: HHSN268201100011C; Country: United States
Agency: NIA NIH HHS; ID: UF1 AG047133; Country: United States
Agency: NHGRI NIH HHS; ID: U54 HG003273; Country: United States
Agency: NHLBI NIH HHS; ID: HHSN268201100012C; Country: United States
Agency: NIA NIH HHS; ID: U01 AG049508; Country: United States
Agency: NHLBI NIH HHS; ID: HHSN268201100005C; Country: United States
Agency: NIA NIH HHS; ID: U01 AG062602; Country: United States
Agency: NHGRI NIH HHS; ID: U54 HG003079; Country: United States
Agency: NIA NIH HHS; ID: U01 AG052409; Country: United States
Copyright information
Copyright © 2018. Published by Elsevier Inc.