Quality control and integration of genotypes from two calling pipelines for whole genome sequence data in the Alzheimer's disease sequencing project.


Journal

Genomics
ISSN: 1089-8646
Abbreviated title: Genomics
Country: United States
NLM ID: 8800135

Publication information

Publication date:
07 2019
History:
received: 29 11 2017
revised: 03 04 2018
accepted: 06 05 2018
pubmed: 2 6 2018
medline: 31 12 2019
entrez: 2 6 2018
Status: ppublish

Abstract

The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
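
The per-genotype logic summarized in the abstract can be illustrated with a minimal Python sketch. It is not the ADSP protocol itself, which the abstract does not specify in detail. It assumes that each pipeline yields either a QC-passing genotype or no call at a bi-allelic autosomal site, that genotypes exclusive to one pipeline are retained, and that discordant genotypes are set to missing. The names `consensus_genotype` and `parent_offspring_mi` and the status labels are hypothetical, chosen for illustration only.

```python
from typing import Optional, Tuple

# A genotype is an unphased pair of allele indices, e.g. (0, 1).
# None stands for "no call" or "removed by pipeline-specific QC" (assumption).
Genotype = Optional[Tuple[int, int]]


def normalize(gt: Genotype) -> Genotype:
    """Treat unphased genotypes as unordered pairs, so (1, 0) equals (0, 1)."""
    return None if gt is None else tuple(sorted(gt))


def consensus_genotype(gatk: Genotype, atlas: Genotype) -> Tuple[Genotype, str]:
    """Combine one GATK call and one Atlas call into a consensus call.

    Returns the consensus genotype together with a status label.
    Discordant calls are excluded (returned as None).
    """
    g, a = normalize(gatk), normalize(atlas)
    if g is None and a is None:
        return None, "missing"
    if g is None:
        return a, "atlas_only"
    if a is None:
        return g, "gatk_only"
    if g == a:
        return g, "concordant"
    return None, "discordant"


def parent_offspring_mi(parent: Genotype, child: Genotype) -> Optional[bool]:
    """Flag a Mendelian inconsistency in a parent-offspring pair.

    At an autosomal site, a parent and child must share at least one allele.
    Returns None when either genotype is missing.
    """
    if parent is None or child is None:
        return None
    return not (set(parent) & set(child))


if __name__ == "__main__":
    print(consensus_genotype((0, 1), (1, 0)))   # same unphased genotype
    print(consensus_genotype((0, 1), None))     # call exclusive to GATK
    print(consensus_genotype((0, 1), (1, 1)))   # pipelines disagree
    print(parent_offspring_mi((0, 0), (1, 1)))  # no shared allele
```

With only one parent available, the check can detect opposite-homozygote pairs but not errors that require both parents to reveal, so MI rates from parent-offspring pairs are a lower bound on the genotyping error they reflect.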

Identifiers

pubmed: 29857119
pii: S0888-7543(18)30281-7
doi: 10.1016/j.ygeno.2018.05.004
pmc: PMC6397097
mid: NIHMS1010856

Publication types

Journal Article
Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't

Languages

eng

Citation subsets

IM

Pagination

808-818

Grants

Agency: NIA NIH HHS
ID: R01 AG054060
Country: United States
Agency: NIA NIH HHS
ID: R01 AG054076
Country: United States
Agency: NHGRI NIH HHS
ID: U54 HG003067
Country: United States
Agency: NIA NIH HHS
ID: U24 AG021886
Country: United States
Agency: NIA NIH HHS
ID: P50 AG008702
Country: United States
Agency: NIA NIH HHS
ID: U01 AG016976
Country: United States
Agency: NIA NIH HHS
ID: P50 AG005136
Country: United States
Agency: NHLBI NIH HHS
ID: R01 HL105756
Country: United States
Agency: NIA NIH HHS
ID: U24 AG041689
Country: United States
Agency: NIA NIH HHS
ID: R01 AG033193
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201100009C
Country: United States
Agency: NIA NIH HHS
ID: P30 AG010129
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201100006C
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201100010C
Country: United States
Agency: NIA NIH HHS
ID: U01 AG049505
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201100008C
Country: United States
Agency: NHLBI NIH HHS
ID: RC2 HL102419
Country: United States
Agency: NIA NIH HHS
ID: U01 AG058654
Country: United States
Agency: NIA NIH HHS
ID: U54 AG052427
Country: United States
Agency: NINDS NIH HHS
ID: R01 NS017950
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201100007C
Country: United States
Agency: NIA NIH HHS
ID: U01 AG049507
Country: United States
Agency: NIA NIH HHS
ID: U01 AG032984
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201100011C
Country: United States
Agency: NIA NIH HHS
ID: UF1 AG047133
Country: United States
Agency: NHGRI NIH HHS
ID: U54 HG003273
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201100012C
Country: United States
Agency: NIA NIH HHS
ID: U01 AG049508
Country: United States
Agency: NHLBI NIH HHS
ID: HHSN268201100005C
Country: United States
Agency: NIA NIH HHS
ID: U01 AG062602
Country: United States
Agency: NHGRI NIH HHS
ID: U54 HG003079
Country: United States
Agency: NIA NIH HHS
ID: U01 AG052409
Country: United States

Copyright information

Copyright © 2018. Published by Elsevier Inc.

Authors

Adam C Naj (AC)

Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. Electronic address: adamnaj@pennmedicine.upenn.edu.

Honghuang Lin (H)

Department of Medicine, Boston University School of Medicine, Boston, MA, USA.

Badri N Vardarajan (BN)

Department of Neurology, Columbia University Medical Center, New York, NY, USA.

Simon White (S)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

Daniel Lancour (D)

Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA.

Yiyi Ma (Y)

Department of Biomedical Genetics, Boston University School of Medicine, Boston, MA, USA.

Michael Schmidt (M)

John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA.

Fangui Sun (F)

Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.

Mariusz Butkiewicz (M)

Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA.

William S Bush (WS)

Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA.

Brian W Kunkle (BW)

John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA.

John Malamon (J)

Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Najaf Amin (N)

Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands.

Seung Hoan Choi (SH)

Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.

Kara L Hamilton-Nelson (KL)

John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA.

Sven J van der Lee (SJ)

Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands.

Namrata Gupta (N)

Medical and Population Genetics Program, Broad Institute, Cambridge, MA, USA.

Daniel C Koboldt (DC)

Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA.

Mohamad Saad (M)

Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA.

Bowen Wang (B)

Department of Statistics, University of Washington, Seattle, WA, USA.

Alejandro Q Nato (AQ)

Division of Medical Genetics, University of Washington, Seattle, WA, USA.

Harkirat K Sohi (HK)

Division of Medical Genetics, University of Washington, Seattle, WA, USA.

Amanda Kuzma (A)

Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Li-San Wang (LS)

Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

L Adrienne Cupples (LA)

Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA.

Cornelia van Duijn (C)

Department of Epidemiology, Erasmus Medical Center, Rotterdam, the Netherlands.

Sudha Seshadri (S)

The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA.

Gerard D Schellenberg (GD)

Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Eric Boerwinkle (E)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA; Human Genetics Center, University of Texas Health Science Center, Houston, TX, USA.

Joshua C Bis (JC)

Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA.

Josée Dupuis (J)

Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA.

William J Salerno (WJ)

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.

Ellen M Wijsman (EM)

Department of Biostatistics, University of Washington, Seattle, WA, USA; Division of Medical Genetics, University of Washington, Seattle, WA, USA.

Eden R Martin (ER)

John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Miami, FL, USA.

Anita L DeStefano (AL)

Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; The Framingham Heart Study, Framingham, MA, USA; Department of Neurology, Boston University School of Medicine, Boston, MA, USA.
